Quality vector
Every result in a sift response carries a `quality_vector`. It is the core transparency artifact: each field is derived from specific inputs, and those inputs are logged in `signals[]` so downstream agents (and you) can trace why a classification landed where it did.
Schema

    interface QualityVector {
      schema_version: "1.0",
      tier:
        | "regulated_primary"         // SEC / EDGAR / gov / court records
        | "peer_reviewed"             // arXiv / PubMed / .edu papers
        | "independent_editorial"     // BBC / Reuters / Ars Technica (sans sponsored)
        | "vendor_primary"            // vendor's own product / docs / homepage
        | "vendor_content_marketing"  // vendor blog / strategic advice
        | "affiliate"                 // "Best X" listicle + commercial intent
        | "content_farm"              // templated / AI / low-effort aggregator
        | "ugc"                       // Reddit / forums / community
        | "unknown",
      editorial_standards: "high" | "medium" | "low" | "unknown",
      self_promoting: boolean,
      third_party: boolean,
      commercial_intent: "high" | "medium" | "low" | "none",
      domain_content_mismatch: boolean,  // parasite-SEO flag
      authoritative_weight: number,      // 0..1
      confidence: number,                // 0..1
      reason: string,                    // under 80 chars
      signals: Array<{
        origin: "safety" | "authoritative" | "llm",
        type: string,
        match: string,
        weight: number
      }>
    }

The 9 tiers
See Tier definitions for the full set of examples and boundary rules. Briefly:
| Tier | Example | Typical use by agent |
|---|---|---|
| `regulated_primary` | SEC filings, government publications, court records | Cite as authoritative |
| `peer_reviewed` | arXiv, PubMed, SAGE/Elsevier journal articles | Cite as authoritative |
| `independent_editorial` | BBC, Reuters, Ars Technica (main articles, not sponsored) | Cite with attribution |
| `vendor_primary` | Product homepage, API docs, official changelog | Cite for vendor facts |
| `vendor_content_marketing` | HubSpot blog, Stripe blog, VC firm thought leadership | Treat as positioning |
| `affiliate` | “Best X 2026” listicles | Commercial; do not use for objective comparison |
| `content_farm` | Templated AI / SEO churn | Do not cite |
| `ugc` | Reddit, Stack Overflow, Quora | Treat as opinion |
| `unknown` | Cannot determine from signals | Flag uncertainty |
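
The "Typical use by agent" column can be encoded as a lookup so an agent applies it consistently. The following is a minimal sketch: the tier names come from the schema, while the `Handling` labels and the helpers `handlingFor` and `isCitable` are hypothetical names chosen for illustration, not part of sift.

```typescript
// Tier names are copied from the QualityVector schema.
type Tier =
  | "regulated_primary"
  | "peer_reviewed"
  | "independent_editorial"
  | "vendor_primary"
  | "vendor_content_marketing"
  | "affiliate"
  | "content_farm"
  | "ugc"
  | "unknown";

// Hypothetical labels paraphrasing the "Typical use by agent" column.
type Handling =
  | "cite_as_authoritative"
  | "cite_with_attribution"
  | "cite_for_vendor_facts"
  | "treat_as_positioning"
  | "exclude_from_comparison"
  | "do_not_cite"
  | "treat_as_opinion"
  | "flag_uncertainty";

// Record<Tier, Handling> makes the compiler reject a missing or misspelled tier.
const HANDLING_BY_TIER: Record<Tier, Handling> = {
  regulated_primary: "cite_as_authoritative",
  peer_reviewed: "cite_as_authoritative",
  independent_editorial: "cite_with_attribution",
  vendor_primary: "cite_for_vendor_facts",
  vendor_content_marketing: "treat_as_positioning",
  affiliate: "exclude_from_comparison",
  content_farm: "do_not_cite",
  ugc: "treat_as_opinion",
  unknown: "flag_uncertainty",
};

function handlingFor(tier: Tier): Handling {
  return HANDLING_BY_TIER[tier];
}

// True only for the three "Cite ..." rows of the table.
function isCitable(tier: Tier): boolean {
  const handling = handlingFor(tier);
  return (
    handling === "cite_as_authoritative" ||
    handling === "cite_with_attribution" ||
    handling === "cite_for_vendor_facts"
  );
}
```

Keeping the policy in one typed table means that adding a tier to the union produces a compile error until a handling is chosen for it.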
Orthogonal axes
`tier` is the primary categorical dimension, but several orthogonal flags give the agent finer-grained context:

- `editorial_standards`: `high` / `medium` / `low` / `unknown`. A reputable domain’s `/sponsored/` path has `medium` or `low` standards even if the parent brand is `high`.
- `self_promoting`: true when the publisher has a direct stake in recommending itself (e.g., a hosting company’s “Best Web Hosting” list).
- `third_party`: true when the publisher is independent of the subject.
- `commercial_intent`: `high` / `medium` / `low` / `none`. Orthogonal to tier; a `vendor_primary` page can be `none` (docs) or `high` (pricing).
- `domain_content_mismatch`: true when the domain’s implied business strongly differs from the content topic. A parasite-SEO flag.
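
One way an agent might use these flags is to turn them into caveats attached to a citation. The sketch below assumes nothing beyond the five fields in the schema; the helper `caveatsFor`, the caveat wording, and the two example objects are hypothetical.

```typescript
// The five orthogonal fields, as declared in the QualityVector schema.
interface AxisFlags {
  editorial_standards: "high" | "medium" | "low" | "unknown";
  self_promoting: boolean;
  third_party: boolean;
  commercial_intent: "high" | "medium" | "low" | "none";
  domain_content_mismatch: boolean;
}

// Hypothetical helper: collects one human-readable caveat per flag that fires.
function caveatsFor(flags: AxisFlags): string[] {
  const caveats: string[] = [];
  if (flags.self_promoting) {
    caveats.push("publisher has a direct stake in recommending itself");
  }
  if (!flags.third_party) {
    caveats.push("publisher is not independent of the subject");
  }
  if (flags.commercial_intent === "high") {
    caveats.push("high commercial intent");
  }
  if (flags.domain_content_mismatch) {
    caveats.push("domain does not match the content topic (possible parasite SEO)");
  }
  if (flags.editorial_standards === "low" || flags.editorial_standards === "unknown") {
    caveats.push("editorial standards are low or unknown");
  }
  return caveats;
}

// Illustrative values for a hosting company's "Best Web Hosting" list.
const hostingListicle: AxisFlags = {
  editorial_standards: "low",
  self_promoting: true,
  third_party: false,
  commercial_intent: "high",
  domain_content_mismatch: false,
};

// Illustrative values for an independent article with no commercial angle.
const independentArticle: AxisFlags = {
  editorial_standards: "high",
  self_promoting: false,
  third_party: true,
  commercial_intent: "none",
  domain_content_mismatch: false,
};

const listicleCaveats = caveatsFor(hostingListicle);   // four caveats
const articleCaveats = caveatsFor(independentArticle); // no caveats
```

Because the flags are orthogonal to `tier`, the same helper applies to every tier without special cases.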
authoritative_weight and confidence
- `authoritative_weight` is a scalar in 0..1 derived from `tier` + `editorial_standards` + `confidence` + `domain_content_mismatch`. It’s what you’d multiply source claims by when aggregating. The exact formula lives in `src/vectorize.ts#authoritativeWeightFromLlm`.
- `confidence` is the classifier’s own certainty about the whole classification. Low confidence + unusual signals is a good trigger for an agent to caveat the result.
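
As an illustration of "multiply source claims by" the weight, the sketch below computes a weighted mean of numeric claims. It does not reproduce the formula in `src/vectorize.ts`; it only consumes the resulting `authoritative_weight`. The `WeightedClaim` shape, the `aggregate` helper, and the `LOW_CONFIDENCE` threshold are assumptions made for this example, and the caveat check looks at `confidence` alone rather than at unusual signals.

```typescript
interface WeightedClaim {
  value: number;                // numeric claim extracted from the source (hypothetical)
  authoritative_weight: number; // 0..1, taken from the quality vector
  confidence: number;           // 0..1, taken from the quality vector
}

// Hypothetical threshold; sift does not define one.
const LOW_CONFIDENCE = 0.5;

function aggregate(claims: WeightedClaim[]): { estimate: number | null; needsCaveat: boolean } {
  let weightedSum = 0;
  let totalWeight = 0;
  let needsCaveat = false;

  for (const claim of claims) {
    weightedSum += claim.value * claim.authoritative_weight;
    totalWeight += claim.authoritative_weight;
    if (claim.confidence < LOW_CONFIDENCE) {
      needsCaveat = true; // at least one classification was uncertain
    }
  }

  // With no weight at all there is nothing to average, so report null.
  const estimate = totalWeight > 0 ? weightedSum / totalWeight : null;
  return { estimate, needsCaveat };
}

// Two sources disagree; the higher-weight source dominates the estimate.
const result = aggregate([
  { value: 1.0, authoritative_weight: 0.75, confidence: 0.9 },
  { value: 0.5, authoritative_weight: 0.25, confidence: 0.4 },
]);
// result.estimate === 0.875, result.needsCaveat === true
```

Returning `null` for an empty or zero-weight input keeps the caller from mistaking "no usable sources" for an estimate of zero.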
signals[]
Every classification is backed by at least one signal. Three origins:

- `safety`: Google Safe Browsing (threat type as `match`)
- `authoritative`: known-trusted / known-good cache hit (domain as `match`)
- `llm`: LLM judge output (tier + reason as `match`)
A result can have multiple signals. For example, a reddit.com URL that Google Safe Browsing also flagged would carry both authoritative (known-good:ugc_community) and safety signals.
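
A consumer can group `signals[]` by origin to see which inputs backed a classification. The sketch below uses the signal shape from the schema; the helpers `groupByOrigin` and `hasSafetyFlag` are hypothetical, and the `type`, `match`, and `weight` values in the reddit.com sample are illustrative guesses rather than real sift output.

```typescript
type Origin = "safety" | "authoritative" | "llm";

// Element shape of signals[], as declared in the QualityVector schema.
interface Signal {
  origin: Origin;
  type: string;
  match: string;
  weight: number;
}

// Hypothetical helper: buckets signals under each of the three origins.
function groupByOrigin(signals: Signal[]): Record<Origin, Signal[]> {
  const groups: Record<Origin, Signal[]> = { safety: [], authoritative: [], llm: [] };
  for (const signal of signals) {
    groups[signal.origin].push(signal);
  }
  return groups;
}

// Hypothetical helper: true when any signal came from the safety origin.
function hasSafetyFlag(signals: Signal[]): boolean {
  return signals.some((signal) => signal.origin === "safety");
}

// Illustrative only: field values are assumptions, not captured output.
const redditSignals: Signal[] = [
  { origin: "authoritative", type: "known-good:ugc_community", match: "reddit.com", weight: 1 },
  { origin: "safety", type: "safe_browsing", match: "SOCIAL_ENGINEERING", weight: 1 },
];

const grouped = groupByOrigin(redditSignals);
// grouped.authoritative.length === 1, grouped.safety.length === 1, grouped.llm.length === 0
const flagged = hasSafetyFlag(redditSignals); // true
```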
Why expose this much?
Because agents need to explain their reasoning. If an agent writes “According to industry data, the SaaS magic number benchmark is 0.7-1.0”, the user (or a reviewer) should be able to ask “where did ‘industry data’ come from?”, and the agent should be able to say “10 vendor_content_marketing blog posts from VC firms, with mean authoritative_weight 0.23; no peer-reviewed source was in the SERP.” That transparency is what the quality vector enables.
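
A provenance answer like the one above can be computed directly from the quality vectors behind a claim. The following is a rough sketch: it reads only `tier` and `authoritative_weight`, and the helpers `summarizeByTier` and `describeProvenance`, along with the sample data, are hypothetical.

```typescript
// Only the two quality-vector fields this summary needs.
interface SourceVector {
  tier: string;
  authoritative_weight: number; // 0..1
}

interface TierSummary {
  tier: string;
  count: number;
  meanWeight: number;
}

// Hypothetical helper: counts sources per tier and averages their weights.
function summarizeByTier(vectors: SourceVector[]): TierSummary[] {
  const acc: Record<string, { count: number; sum: number }> = {};
  for (const v of vectors) {
    const entry = acc[v.tier] ?? { count: 0, sum: 0 };
    entry.count += 1;
    entry.sum += v.authoritative_weight;
    acc[v.tier] = entry;
  }
  return Object.keys(acc)
    .map((tier) => ({
      tier,
      count: acc[tier].count,
      meanWeight: acc[tier].sum / acc[tier].count,
    }))
    .sort((a, b) => b.count - a.count); // most common tier first
}

// Hypothetical helper: renders the summary as a sentence fragment.
function describeProvenance(vectors: SourceVector[]): string {
  return summarizeByTier(vectors)
    .map((s) => `${s.count} ${s.tier} (mean authoritative_weight ${s.meanWeight.toFixed(2)})`)
    .join("; ");
}

// Illustrative data, not real sift output.
const sources: SourceVector[] = [
  { tier: "vendor_content_marketing", authoritative_weight: 0.25 },
  { tier: "vendor_content_marketing", authoritative_weight: 0.25 },
  { tier: "ugc", authoritative_weight: 0.5 },
];

const provenance = describeProvenance(sources);
// "2 vendor_content_marketing (mean authoritative_weight 0.25); 1 ugc (mean authoritative_weight 0.50)"
```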