
Tier definitions

Tier definitions live in src/llm-judge.ts#VECTOR_SYSTEM_PROMPT. Editing them changes the behavior of the whole classifier.

Official regulatory / government publications, court records, central bank releases, standards-body documents.

Examples: SEC filings on sec.gov/cgi-bin/browse-edgar, FDA guidance at fda.gov/regulatory-information/, court records, Eurostat / BLS / e-Stat Japan statistics, ISO / NIST / W3C specs, ico.org.uk GDPR guidance.

Journal articles and working papers with editorial review.

Examples: arXiv, PubMed / PMC, JSTOR, ScienceDirect, SAGE, Taylor & Francis, Frontiers, Nature, Springer, Emerald, INFORMS journal articles. Working papers / preprints hosted on university repositories when the URL path indicates a research paper (/publications/, /papers/, /article/, /doi/, paper-identifier PDFs).

Established publications with named editors, corrections policy, and fact-checking. Applies to their main editorial output — not sponsored / contributor / council paths.

Examples: BBC, Reuters, The Guardian, Ars Technica, The Atlantic, NYT news, Nikkei news, Wikipedia, university news pages (eller.arizona.edu/news), popular explainers on .edu domains that aren’t peer-reviewed papers.

A vendor’s own product homepage, product page, documentation, API reference, or official changelog. The content is about the vendor’s own product on the vendor’s own canonical surface.

Examples: MDN (Mozilla docs), rfc-editor.org, a SaaS company’s pricing page, a product’s API reference, davidjteece.com (author’s own site for his work).

A vendor’s blog or strategic-advice content that teaches concepts adjacent to the vendor’s product. The article is NOT on a dedicated product page; it educates toward the vendor’s domain as a lead-generation mechanism.

Examples: HubSpot blog on marketing, Stripe blog on payments concepts, VC firm thought leadership (a16z, SaaStr, First Round Review), consulting firm “insights” (Bain, Deloitte, KPMG, McKinsey), for-profit university program pages (waldenu.edu/programs/business/resource/...), trade associations publishing “research” about their own industry (see boundary rule 4).

“Best X”, “Top N”, “Best X for YEAR”, head-to-head comparisons, roundups, buying guides — when the content pattern is a commercial listicle. Apply regardless of publisher reputation. PCMag / CNet / TechRadar / Wired “Best X 2026” articles ARE affiliate, even though those publishers ALSO publish independent_editorial content elsewhere. Classify by the specific article, not the brand.
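As a rough illustration of classifying by the article rather than the brand, the sketch below flags listicle-style titles without looking at the publisher at all. The patterns and the `looksLikeListicle` name are assumptions made for this example; the actual classification is done by the prompt, not by regular expressions.

```typescript
// Hypothetical title patterns for commercial listicles. These are
// illustrative assumptions, not the patterns used by the actual prompt.
const LISTICLE_PATTERNS: readonly RegExp[] = [
  /^\s*(the\s+)?(\d+\s+)?best\s+\S/i, // "Best X", "The 10 Best X"
  /^\s*(the\s+)?top\s+\d+\b/i,        // "Top N"
  /\b(vs|versus)\b/i,                 // head-to-head comparisons
  /\b(roundup|buying guide)\b/i,      // roundups, buying guides
];

// Returns true when the title alone matches a listicle pattern.
// The publisher's domain is deliberately not an input.
function looksLikeListicle(title: string): boolean {
  return LISTICLE_PATTERNS.some((pattern) => pattern.test(title));
}

console.log(looksLikeListicle("Best Laptops for 2026"));     // true
console.log(looksLikeListicle("Central bank raises rates")); // false
```

A title match is only a signal. A title such as “Best practices for code review” also matches the first pattern, which is why the tier applies only when the content itself follows the commercial listicle pattern.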

Mass-produced, AI-generated, low-effort templated content. Thin wrappers around other sources, auto-translated content, SEO-churn sites.

User-generated content that is lightly moderated or community-edited.

Examples: Reddit posts, forum threads, Stack Overflow, Quora answers, Hacker News, individual Medium posts by named authors with no vendor affiliation, community-run wikis.

The available signals don’t allow a confident classification. authoritative_weight is low, and recommended_action is tag, so the agent knows not to rely on the source.

Boundary rules (six rules enforced by the prompt)

  1. Affiliate trumps reputation. “Best X”, “Top N”, comparison listicles → affiliate, even on famous publishers.
  2. Vendor blog ≠ vendor primary. Blogs at /blog/ or /resources/ teaching adjacent concepts → vendor_content_marketing, not vendor_primary.
  3. URL path override. Paths containing /sponsored/, /partner/, /advertorial/, /branded/, /promoted/, /contributor/, /councils/, /community/, /guest-post/, /opinion/guest/ on otherwise-reputable domains: classify by the article alone. Parent reputation does NOT transfer.
  4. Trade associations / lobbies. Groups publishing “research” about their own industry (Corn Refiners on corn syrup, Heartland on climate) → vendor_content_marketing + domain_content_mismatch=true. They are vendors of a policy position.
  5. Domain / content mismatch. True when the domain’s implied business strongly differs from the content topic (hospital-linen supplier publishing weight-loss reviews). False for plausibly related or generic domains.
  6. Academic / government TLD discipline. .edu, .ac.xx, .gov, .europa.eu, .who.int are NOT automatic tier indicators. Classify by the specific page content:
    • Published paper → peer_reviewed
    • University news / press release / popular explainer → independent_editorial
    • Degree program marketing → vendor_content_marketing
    • Official regulator guidance → regulated_primary

Known gaps:
  • Gray-market vendors. Gaming boost providers, tool crack sites, and similar legal-gray commercial actors currently fall through as vendor_primary. A dedicated tier or orthogonal legitimacy axis is on the roadmap.
  • Aggregator journalism. Content aggregators with a light editorial overlay are sometimes classified as content_farm; rules 2 and 3 help, but edge cases remain.
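
Boundary rule 3 (URL path override) is enforced by the prompt, but the check it describes can be sketched in a few lines. The helper name `findPathOverride` and its return convention are assumptions for illustration; only the path segments come from the rule itself.

```typescript
// Path segments listed in boundary rule 3. A match means the parent
// domain's reputation should not be used as evidence for the article.
const OVERRIDE_SEGMENTS: readonly string[] = [
  "/sponsored/", "/partner/", "/advertorial/", "/branded/", "/promoted/",
  "/contributor/", "/councils/", "/community/", "/guest-post/",
  "/opinion/guest/",
];

// Hypothetical helper: returns the first override segment found in the
// URL path, or null when none applies or the URL cannot be parsed.
function findPathOverride(rawUrl: string): string | null {
  let pathname: string;
  try {
    pathname = new URL(rawUrl).pathname.toLowerCase();
  } catch {
    return null;
  }
  // Append a trailing slash so ".../sponsored" also matches "/sponsored/".
  const path =
    pathname.charAt(pathname.length - 1) === "/" ? pathname : pathname + "/";
  for (const segment of OVERRIDE_SEGMENTS) {
    if (path.indexOf(segment) !== -1) return segment;
  }
  return null;
}

console.log(findPathOverride("https://example.com/councils/tech/2024/post")); // "/councils/"
console.log(findPathOverride("https://example.com/news/2024/story"));         // null
```

A non-null result does not pick a tier by itself. It only signals that the article should be classified on its own content, without inheriting the reputation of the parent domain.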

If you find a consistent misclassification, it belongs in the observation log — the learning loop will eventually surface it.