What sift does

sift has one job: make the augmentation layer between a search backend and an AI agent explicit.

A raw SERP is 10 URLs + titles + descriptions. From an agent’s perspective that’s undifferentiated text — the agent has no built-in signal for whether atlassian.com/blog should be weighed differently from nature.com/articles. sift inserts a classification step between the backend and the agent, and the classification is fully visible in both source and output.

The design is deliberately shaped around specific, recurring failure modes of agent-driven search — vocabulary mismatch, vendor-dominated SERPs, affiliate contamination, parasite SEO, TLD-anchored false authority. Each sift feature maps to one of these. See Agent search failure modes for the full mapping.

Every result carries a quality vector:

  • tier — one of 9 categorical labels (regulated_primary, peer_reviewed, independent_editorial, vendor_primary, vendor_content_marketing, affiliate, content_farm, ugc, unknown)
  • editorial_standards, commercial_intent, self_promoting, third_party, domain_content_mismatch — orthogonal axes
  • authoritative_weight — 0..1 scalar derived from tier + editorial standards + confidence
  • confidence — 0..1, the classifier’s own certainty
  • reason — under 80 characters explaining the tier
  • signals[] — normalized contribution log (which inputs drove the decision)

See Quality vector.
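The per-result fields above can be sketched as a data structure. The field names and ranges come from the list; the tier base weights and the exact derivation of authoritative_weight are illustrative assumptions, not sift's actual formula:

```python
from dataclasses import dataclass, field

TIERS = (
    "regulated_primary", "peer_reviewed", "independent_editorial",
    "vendor_primary", "vendor_content_marketing", "affiliate",
    "content_farm", "ugc", "unknown",
)

@dataclass
class QualityVector:
    tier: str                       # one of the 9 categorical labels
    editorial_standards: float      # orthogonal axis, assumed 0..1 here
    commercial_intent: float
    self_promoting: bool
    third_party: bool
    domain_content_mismatch: bool
    confidence: float               # classifier's own certainty, 0..1
    reason: str                     # under 80 characters explaining the tier
    signals: list = field(default_factory=list)  # contribution log

# Illustrative only: per-tier base weights and the blend with
# editorial standards and confidence are assumptions for this sketch.
TIER_BASE = dict(zip(TIERS, (1.0, 0.95, 0.8, 0.5, 0.35, 0.15, 0.1, 0.3, 0.2)))

def authoritative_weight(qv: QualityVector) -> float:
    raw = TIER_BASE[qv.tier] * (0.5 + 0.5 * qv.editorial_standards)
    return round(raw * qv.confidence, 3)  # stays in 0..1
```

The point of the structure, whatever the real derivation, is that tier, editorial standards, and confidence combine into a single 0..1 scalar the agent can weigh directly.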

Per-query landscape metrics:

  • tier_distribution — counts per tier over the full SERP
  • mean_authoritative_weight — the SERP’s overall trust level
  • diversity_entropy — Shannon entropy of tier distribution
  • vendor_dominance_ratio — fraction of vendor-published results

See Aggregate vector.
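The four landscape metrics are simple functions of the per-result tiers and weights. A minimal sketch, assuming results arrive as (tier, authoritative_weight) pairs and that "vendor-published" means the two vendor tiers:

```python
import math
from collections import Counter

def aggregate(results):
    """results: list of (tier, authoritative_weight) pairs for one SERP."""
    n = len(results)
    dist = Counter(tier for tier, _ in results)
    mean_w = sum(w for _, w in results) / n
    # Shannon entropy (bits) over the tier distribution:
    # 0 when one tier dominates entirely, higher when tiers are mixed
    entropy = -sum((c / n) * math.log2(c / n) for c in dist.values())
    vendor = {"vendor_primary", "vendor_content_marketing"}
    vdr = sum(dist[t] for t in vendor) / n
    return {
        "tier_distribution": dict(dist),
        "mean_authoritative_weight": round(mean_w, 3),
        "diversity_entropy": round(entropy, 3),
        "vendor_dominance_ratio": round(vdr, 3),
    }
```

A SERP split evenly between two tiers yields an entropy of exactly 1 bit, which is why entropy works as a compact diversity signal.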

Summary hints are deterministic meta-observations the agent must incorporate when synthesizing. They fire on specific structural conditions: heavy vendor dominance, parasite-SEO mismatch, structurally commercial SERPs (no non-commercial source to triangulate against), and so on.

See Summary hints.
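Because hints fire on structural conditions rather than model judgment, they reduce to plain threshold rules over the aggregate vector. A sketch of two such rules; the threshold value, hint strings, and the tier set counted as non-commercial are illustrative assumptions:

```python
def summary_hints(agg, vendor_threshold=0.6):
    """Deterministic rules over the aggregate vector. Thresholds and
    hint text are illustrative, not sift's actual rule set."""
    hints = []
    if agg["vendor_dominance_ratio"] >= vendor_threshold:
        hints.append("vendor_dominated: most results are vendor-published")
    # Assumed non-commercial tiers for the triangulation check.
    noncommercial = {"regulated_primary", "peer_reviewed",
                     "independent_editorial", "ugc"}
    if not noncommercial & agg["tier_distribution"].keys():
        hints.append("structurally_commercial: no non-commercial "
                     "source to triangulate against")
    return hints
```

The same aggregate vector always produces the same hints, which is what makes them safe to treat as obligations on the agent rather than suggestions.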

What sift does not do

  • Rank or re-order. Brave’s original ranking is preserved.
  • Fetch page content. Classification uses SERP metadata only (URL, title, description).
  • Filter hard by default. recommended_action (keep / tag / block) is advisory — the agent decides.
  • Replace Google Safe Browsing. safety_flag is a parallel axis, not a tier. Classification and safety are orthogonal.
  • In-process fine-tuning. Prompt refinement is an offline process, informed by the learning-loop observation log.
  • Pre-filter below a threshold. Opacity would defeat the entire design. recommended_action is made visible precisely so it can be overridden.
  • Generate sources that aren’t in the SERP. sift is diagnostic, not generative. If a SERP is 100% vendor content, sift reports that — it cannot invent peer-reviewed work that wasn’t indexed.
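The advisory nature of recommended_action means the filtering policy lives on the agent side. A hypothetical agent-side policy, assuming each result is a dict with recommended_action and authoritative_weight fields (field names from the text; the override logic is an illustration, not part of sift):

```python
def apply_policy(results, min_weight=0.3):
    """Agent-side policy over sift's advisory recommended_action.
    Keeps non-blocked results, but overrides a 'block' when the
    weight clears a floor, and never returns an empty SERP."""
    kept = [r for r in results
            if r["recommended_action"] != "block"
            or r["authoritative_weight"] >= min_weight]
    # Fall back to the full SERP rather than silently hiding everything,
    # consistent with the no-opaque-pre-filtering design.
    return kept or results
```

Because recommended_action is visible rather than enforced, a policy like this can be as strict or as permissive as the agent's task requires.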