## What sift does
sift has one job: make the augmentation layer between a search backend and an AI agent explicit.
A raw SERP is 10 URLs + titles + descriptions. From an agent’s perspective that’s undifferentiated text — the agent has no built-in signal for whether atlassian.com/blog should be weighed differently from nature.com/articles. sift inserts a classification step between the backend and the agent, and the classification is fully visible in both source and output.
The design is deliberately shaped around specific, recurring failure modes of agent-driven search — vocabulary mismatch, vendor-dominated SERPs, affiliate contamination, parasite SEO, TLD-anchored false authority. Each sift feature maps to one of these. See Agent search failure modes for the full mapping.
## The three outputs

### 1. Per-result `quality_vector`

Every result carries:
- `tier` — one of 9 categorical labels (`regulated_primary`, `peer_reviewed`, `independent_editorial`, `vendor_primary`, `vendor_content_marketing`, `affiliate`, `content_farm`, `ugc`, `unknown`)
- `editorial_standards`, `commercial_intent`, `self_promoting`, `third_party`, `domain_content_mismatch` — orthogonal axes
- `authoritative_weight` — 0..1 scalar derived from tier + editorial standards + confidence
- `confidence` — 0..1, the classifier's own certainty
- `reason` — under 80 characters explaining the tier
- `signals[]` — normalized contribution log (which inputs drove the decision)
See Quality vector.
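Putting the fields together, a `quality_vector` for a vendor blog post might look like the sketch below. The field names come from the list above; the values, the scalar typing of the orthogonal axes, and the signal names are illustrative assumptions, not sift's actual schema.

```python
# Illustrative quality_vector for a vendor blog post.
# Axis values and signal names are assumptions for the example only.
quality_vector = {
    "tier": "vendor_content_marketing",
    "editorial_standards": 0.4,     # orthogonal axes; exact typing assumed
    "commercial_intent": 0.9,
    "self_promoting": 0.8,
    "third_party": 0.1,
    "domain_content_mismatch": 0.0,
    "authoritative_weight": 0.3,    # 0..1, derived from tier + standards + confidence
    "confidence": 0.85,             # classifier's own certainty
    "reason": "vendor blog promoting its own product",  # under 80 chars
    "signals": ["url_path:/blog", "title_brand_match"],  # hypothetical signal names
}

assert len(quality_vector["reason"]) < 80
```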
### 2. SERP-level `aggregate_vector`

Per-query landscape metrics:
- `tier_distribution` — counts per tier over the full SERP
- `mean_authoritative_weight` — the SERP's overall trust level
- `diversity_entropy` — Shannon entropy of the tier distribution
- `vendor_dominance_ratio` — fraction of vendor-published results
See Aggregate vector.
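These four metrics are all computable from the per-result classifications alone. A minimal sketch, assuming each result carries `tier` and `authoritative_weight` and that the vendor tiers are `vendor_primary` and `vendor_content_marketing` (sift's actual implementation may differ):

```python
import math
from collections import Counter

VENDOR_TIERS = {"vendor_primary", "vendor_content_marketing"}  # assumption

def aggregate_vector(results):
    """results: list of dicts with 'tier' and 'authoritative_weight'."""
    n = len(results)
    tier_distribution = Counter(r["tier"] for r in results)
    mean_weight = sum(r["authoritative_weight"] for r in results) / n
    # Shannon entropy (bits) of the tier distribution: 0 when every result
    # shares one tier, higher when tiers are spread across the SERP.
    entropy = -sum((c / n) * math.log2(c / n) for c in tier_distribution.values())
    vendor_ratio = sum(tier_distribution[t] for t in VENDOR_TIERS) / n
    return {
        "tier_distribution": dict(tier_distribution),
        "mean_authoritative_weight": round(mean_weight, 3),
        "diversity_entropy": round(entropy, 3),
        "vendor_dominance_ratio": round(vendor_ratio, 3),
    }

serp = [
    {"tier": "vendor_content_marketing", "authoritative_weight": 0.3},
    {"tier": "vendor_content_marketing", "authoritative_weight": 0.3},
    {"tier": "vendor_primary", "authoritative_weight": 0.4},
    {"tier": "independent_editorial", "authoritative_weight": 0.8},
]
print(aggregate_vector(serp))
# → vendor_dominance_ratio 0.75, diversity_entropy 1.5
```

Note that entropy and vendor dominance answer different questions: a SERP split evenly across two vendor tiers has nonzero entropy but total vendor dominance.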
### 3. `summary_hints[]`

Deterministic meta-observations the agent must incorporate when synthesizing. They fire on specific structural conditions — heavy vendor dominance, parasite-SEO mismatch, structurally commercial SERPs (no non-commercial source to triangulate against), and so on.
See Summary hints.
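Because hints are deterministic, each one is just a threshold check over the aggregate metrics. A hypothetical sketch of two such rules — the thresholds, the `non_commercial_count` input, and the hint wording are all assumptions, not sift's actual triggers:

```python
VENDOR_DOMINANCE_THRESHOLD = 0.6  # assumed cutoff, not sift's actual value

def summary_hints(aggregate):
    """Fire deterministic hints from SERP-level aggregate metrics."""
    hints = []
    if aggregate["vendor_dominance_ratio"] >= VENDOR_DOMINANCE_THRESHOLD:
        hints.append(
            "Most results are vendor-published; treat product claims as "
            "marketing until confirmed by an independent source."
        )
    # 'non_commercial_count' is a hypothetical input for this sketch.
    if aggregate.get("non_commercial_count", 0) == 0:
        hints.append(
            "No non-commercial source present; the SERP cannot be "
            "triangulated against disinterested coverage."
        )
    return hints

print(summary_hints({"vendor_dominance_ratio": 0.75, "non_commercial_count": 0}))
```

The same inputs always produce the same hints, which is what lets the agent treat them as structural facts about the SERP rather than classifier opinion.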
## What sift does not do

- Rank or re-order. Brave's original ranking is preserved.
- Fetch page content. Classification uses SERP metadata only (URL, title, description).
- Filter hard by default. `recommended_action` (keep/tag/block) is advisory — the agent decides.
- Replace Google Safe Browsing. `safety_flag` is a parallel axis, not a tier. Classification and safety are orthogonal.
## Design non-goals

- In-process fine-tuning. Prompt refinement is an offline process, informed by the learning-loop observation log.
- Pre-filter below a threshold. Opacity would defeat the entire design; `recommended_action` is made visible precisely so it can be overridden.
- Generate sources that aren't in the SERP. sift is diagnostic, not generative. If a SERP is 100% vendor content, sift reports that — it cannot invent peer-reviewed work that wasn't indexed.