# Using sift with an agent
sift is a vectorizer, not a decision engine. `search_vectorized` returns a factual description of the SERP landscape — tier per result, aggregate metrics, meta-observations — and stops there. Every judgment (what to cite, when to re-query, how to frame a claim to the user) belongs to the downstream agent.

This guide is the contract between sift and the agent that consumes its output. It is shaped by usage patterns observed in real MCP deployments and the local `data/observations.jsonl` log. Breaking changes to tier meanings or decision rules are called out in the CHANGELOG.

A copy of this contract also lives at `AGENTS.md` in the repo for easy embedding in your agent's system prompt or client `CLAUDE.md`.
## Reading the landscape

Diagnose the query shape first. Look at `aggregate_vector.tier_distribution` before individual results.
| Landscape | Signal | Action |
|---|---|---|
| Pitch | `vendor_primary` + `vendor_content_marketing` + `affiliate` ≥ 50% | If user intent is pain / review / opinion, propose a re-query before citing |
| Pain / opinion | `ugc` ≥ 50% | Treat as the primary qualitative source |
| Triangulatable | Any `peer_reviewed` or `regulated_primary` present | Ground claims against these |
| Degraded | `unknown` dominant | The LLM classifier failed; discount confidence |
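The table above can be sketched as a small classifier over `aggregate_vector.tier_distribution`. This is an illustrative sketch, not sift's implementation: the key names mirror the tiers in the table, and the precedence (triangulatable before degraded before pain before pitch) is an assumption.

```python
# Hypothetical sketch: keys mirror the tiers in the table; thresholds follow
# the Signal column. The check order is an assumption, not part of sift.

def diagnose_landscape(tier_distribution: dict[str, float]) -> str:
    """Classify a SERP from aggregate_vector.tier_distribution (fractions of results)."""
    get = tier_distribution.get
    if get("peer_reviewed", 0) or get("regulated_primary", 0):
        return "triangulatable"   # any scholarly/regulated presence grounds claims
    shares = list(tier_distribution.values())
    if shares and get("unknown", 0) > 0 and get("unknown", 0) >= max(shares):
        return "degraded"         # classifier failed on the dominant share
    if get("ugc", 0) >= 0.5:
        return "pain_opinion"
    commercial = (get("vendor_primary", 0)
                  + get("vendor_content_marketing", 0)
                  + get("affiliate", 0))
    if commercial >= 0.5:
        return "pitch"
    return "mixed"
```

A "mixed" fallback covers SERPs that match no row in the table; how an agent treats that case is its own call.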
## Interpreting `recommended_action`

`recommended_action` is advisory. The agent owns the final call.
- `block` — exclude from citation by default. If you quote a blocked result (e.g., to contrast a commercial framing against independent evidence), label the quote with its bias and keep the block reason visible in your reasoning.
- `tag` — usable with explicit attribution: "According to Vendor X's own blog…"
- `keep` — normal citation.
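A minimal sketch of the advisory contract, assuming a per-result dict with `recommended_action`, `tier`, and `block_reason` fields (hypothetical names): the agent maps the action onto its own citation policy rather than obeying it blindly.

```python
# Sketch only: field names are assumptions drawn from this guide, not a
# verified sift payload schema.

def citation_policy(result: dict) -> dict:
    """Decide how a single result may be cited. The agent owns the final call."""
    action = result.get("recommended_action", "keep")
    if action == "block":
        # Excluded by default; if quoted anyway (e.g., to contrast framings),
        # the block reason must stay visible to the user.
        return {"cite": False, "label": result.get("block_reason", "blocked"), "audit": True}
    if action == "tag":
        # Usable only with explicit attribution ("According to Vendor X's own blog...").
        return {"cite": True, "label": f"attributed:{result.get('tier', 'unknown')}", "audit": False}
    return {"cite": True, "label": None, "audit": False}
```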
## Treating `summary_hints[]` as verbatim contracts

`summary_hints` encode SERP-level bias that cannot be inferred from any individual result. Incorporate them into user-facing output as meta-notes. Do not drop them for brevity.
For example:

> Note: this search returned no peer-reviewed or independent-editorial sources, so the claims below reflect commercial positioning rather than independent research.
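One way to honor the verbatim contract is to prepend the hints as meta-notes to the answer. A sketch, assuming `summary_hints` arrives as a list of strings (the hint text in the test is illustrative, not a real sift payload):

```python
# Sketch: carry SERP-level bias notes into the final answer instead of
# dropping them for brevity.

def render_with_hints(answer: str, summary_hints: list[str]) -> str:
    """Prefix the user-facing answer with each summary hint as a meta-note."""
    if not summary_hints:
        return answer
    notes = "\n".join(f"Note: {hint}" for hint in summary_hints)
    return f"{notes}\n\n{answer}"
```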
## Re-query vocabulary

When the landscape diagnosis calls for a pivot:
- Pain / frustration: `why people stop using X`, `X frustrations`, `X workaround`, `"we moved from X to" site:reddit.com`
- Independent review: `X review reddit`, `X honest review`, `X alternatives`
- Migration drivers: `migrating from X`, `leaving X for Y`
- Japanese queries: substitute Qiita, Zenn, and はてなブックマーク for Reddit in `site:` operators — these carry the equivalent UGC densities on the JP web. Vocabulary: `X 使いにくい`, `X 代替`, `X 移行`, `X やめた`.
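The vocabulary above can be packaged as query templates. The `PIVOTS` structure and its intent keys are hypothetical; the query strings are taken directly from this section.

```python
# Hypothetical pivot-query builder; template strings come from the
# vocabulary lists above.

PIVOTS: dict[str, list[str]] = {
    "pain": ["why people stop using {x}", "{x} frustrations", "{x} workaround",
             '"we moved from {x} to" site:reddit.com'],
    "review": ["{x} review reddit", "{x} honest review", "{x} alternatives"],
    "migration": ["migrating from {x}", "leaving {x} for {y}"],
    "pain_ja": ["{x} 使いにくい", "{x} 代替", "{x} 移行", "{x} やめた"],
}

def pivot_queries(intent: str, x: str, y: str = "") -> list[str]:
    """Expand the pivot templates for one product (and optionally a competitor)."""
    return [template.format(x=x, y=y) for template in PIVOTS[intent]]
```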
## Source ranking for citations

Prefer tier diversity over tier depth. Citing three `ugc` results is weaker than citing one `ugc` + one `independent_editorial` + one `vendor_primary` (with the vendor bias labeled). A valid reading order when mixing:
- `regulated_primary` / `peer_reviewed` — ground claims here when present.
- `independent_editorial` — corroboration and interpretation.
- `ugc` — primary qualitative data for pain / user-experience questions; contextual color otherwise.
- `vendor_primary` — spec confirmation only; never for evaluative claims.
- `vendor_content_marketing` — quote with explicit attribution, never as a standalone claim.
- `affiliate` — avoid as a direct source; price/feature lookups only, with the bias labeled.
Full tier definitions: see Tier definitions.
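The diversity rule can be sketched as a selector that keeps at most one result per tier and walks the reading order above. The result shape (a `tier` key per result) is an assumption for illustration.

```python
# Sketch of "tier diversity over tier depth": at most one result per tier,
# emitted in the reading order from this section.

TIER_ORDER = [
    "regulated_primary", "peer_reviewed", "independent_editorial",
    "ugc", "vendor_primary", "vendor_content_marketing", "affiliate",
]

def pick_citations(results: list[dict], max_citations: int = 3) -> list[dict]:
    """Select a tier-diverse citation set instead of several same-tier results."""
    by_tier: dict[str, dict] = {}
    for result in results:
        by_tier.setdefault(result["tier"], result)  # keep the first (highest-ranked) per tier
    ordered = [by_tier[tier] for tier in TIER_ORDER if tier in by_tier]
    return ordered[:max_citations]
```

Note the vendor tiers still appear in the set when nothing better exists; labeling their bias remains the agent's job.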
## Anti-patterns

- Treating `authoritative_weight` as a generic quality score. The axis is pinned to academic / regulated truth. UGC scores 0.19–0.30 by construction, but for pain-discovery questions UGC is high-value. Let `tier` drive interpretation, not the mean.
- Silently bypassing `block`. If you cite a blocked result, the user has no way to audit that decision. Either respect the block or surface the reason.
- Suppressing `summary_hints` for brevity. They encode what cannot be inferred from individual results — SERP-level bias.
- Quoting `vendor_content_marketing` as a primary source. The vendor's framing of its own category is always marketing, even when well-written.
- Reading `vendor_dominance_ratio` alone. A high ratio is a diagnostic, not a verdict. A vendor-dominated SERP for an inherently commercial query ("buy X") is expected.
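The fix for the first anti-pattern can be sketched as a tier-aware usefulness score: `authoritative_weight` only matters for factual questions, while pain-discovery questions weight `ugc` highly despite its low score. The field names and the fallback weights (0.4, 0.5) are illustrative assumptions, not part of sift.

```python
# Sketch: let tier drive interpretation; authoritative_weight is not a
# generic quality score. Weights below are illustrative only.

def usefulness(result: dict, question_kind: str) -> float:
    """Score one result's usefulness for a given kind of question."""
    tier = result["tier"]
    if question_kind == "pain_discovery":
        # UGC is high-value here despite its 0.19-0.30 authoritative_weight.
        return 1.0 if tier == "ugc" else 0.4
    if question_kind == "factual":
        # Only factual questions defer to the academic/regulated axis.
        return result.get("authoritative_weight", 0.0)
    return 0.5
```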
## Use-case cheat sheet

**Founder / PM user research (pain reverse).** If the landscape is a pitch, pivot vocabulary immediately. Read `ugc` results in full, not just snippets. Rank pain themes by UGC count and community breadth.

**Product comparison.** Anchor on `independent_editorial` where available. Use `vendor_primary` for spec confirmation. Use `vendor_content_marketing` only with explicit attribution. Skip `affiliate` for evaluation.

**News summarization.** Anchor on `independent_editorial`, cross-check against `regulated_primary` for factual claims (earnings, regulatory action). `vendor_content_marketing` is source-of-claim only.

**Academic / due diligence.** Require `peer_reviewed` or `regulated_primary`. If absent, state that the question has no scholarly substrate at search time and avoid speculation dressed as evidence.
## Feedback loop

This contract is shaped by observations logged to `data/observations.jsonl` (see Learning loop). If you have agent-usage findings that should update it — new use cases, consistently misread hints, tier boundaries that produce the wrong downstream behavior — please open an issue on GitHub with the `agent-contract` label.