
Using sift with an agent

sift is a vectorizer, not a decision engine. search_vectorized returns a factual description of the SERP landscape — tier per result, aggregate metrics, meta-observations — and stops there. Every judgment (what to cite, when to re-query, how to frame a claim to the user) belongs to the downstream agent.

This guide is the contract between sift and the agent that consumes its output. It is shaped by usage patterns observed in real MCP deployments and the local data/observations.jsonl log. Breaking changes to tier meanings or decision rules are called out in the CHANGELOG.

A copy of this contract also lives at AGENTS.md in the repo for easy embedding in your agent’s system prompt or client CLAUDE.md.

Diagnose the query shape first. Look at aggregate_vector.tier_distribution before individual results.

Landscape      | Signal                                                    | Action
Pitch          | vendor_primary + vendor_content_marketing + affiliate ≥ 50% | If user intent is pain / review / opinion, propose a re-query before citing
Pain / opinion | ugc ≥ 50%                                                 | Treat as primary qualitative source
Triangulatable | Any peer_reviewed or regulated_primary present            | Ground claims against these
Degraded       | unknown dominant                                          | LLM classifier failed; discount confidence
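The table above can be applied mechanically. The sketch below assumes aggregate_vector.tier_distribution arrives as a tier-name-to-share mapping; the exact field shape is not pinned by this guide.

```python
def diagnose_landscape(dist: dict[str, float]) -> list[str]:
    """Return every landscape label that applies; several can hold at once."""
    labels = []
    # Degraded: "unknown" is the dominant tier, so the LLM classifier failed.
    if dist and max(dist, key=dist.get) == "unknown":
        labels.append("degraded")
    # Pitch: the commercial tiers together hold at least half the SERP.
    commercial = sum(dist.get(t, 0.0) for t in
                     ("vendor_primary", "vendor_content_marketing", "affiliate"))
    if commercial >= 0.5:
        labels.append("pitch")
    # Pain / opinion: UGC holds at least half the SERP.
    if dist.get("ugc", 0.0) >= 0.5:
        labels.append("pain_opinion")
    # Triangulatable: any grounding tier is present at all.
    if dist.get("peer_reviewed", 0.0) > 0 or dist.get("regulated_primary", 0.0) > 0:
        labels.append("triangulatable")
    return labels
```

For instance, a distribution of 40% vendor_primary, 20% affiliate, and 40% ugc yields only the pitch label, which is the signal to propose a re-query before citing.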

recommended_action is advisory. The agent owns the final call.

  • block — exclude from citation by default. If you quote a blocked result (e.g., to contrast a commercial framing against independent evidence), label the quote with its bias and keep the block reason visible in your reasoning.
  • tag — usable with explicit attribution. “According to Vendor X’s own blog…”
  • keep — normal citation.

Treating summary_hints[] as verbatim contracts


summary_hints encode SERP-level bias that cannot be inferred from any individual result. Incorporate them into user-facing output as meta-notes; do not drop them for brevity. For example:

Note: this search returned no peer-reviewed or independent-editorial sources, so the claims below reflect commercial positioning rather than independent research.
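Carrying hints through can be as simple as the sketch below; treating each hint as a plain string is an assumption about the summary_hints format.

```python
def render_with_hints(answer_body: str, summary_hints: list[str]) -> str:
    """Prepend one meta-note per hint; hints are never dropped for brevity."""
    notes = "\n".join(f"Note: {hint}" for hint in summary_hints)
    return f"{notes}\n\n{answer_body}" if notes else answer_body
```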

When the landscape diagnosis calls for a pivot:

  • Pain / frustration: why people stop using X, X frustrations, X workaround, "we moved from X to" site:reddit.com
  • Independent review: X review reddit, X honest review, X alternatives
  • Migration drivers: migrating from X, leaving X for Y
  • Japanese queries: substitute Qiita, Zenn, and はてなブックマーク for Reddit in site: operators — these carry the equivalent UGC densities on the JP web. Vocabulary: X 使いにくい, X 代替, X 移行, X やめた.
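The pivot vocabulary above can be turned into query templates. The function below is a sketch: the template wording mirrors the bullets, and the JP site list applies the Reddit substitution described above; the intent names are illustrative.

```python
def pivot_queries(product: str, intent: str, lang: str = "en") -> list[str]:
    """Build re-query candidates; intent is one of pain / review / migration."""
    # JP web: Qiita, Zenn, and Hatena Bookmark carry the equivalent UGC density.
    ugc_sites = ["qiita.com", "zenn.dev", "b.hatena.ne.jp"] if lang == "ja" else ["reddit.com"]
    if intent == "pain":
        queries = [f"why people stop using {product}",
                   f"{product} frustrations",
                   f"{product} workaround"]
        queries += [f'"we moved from {product} to" site:{site}' for site in ugc_sites]
        if lang == "ja":
            queries += [f"{product} 使いにくい", f"{product} 代替",
                        f"{product} 移行", f"{product} やめた"]
        return queries
    if intent == "review":
        return [f"{product} review reddit", f"{product} honest review",
                f"{product} alternatives"]
    if intent == "migration":
        return [f"migrating from {product}", f"leaving {product} for"]
    return []
```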

Prefer tier diversity over tier depth. Citing three ugc results is weaker than citing one ugc + one independent_editorial + one vendor_primary (with the vendor bias labeled). A valid reading order when mixing:

  1. regulated_primary / peer_reviewed — ground claims here when present.
  2. independent_editorial — corroboration and interpretation.
  3. ugc — primary qualitative data for pain / user-experience questions; contextual color otherwise.
  4. vendor_primary — spec confirmation only; never for evaluative claims.
  5. vendor_content_marketing — quote with explicit attribution, never as a standalone claim.
  6. affiliate — avoid as a direct source; price/feature lookups only, with the bias labeled.

Full tier definitions: see the Tier definitions page.

  • Treating authoritative_weight as a generic quality score. The axis is pinned to academic / regulated truth. UGC scores 0.19–0.30 by construction, but for pain-discovery questions UGC is high-value. Let tier drive interpretation, not the mean.
  • Silently bypassing block. If you cite a blocked result, the user has no way to audit that decision. Either respect the block or surface the reason.
  • Suppressing summary_hints for brevity. They encode what cannot be inferred from individual results — SERP-level bias.
  • Quoting vendor_content_marketing as a primary source. The vendor’s framing of its own category is always marketing, even when well-written.
  • Reading vendor_dominance_ratio alone. A high ratio is a diagnostic, not a verdict. A vendor-dominated SERP for an inherently commercial query (“buy X”) is expected.

Founder / PM user research (pain reverse). If pitch landscape, pivot vocabulary immediately. Read ugc results in full, not just snippets. Rank pain themes by UGC count and community breadth.

Product comparison. Anchor on independent_editorial where available. Use vendor_primary for spec confirmation. Use vendor_content_marketing only with explicit attribution. Skip affiliate for evaluation.

News summarization. Anchor on independent_editorial, cross-check against regulated_primary for factual claims (earnings, regulatory action). vendor_content_marketing is source-of-claim only.

Academic / due diligence. Require peer_reviewed or regulated_primary. If absent, state that the question has no scholarly substrate at search time and avoid speculation dressed as evidence.
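The due-diligence gate in the last recipe reduces to a presence check, sketched here under the same assumed result shape as elsewhere in this guide.

```python
SCHOLARLY_TIERS = {"peer_reviewed", "regulated_primary"}

def has_scholarly_substrate(results: list[dict]) -> bool:
    """True when at least one result can ground an academic claim."""
    return any(r.get("tier") in SCHOLARLY_TIERS for r in results)
```

When this returns False, state that the question has no scholarly substrate at search time rather than dressing weaker tiers up as evidence.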

This contract is shaped by observations logged to data/observations.jsonl (see Learning loop). If you have agent-usage findings that should update it — new use cases, consistently misread hints, tier boundaries that produce the wrong downstream behavior — please open an issue on GitHub with the agent-contract label.