Aggregate vector

Per-result classification tells the agent about each source. The aggregate vector tells the agent about the SERP as a whole — whether the landscape is diverse, vendor-dominated, authoritative, or structurally commercial.

Schema

interface AggregateVector {
  tier_distribution: Record<Tier, number>;   // count per tier, full SERP (not the trimmed max_results)
  mean_editorial_standards: "high" | "medium" | "low" | "unknown";
  mean_authoritative_weight: number;         // 0..1, arithmetic mean over results
  diversity_entropy: number;                 // Shannon entropy of tier_distribution
  vendor_dominance_ratio: number;            // (vendor_primary + vendor_content_marketing) / total
}

The four metrics in practice

`tier_distribution`

Raw counts per tier. The agent can inspect this directly — “3 peer_reviewed, 7 vendor_content_marketing” communicates more than a summary score can.
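As a rough illustration, the counts can be built in a single pass over the per-result tiers. The function name `tierDistribution` and the loose `Tier = string` alias are assumptions made for this sketch; sift's actual `Tier` union is not shown in this section.

```typescript
// Assumption: Tier is treated as a plain string here; the real union is defined elsewhere.
type Tier = string;

// Count how many results fall into each tier.
function tierDistribution(tiers: Tier[]): Record<Tier, number> {
  const counts: Record<Tier, number> = {};
  for (const tier of tiers) {
    counts[tier] = (counts[tier] ?? 0) + 1;
  }
  return counts;
}

// Three peer-reviewed results and one piece of vendor content marketing.
const exampleDistribution = tierDistribution([
  "peer_reviewed",
  "peer_reviewed",
  "peer_reviewed",
  "vendor_content_marketing",
]);
console.log(exampleDistribution);
```

Tiers with no results are simply absent from the returned record in this sketch; a real implementation might prefer to emit explicit zeros for every tier.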

`mean_authoritative_weight`

The SERP’s overall trust level as a single scalar: the arithmetic mean of the per-result authoritative weights, in the range 0..1. Empirical observations across the 5-layer probe suite:

| Layer | Query style | mean_authoritative_weight |
| --- | --- | --- |
| A (regulated) | `GDPR Article 17 right to erasure scope` | ~0.70 |
| B (academic phrasing) | `transformational leadership meta-analysis effect size` | ~0.87 |
| B (general phrasing) | `what makes a good leader` | ~0.38 |
| C (SaaS operational) | `series B saas magic number benchmark` | ~0.23 |

Below ~0.3, an agent should treat aggregated claims as commercial positioning, not research.
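A minimal sketch of how an agent might apply this rule of thumb. The helper names, the hard 0.3 cut-off, and the choice to return `NaN` for an empty SERP are all assumptions for illustration, not sift's documented behavior.

```typescript
// Arithmetic mean of per-result authoritative weights (each assumed to lie in 0..1).
// Assumption: an empty SERP yields NaN rather than a misleading 0.
function meanAuthoritativeWeight(weights: number[]): number {
  if (weights.length === 0) return NaN;
  return weights.reduce((sum, w) => sum + w, 0) / weights.length;
}

// The ~0.3 figure comes from the guidance above; treating it as a hard
// threshold is a simplification for this sketch.
const COMMERCIAL_POSITIONING_THRESHOLD = 0.3;

// True when aggregated claims should be read as commercial positioning.
// NaN compares false, so an empty SERP is never flagged here.
function isLikelyCommercialPositioning(meanAuth: number): boolean {
  return meanAuth < COMMERCIAL_POSITIONING_THRESHOLD;
}
```

With the table's figures, a Layer C SERP at roughly 0.23 would be flagged, while the academically phrased Layer B query at roughly 0.87 would not.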

`diversity_entropy`

Shannon entropy of the tier distribution. Low entropy means a single tier dominates the SERP; high entropy means the results are spread across many tiers.

Important: low entropy is not automatically bad. A SERP of 10/10 peer-reviewed papers has entropy 0.0, and that’s a feature. sift’s `summary_hints` suppress the “low diversity” warning when `mean_authoritative_weight >= 0.7` precisely for this reason.
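The entropy calculation and the suppression rule can be sketched as follows. The schema does not state the logarithm base, so base 2 (bits) is assumed here. The `LOW_ENTROPY_THRESHOLD` value and both function names are hypothetical; only the 0.7 suppression cut-off comes from the text above.

```typescript
// Shannon entropy of a count distribution, in bits.
// Assumption: log base 2; the schema does not state the unit.
function shannonEntropy(counts: Record<string, number>): number {
  const values = Object.values(counts).filter((c) => c > 0);
  const total = values.reduce((sum, c) => sum + c, 0);
  if (total === 0) return 0;
  let entropy = 0;
  for (const c of values) {
    const p = c / total; // share of results in this tier
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Hypothetical cut-off for "low diversity"; sift's real value is not documented here.
const LOW_ENTROPY_THRESHOLD = 1.0;

// Warn about low diversity only when the SERP is not already highly authoritative.
function shouldWarnLowDiversity(entropy: number, meanAuth: number): boolean {
  if (meanAuth >= 0.7) return false; // suppression rule described above
  return entropy < LOW_ENTROPY_THRESHOLD;
}
```

A SERP with all 10 results in one tier gives an entropy of 0, and an even split across two tiers gives exactly 1 bit. Under these assumptions, the all-peer-reviewed SERP produces no warning because its mean authoritative weight clears 0.7.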

`vendor_dominance_ratio`

Fraction of results classified as `vendor_primary` or `vendor_content_marketing`. This is not the same as a “commercial ratio”: affiliate results aren’t counted here. See Summary hints for the hints that fire when the ratio is 50% or higher, and when it is 90% or higher with no non-commercial alternatives.
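The ratio follows directly from the tier counts, as in this sketch. The function name and the decision to return 0 for an empty distribution are assumptions; the `affiliate` tier name is inferred from the prose above.

```typescript
// Share of results in the two vendor tiers, out of all results.
// Assumption: an empty distribution yields 0.
function vendorDominanceRatio(distribution: Record<string, number>): number {
  const total = Object.values(distribution).reduce((sum, c) => sum + c, 0);
  if (total === 0) return 0;
  const vendor =
    (distribution["vendor_primary"] ?? 0) +
    (distribution["vendor_content_marketing"] ?? 0);
  return vendor / total;
}

// Affiliate results add to the total but not to the vendor count.
const mixedSerp = {
  vendor_primary: 2,
  vendor_content_marketing: 3,
  affiliate: 4,
  peer_reviewed: 1,
};
console.log(vendorDominanceRatio(mixedSerp)); // 5 vendor results out of 10
```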

Aggregate is over the full SERP, not the trimmed output

When `max_results=5` is passed but sift fetched 10, the aggregate is computed over all 10. The agent should see the real landscape even when it asked for a smaller trimmed view. This is deliberate: aggregate metrics computed over too few results are noisy and misleading.
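The ordering matters: aggregate first, trim second. The sketch below shows this under stated assumptions. The `SketchResult` and `SketchResponse` shapes and the `buildResponse` function are hypothetical, the aggregate is reduced to two fields for brevity, and trimming is assumed to keep the first `maxResults` entries.

```typescript
// Hypothetical result shape; only the fields this sketch needs.
interface SketchResult {
  url: string;
  tier: string;
  authoritative_weight: number;
}

// Hypothetical response shape with a reduced aggregate.
interface SketchResponse {
  results: SketchResult[];
  aggregate: { total: number; mean_authoritative_weight: number };
}

function buildResponse(fetched: SketchResult[], maxResults: number): SketchResponse {
  // Aggregate over everything that was fetched...
  const total = fetched.length;
  const mean =
    total === 0
      ? 0
      : fetched.reduce((sum, r) => sum + r.authoritative_weight, 0) / total;

  // ...then trim only the per-result list (assumed here to keep the first N).
  return {
    results: fetched.slice(0, maxResults),
    aggregate: { total, mean_authoritative_weight: mean },
  };
}
```

If the ten fetched results were five high-authority sources followed by five low-authority ones, a caller asking for five would receive only the first five results, while the aggregate would still reflect all ten.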