The five agents
Each market goes through a structured debate. Three sub-researchers run in parallel with isolated context — they don't see each other's drafts. A Supervisor merges with a weighted-Bayesian rule + mandates a falsifiable claim. A Critic audits the result across six rigor dimensions; if any dim falls below 0.4 the Supervisor re-runs once with the critic's feedback inlined. Receipts that fail audit on the second pass never reach the chain.
Pipeline
scanner — Polymarket Gamma poll
│
▼
┌──────┼──────┐
▼ ▼ ▼
[Bull] [Bear] [Edge] ← parallel, isolated context, ~3s per stance
│ │ │
└──────┼──────┘
▼
[Supervisor] ← weighted-Bayesian merge, mandates falsifiable claim,
│ consumes calibration_prior from past Brier
▼
[Critic] ← 6-dim audit; verdict ∈ {approved, needs_revision, rejected}
│
┌────┴────┐
│ needs_ │ approved │ rejected
│revision │ │
▼ ▼ ▼
Supervisor emit on SKIP — no on-chain commit,
re-runs Arc V2 + no calibration noise
once Irys uploadAgent cards
- Model
- Gemini 3.1 Pro Preview (Vertex AI, global region)
- Grounding
- Google Search at request time
- Context isolation
- Sees only the market prompt — never Bear or Edge's drafts
- Output
- probability_estimate ≥ 0.55, key factors, ≥ 2 cited evidence URLs
- Model
- Gemini 3.1 Pro Preview (Vertex AI, global region)
- Grounding
- Google Search at request time
- Context isolation
- Same — opposite advocate, independent context
- Output
- probability_estimate ≤ 0.45, key factors, ≥ 2 cited evidence URLs
- Model
- Gemini 3.1 Pro Preview (Vertex AI, global region)
- Grounding
- Google Search at request time
- Context isolation
- Same — adversarial-to-conventional-wisdom, independent context
- Output
- Tail-risk factors, structural assumptions, ≥ 1 historical analog
- Model
- Gemini 3.1 Pro Preview, low temperature (0.2)
- Grounding
- None — synthesises drafts, no fresh search
- Context isolation
- Reads all three drafts. Cannot reach back to a stance for clarification.
- Output
- final probability + confidence, stance weights ∈ [0.1, 0.7] summing to 1.0, disagreement_pp, mandatory ≥ 1 falsifiable claim with checkable_by date, calibration_prior_used
- Model
- Gemini 3 Flash Preview (smaller, faster, cheaper)
- Grounding
- None — reads only the trace under audit
- Context isolation
- Single pass. Returns verdict: approved / needs_revision / rejected.
- Output
- Per-dim score [0, 1]: evidence_relevance, falsifiability, scope, coherence, exploration_integrity, methodology. Rule overrides model self-report.
Multi-model fallback
Every Gemini call routes through a fallback chain. When the primary 429s — usually Pro Preview hitting the free-tier quota mid-tick — the wrapper retries the next model in the chain transparently. The fallback has fired hundreds of times in production today, keeping the loop emitting receipts without any operator intervention.
Stance + Supervisor: gemini-3.1-pro-preview → gemini-3-flash-preview → gemini-2.5-flash Critic: gemini-3-flash-preview → gemini-2.5-flash → gemini-2.5-flash-lite
Watch them debate in real time
The home page has a live SSE feed. The v3 pill on a row tells you the receipt came out of a 5-agent debate; hover to see the Bull/Bear/Edge disagreement in percentage points. Click any v3 row → see the full ensemble panel, the critic radar, and the falsifiable claims the supervisor committed to.