ReasoningReceipt ·oracle

The five agents

Each market goes through a structured debate. Three sub-researchers (Bull, Bear, and Edge) run in parallel with isolated context: none of them sees the others' drafts. A Supervisor merges their stances with a weighted-Bayesian rule and must commit to at least one falsifiable claim. A Critic then audits the result across six rigor dimensions; if any dimension scores below 0.4, the Supervisor re-runs once with the Critic's feedback inlined. Receipts that fail the audit on the second pass never reach the chain.
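
As a rough illustration of the audit threshold described above, the gate could look like the following minimal sketch. The function name `audit_gate` and the `pass_number` parameter are hypothetical, and the mapping of a failed first pass to needs_revision and a failed second pass to rejected is an assumption; the text only states the 0.4 threshold and the single re-run.

```python
from typing import Dict

THRESHOLD = 0.4  # from the text: any dimension below 0.4 fails the audit


def audit_gate(scores: Dict[str, float], pass_number: int) -> str:
    """Map per-dimension critic scores to a verdict.

    Assumption: a failed first pass yields "needs_revision" and a failed
    second pass yields "rejected". The source does not spell this mapping out.
    """
    failed = [dim for dim, score in scores.items() if score < THRESHOLD]
    if not failed:
        return "approved"
    return "needs_revision" if pass_number == 1 else "rejected"


example_scores = {
    "evidence_relevance": 0.8,
    "falsifiability": 0.7,
    "scope": 0.35,  # below the threshold, so the audit fails
    "coherence": 0.9,
    "exploration_integrity": 0.6,
    "methodology": 0.5,
}
print(audit_gate(example_scores, pass_number=1))
```

A score of exactly 0.4 passes in this sketch, since the text says the re-run triggers only when a dimension falls below 0.4.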

Pipeline

     scanner — Polymarket Gamma poll
          │
          ▼
   ┌──────┼──────┐
   ▼      ▼      ▼
 [Bull]  [Bear]  [Edge]       ← parallel, isolated context, ~3s per stance
   │      │      │
   └──────┼──────┘
          ▼
     [Supervisor]              ← weighted-Bayesian merge, mandates falsifiable claim,
          │                      consumes calibration_prior from past Brier
          ▼
       [Critic]                ← 6-dim audit; verdict ∈ {approved, needs_revision, rejected}
          │
     ┌────┴────┬──────────┐
     │ needs_  │ approved │ rejected
     │revision │          │
     ▼         ▼          ▼
   Supervisor  emit on    SKIP — no on-chain commit,
   re-runs    Arc V2 +    no calibration noise
   once       Irys upload
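
The control flow in the diagram can be sketched with `asyncio`. This is an illustration under stated assumptions, not the project's actual code: `run_debate` and the three callables passed to it are hypothetical stand-ins for the real agents, and the dictionary keys `verdict` and `feedback` are invented for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Draft = Dict[str, object]  # hypothetical shape for drafts, merges, and audits


async def run_debate(
    market: str,
    stances: List[Callable[[str], Awaitable[Draft]]],
    supervisor: Callable[[List[Draft], Optional[str]], Awaitable[Draft]],
    critic: Callable[[Draft], Awaitable[Draft]],
) -> Optional[Draft]:
    """Return the approved merge, or None when nothing should be emitted."""
    # Each stance receives only the market prompt, so contexts stay isolated.
    drafts = await asyncio.gather(*(stance(market) for stance in stances))

    feedback: Optional[str] = None
    for _attempt in range(2):  # first pass plus at most one re-run
        merged = await supervisor(list(drafts), feedback)
        audit = await critic(merged)
        if audit["verdict"] == "approved":
            return merged  # caller would emit on-chain and upload
        if audit["verdict"] == "rejected":
            return None  # skip: no on-chain commit
        feedback = str(audit.get("feedback", ""))  # inline critic feedback
    return None  # still failing after the single re-run
```

Bounding the loop at two iterations mirrors the "re-runs once" rule: a receipt that still needs revision after the second pass is dropped rather than retried again.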

Agent cards

Bull
Argue YES: the strongest defensible case
  Model: Gemini 3.1 Pro Preview (Vertex AI, global region)
  Grounding: Google Search at request time
  Context isolation: Sees only the market prompt, never Bear's or Edge's drafts
  Output: probability_estimate ≥ 0.55, key factors, ≥ 2 cited evidence URLs

Bear
Argue NO: the strongest defensible case
  Model: Gemini 3.1 Pro Preview (Vertex AI, global region)
  Grounding: Google Search at request time
  Context isolation: Same as Bull (market prompt only); opposite advocate, independent context
  Output: probability_estimate ≤ 0.45, key factors, ≥ 2 cited evidence URLs

Edge
Surface tail risks both partisans miss
  Model: Gemini 3.1 Pro Preview (Vertex AI, global region)
  Grounding: Google Search at request time
  Context isolation: Same as Bull (market prompt only); adversarial to conventional wisdom, independent context
  Output: Tail-risk factors, structural assumptions, ≥ 1 historical analog

Supervisor
Weighted-Bayesian merge of the three stances
  Model: Gemini 3.1 Pro Preview, low temperature (0.2)
  Grounding: None; synthesises the drafts without a fresh search
  Context isolation: Reads all three drafts. Cannot reach back to a stance for clarification.
  Output: final probability and confidence; stance weights ∈ [0.1, 0.7] summing to 1.0; disagreement_pp; at least one mandatory falsifiable claim with a checkable_by date; calibration_prior_used

Critic
Audit the merged trace across 6 rigor dimensions
  Model: Gemini 3 Flash Preview (smaller, faster, cheaper)
  Grounding: None; reads only the trace under audit
  Context isolation: Single pass. Returns a verdict: approved, needs_revision, or rejected.
  Output: Per-dimension score in [0, 1] for evidence_relevance, falsifiability, scope, coherence, exploration_integrity, and methodology. A rule applied to the scores overrides the model's self-report.
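
To make the Supervisor's weight constraints concrete, here is a minimal sketch of one possible merge. The source only calls the rule "weighted-Bayesian", so weighted log-odds pooling is used here as a stand-in technique, not as the actual method. The function `merge_stances`, the assumption that every stance (including Edge) supplies a probability, and the definition of disagreement_pp as the max-minus-min spread in percentage points are all hypothetical.

```python
import math
from typing import Dict

W_MIN, W_MAX = 0.1, 0.7  # stance-weight bounds stated in the Supervisor card


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def merge_stances(probs: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Pool stance probabilities in log-odds space (a stand-in merge rule)."""
    if set(probs) != set(weights):
        raise ValueError("probs and weights must cover the same stances")
    if any(not (0.0 < p < 1.0) for p in probs.values()):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    if any(not (W_MIN <= w <= W_MAX) for w in weights.values()):
        raise ValueError("each weight must lie in [0.1, 0.7]")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")

    pooled = sum(weights[name] * _logit(probs[name]) for name in probs)
    probability = 1.0 / (1.0 + math.exp(-pooled))
    # Assumed definition: spread between the most and least bullish stance.
    disagreement_pp = (max(probs.values()) - min(probs.values())) * 100.0
    return {"probability": probability, "disagreement_pp": disagreement_pp}


merged = merge_stances(
    {"bull": 0.70, "bear": 0.30, "edge": 0.50},
    {"bull": 0.4, "bear": 0.4, "edge": 0.2},
)
print(merged)
```

The sketch validates the weights rather than silently clamping them, so an out-of-range weight surfaces as an error instead of quietly changing the merge.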

Multi-model fallback

Every Gemini call routes through a fallback chain. When the primary model returns HTTP 429 (usually Pro Preview hitting the free-tier quota mid-tick), the wrapper transparently retries with the next model in the chain. The fallback has fired hundreds of times in production today, keeping the loop emitting receipts without any operator intervention.

Stance + Supervisor: gemini-3.1-pro-preview → gemini-3-flash-preview → gemini-2.5-flash
Critic:              gemini-3-flash-preview → gemini-2.5-flash → gemini-2.5-flash-lite
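
A fallback wrapper along these lines could be sketched as follows. The model names come from the chains above; everything else is hypothetical, including the `RateLimited` exception (a stand-in for whatever error the real client raises on HTTP 429) and the `call` parameter, which represents the actual model request. No real SDK calls are shown.

```python
from typing import Callable, Optional, Sequence


class RateLimited(Exception):
    """Hypothetical error raised by a client wrapper on HTTP 429."""


STANCE_CHAIN = ("gemini-3.1-pro-preview", "gemini-3-flash-preview", "gemini-2.5-flash")
CRITIC_CHAIN = ("gemini-3-flash-preview", "gemini-2.5-flash", "gemini-2.5-flash-lite")


def call_with_fallback(chain: Sequence[str], call: Callable[[str], str]) -> str:
    """Try each model in order, moving on only when one is rate limited."""
    last_error: Optional[RateLimited] = None
    for model in chain:
        try:
            return call(model)
        except RateLimited as exc:
            last_error = exc  # remember the failure and try the next model
    raise RuntimeError("every model in the chain was rate limited") from last_error


def fake_call(model: str) -> str:
    # Stand-in for a real request: pretend the primary model is over quota.
    if model == "gemini-3.1-pro-preview":
        raise RateLimited(model)
    return f"answered by {model}"


print(call_with_fallback(STANCE_CHAIN, fake_call))
```

Only the rate-limit error triggers a fallback in this sketch; any other exception propagates immediately, so genuine bugs are not masked by a silent model switch.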

Watch them debate in real time

The home page has a live SSE feed. The v3 pill on a row tells you the receipt came out of a 5-agent debate; hover over it to see the Bull/Bear/Edge disagreement in percentage points. Click any v3 row to see the full ensemble panel, the critic radar, and the falsifiable claims the Supervisor committed to.