CBC / COMMAND CENTER
SAME-ORIGIN API · ROUTE /api/cbc
Mission brief — 001

Ship AI code
without the babysitting.

CBC is a verification-first control plane for AI code generation. Every attempt is sandboxed, every claim is checked, every verdict is reproducible. No “looks good to me.”
Verified
Success
Latest
LIVE | 249 TESTS PASS | 27 RUNS TODAY | VERIFIED RATE 94% | LAST VERDICT · VERIFIED | UPTIME 99.8% | SANDBOX NOMINAL | 7 AGENTS ONLINE | P95 LATENCY 1.2s | QUEUE DEPTH 3
Operator focus

Watch runs, inspect verdicts, and verify the wiring before you trust the dashboard.

This surface now prefers honest fallbacks: same-origin API proxying, explicit Supabase status, and structured run details instead of fake “online” theater.

Quickstart
Replay demo: ./scripts/run_compare.sh
Run one task: uv run cbc run fixtures/oracle_tasks/calculator_bug/task.yaml
Zero-config solve: uv run cbc solve "Fix the failing tests" --json
API proxy
probing
Checking /api/cbc/health now.
Supabase mirror
configured
Fleet-wide KPIs and event history can hydrate from mirrored run data.
Deploy mode
vercel-safe
Browser traffic stays same-origin and the server-side proxy owns the backend URL.
The promise — 00

Four things you’ll never do again.

No. 00
No code review

Every patch runs the full oracle suite before it lands.

No. 01
No merge conflicts

PR-gated push-forward auto-rebases and resolves safe classes of conflicts.

No. 02
No broken tests

No VERIFIED verdict, no merge. The gate is deterministic.

No. 03
No time wasted debugging

Failure context feeds the next attempt — bounded, then aborted.

Signal floor

Verified rate
0 of 0 runs
Runs · last 24h
0 total in ledger
Tokens spent
Cumulative across all attempts
Avg cost / run
$0.00 total

Every verdict is a proof

VERIFIED #a1f2c3d
FALSIFIED #b7c8d9e
VERIFIED #c2e3f4a
VERIFIED #d5a6b7c
TIMED_OUT #e8f9a0b
VERIFIED #f1b2c3d
UNPROVEN #a4b5c6d
VERIFIED #b7e8f9a

Live event tail

Event stream · REALTIME
No events yet.
Events stream in as runs progress: attempts, verify checks, verdicts.

Agents working for you

Remediation feed · AUTONOMOUS
No remediations yet.
When a run is falsified, an agent picks it up and the results appear here.

What a run can prove

oracle
Task-defined shell/pytest/python command. The authoritative pass/fail.
core
pytest
Full test suite runner with structured pass/fail attribution per node.
python
type
Type-checker gate (mypy/pyright). Optional, configurable command.
python
lint
Syntax/compile check via compileall by default; pluggable linter command.
python
coverage
Coverage threshold gate. Rejects regressions below the configured floor.
python
mutation
Mutation testing. Injects faults, demands the test suite catches them.
python
crosshair
Symbolic execution via CrossHair. Surfaces counterexamples from contracts.
symbolic
hypothesis
Property-based testing. Generative counterexamples with shrinking.
property
structural
File tree and schema shape validation. Catches drift before semantics.
meta
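
The lint check above is described as a compileall-based syntax gate by default. A minimal sketch of what such a gate could look like, using only the standard library; the function name syntax_gate and the temporary-workspace demo are illustrative assumptions, not CBC's actual code:

```python
import compileall
import tempfile
from pathlib import Path


def syntax_gate(workspace: Path) -> bool:
    """Return True when every .py file under `workspace` byte-compiles.

    Hypothetical helper: a stand-in for a compileall-style lint check.
    """
    # quiet=2 suppresses both the file listing and the error output;
    # force=True recompiles even if cached bytecode looks up to date.
    # compile_dir returns a true value only if all files compiled.
    return bool(compileall.compile_dir(str(workspace), quiet=2, force=True))


def demo() -> tuple:
    """Run the gate against one valid and one broken workspace."""
    with tempfile.TemporaryDirectory() as good, tempfile.TemporaryDirectory() as bad:
        Path(good, "ok.py").write_text("def add(a, b):\n    return a + b\n")
        # Missing colon after the signature: a syntax error.
        Path(bad, "broken.py").write_text("def add(a, b)\n    return a + b\n")
        return syntax_gate(Path(good)), syntax_gate(Path(bad))
```

A gate like this only proves the code parses. It says nothing about behavior, which is why the oracle remains the authoritative pass/fail.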

Tools at hand

TOOL · 01 · R
cbc run
Execute a task against a sandboxed workspace; iterate until the verification oracle certifies or aborts.
--mode --controller --sandbox --agent
TOOL · 02 · S
cbc solve
Free-form prompt → synthetic task spec → bounded solve loop with auto-generated oracles.
--prompt --max-attempts
TOOL · 03 · C
cbc compare
A/B benchmark baseline vs. treatment across a curated oracle subset; emits delta report.
--baseline --treatment --seed
TOOL · 04 · V
cbc controller-compare
Race sequential vs. gearbox orchestration on the same task set. Latency + verdict stats.
TOOL · 05 · P
cbc poc
Live Codex sampling: seeded repeats, pairwise stats, confidence intervals.
--seed --sample-size
TOOL · 06 · E
cbc review
Generate a structured code-review report for a completed run ledger.
TOOL · 07 · E
cbc review-artifact
Read a stored run artifact and export the review as JSON.
TOOL · 08 · W
cbc review-workspace
Review an arbitrary workspace tree against a task spec, no run required.
TOOL · 09 · G
cbc ci
CI gate: task + workspace → verdict. Exit code reflects mergeability.
TOOL · 10 · G
cbc ci-artifact
Recompute a CI gate from a frozen artifact for deterministic audit trails.
TOOL · 11 · T
cbc trends
Aggregate ledger stats: success rate, avg attempts, cost curves over rolling windows.
TOOL · 12 · B
cbc benchmark-artifact
Replay a saved benchmark report and re-render the comparison metrics.
TOOL · 13 · A
cbc api
Start the FastAPI control plane: SSE streams, REST queries, Supabase mirror hook.
--host --port
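
The cbc ci entry above says the exit code reflects mergeability, and the page elsewhere states "No VERIFIED verdict, no merge." A minimal sketch of that mapping follows. The verdict names come from the ledger sample on this page; the function name and the choice of 1 as the failing exit code are assumptions, not documented cbc behavior:

```python
import sys

# Verdict labels as they appear in the ledger sample on this page.
KNOWN_VERDICTS = {"VERIFIED", "FALSIFIED", "TIMED_OUT", "UNPROVEN"}


def exit_code_for(verdict: str) -> int:
    """Map a verdict to a process exit code (hypothetical helper).

    Assumption: only VERIFIED is mergeable, so it alone maps to 0.
    The specific non-zero value is illustrative.
    """
    if verdict not in KNOWN_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return 0 if verdict == "VERIFIED" else 1


if __name__ == "__main__":
    # Example: `python gate.py FALSIFIED` exits with status 1.
    sys.exit(exit_code_for(sys.argv[1] if len(sys.argv) > 1 else "UNPROVEN"))
```

Treating everything except VERIFIED as a failure keeps the gate fail-closed: a timeout or an unproven result blocks the merge just as a falsification does.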

Task fixtures

calculator_bug
python
Fix addition in calculator.py
REPLAY
calculator_bug_codex
python
Same task as calculator_bug, with live Codex
REPLAY ◆ LIVE CODEX
live_codex_calculator
python
Alternate live-Codex calculator fixture
REPLAY ◆ LIVE CODEX
checkout_tax_propagation
python
Propagate a taxed total signature across pricing and checkout
REPLAY
greeting_text_patch
text
Single-line greeting typo fix (golden fixture)
REPLAY
json_status_rollup
json
Repair a derived JSON summary (derived-state, golden)
REPLAY
price_format_property_regression
python
Fix price formatting and emit a regression artifact from a property case
REPLAY
shell_banner_contract
shell
Repair a shell-based verification banner (stdout + exit code contract)
REPLAY
slug_shell_bug
shell
Fix slug rendering for shell validator
REPLAY
slug_shell_bug_codex
shell
Same task as slug_shell_bug, with live Codex
REPLAY ◆ LIVE CODEX
slugify_property_regression
python
Fix slugify and capture a regression test from a failing property case
REPLAY
slugify_property_regression_codex
python
Same task as slugify_property_regression, with live Codex
REPLAY ◆ LIVE CODEX
status_badge_js_contract
javascript
Repair JS status badge labels (via Node)
REPLAY
title_case_bug
python
Fix title casing helper
REPLAY
title_case_bug_codex
python
Same task as title_case_bug, with live Codex
REPLAY ◆ LIVE CODEX

How a task moves

Task
YAML spec
Plan
allowed files
Codex
generate
Sandbox
isolate
Verify
oracle + N
Verdict
ledger
RETRY LOOP
On FALSIFIED, route_after_verify feeds the counterexample back into the next attempt. Bounded by retry_budget.
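
The retry loop described above can be sketched as follows. Only the names retry_budget and the FALSIFIED/VERIFIED verdicts come from this page; VerifyResult, solve_with_retries, and the generate/verify callables are hypothetical, and treating retry_budget as the total number of attempts is an assumption:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class VerifyResult:
    verdict: str                          # "VERIFIED" or "FALSIFIED"
    counterexample: Optional[str] = None  # failure context for the next attempt


def solve_with_retries(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], VerifyResult],
    retry_budget: int,
) -> Tuple[str, int]:
    """Return (final verdict, attempts used). Illustrative only."""
    if retry_budget < 1:
        raise ValueError("retry_budget must allow at least one attempt")
    feedback: Optional[str] = None
    for attempt in range(1, retry_budget + 1):
        patch = generate(feedback)          # the candidate sees the last failure
        result = verify(patch)
        if result.verdict == "VERIFIED":
            return result.verdict, attempt
        feedback = result.counterexample    # feed the counterexample forward
    return result.verdict, retry_budget     # budget exhausted: stop, do not loop


def demo(retry_budget: int = 3) -> Tuple[str, int]:
    # Toy generator: emits a wrong expression first, a fix once it has feedback.
    def generate(feedback: Optional[str]) -> str:
        return "a + b" if feedback else "a - b"

    # Toy oracle: the expression must evaluate to 5 for a=2, b=3.
    def verify(patch: str) -> VerifyResult:
        got = eval(patch, {"a": 2, "b": 3})
        if got == 5:
            return VerifyResult("VERIFIED")
        return VerifyResult("FALSIFIED", f"add(2, 3) returned {got}, expected 5")

    return solve_with_retries(generate, verify, retry_budget)
```

With a budget of 3 the toy run verifies on the second attempt; with a budget of 1 it stops at FALSIFIED after a single try.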
GEARBOX
Parallel candidate fan-out under ConTree sandbox. The gearbox coordinator picks the first-verified winner.
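
One way to picture the first-verified selection described above is a thread-pool fan-out. This sketch does not model the ConTree sandbox or the real gearbox coordinator; first_verified and the verify callable are hypothetical names used only for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence


def first_verified(
    candidates: Sequence[str],
    verify: Callable[[str], bool],
) -> Optional[str]:
    """Verify candidates in parallel; return the first one that passes."""
    if not candidates:
        return None
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = {pool.submit(verify, c): c for c in candidates}
        for future in as_completed(futures):
            if future.result():
                # Cancel checks that have not started yet. Checks already
                # running finish before the pool shuts down.
                for other in futures:
                    other.cancel()
                return futures[future]
    return None  # no candidate verified


def check_sum(expr: str) -> bool:
    # Toy oracle: the expression must evaluate to 5 for a=2, b=3.
    return eval(expr, {"a": 2, "b": 3}) == 5


def demo() -> Optional[str]:
    return first_verified(["a - b", "a * b", "a + b"], check_sum)
```

When several candidates pass, the winner depends on completion order, so a design like this trades determinism of the chosen patch for latency. The verdict itself stays deterministic because every winner has passed the same checks.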
LEDGER
Every attempt, verify check, and verdict lands in SQLite locally and mirrors to Supabase for this dashboard.
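
A local SQLite ledger of the kind described above might look like this minimal sketch. The table name, columns, and helper functions are assumptions for illustration, the Supabase mirror is not modeled, and the run IDs reuse the sample hashes shown on this page:

```python
import sqlite3
from typing import Tuple


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS verdicts ("
        " run_id TEXT PRIMARY KEY,"
        " verdict TEXT NOT NULL,"
        " attempts INTEGER NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, run_id: str, verdict: str, attempts: int) -> None:
    # `with conn` commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO verdicts (run_id, verdict, attempts) VALUES (?, ?, ?)",
            (run_id, verdict, attempts),
        )


def verified_rate(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (verified runs, total runs), e.g. (0, 0) for an empty ledger."""
    # SUM over an empty table is NULL, so COALESCE turns it into 0.
    row = conn.execute(
        "SELECT COALESCE(SUM(verdict = 'VERIFIED'), 0), COUNT(*) FROM verdicts"
    ).fetchone()
    return int(row[0]), int(row[1])


def demo() -> Tuple[int, int]:
    conn = open_ledger()
    record(conn, "a1f2c3d", "VERIFIED", 1)
    record(conn, "b7c8d9e", "FALSIFIED", 3)
    return verified_rate(conn)
```

An empty ledger reports 0 of 0 runs, which matches the "Verified rate" card before any run has been recorded.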