Any MCP server
Connect any of the thousands of MCP servers with per-server isolation — each gets its own scoped lane and approval gate. No shared blast radius.
mission control for a federation of agents
Agents waste 60–89% of what you spend on them. Muster puts every token on a ledger, every memory in a scoped lane, every learned skill behind an eval gate — across every chat surface, every model, any MCP server.
pnpm dlx @musterhq/cli init && muster demo
muster (v.) — to assemble troops into formation.
See it in your terminal
No screenshots — this is the real CLI: the banner, a live muster demo run loop with scoped-memory recall and the token ledger, then the deterministic muster benchmark Token Waste Index.
the agent harness you can audit $ muster demomuster demo — provisioned an isolated workspace and a live stub model service. > Where do we deploy? (recalled 1 scoped memory) Muster deploys to uat-erp.example.com (recalled from scoped memory). > Summarize the day's work. Demo run complete. Every token above is real, recorded to the ledger. run model in out est cost$ waste session------------------------------------------------------------------------------------------------run_2f9c41a8 demo/demo-model 48.1k 412 ~ 0.0061 6.2x ! oneshotrun_5b1e07d3 demo/demo-model 1.8k 128 ~ 0.0004 - oneshot totals by model runs in out cost$ waste-runs--------------------------------------------------------------------------------demo/demo-model 2 49.9k 540 0.0065 1 ! integrity check: OK — 2 runs, 0 corrupt, 0 duplicate, 0 drift, 0 poisoned.That was a real run loop: scoped memory recall, token ledger, integrity verification. $ muster benchmarkscenario turns naive muster reduction replay-overhead--------------------------------------------------------------------------------------codebase-refactor-20 21 84.6k 42.7k 49.6% 90.5%incident-triage-30 31 144.9k 60.5k 58.2% 93.6%erp-data-audit-40 41 205.5k 79.5k 61.3% 95.1%research-synthesis-25 26 160.0k 67.7k 57.7% 92.3%long-support-thread-50 51 280.8k 104.9k 62.7% 96.1%--------------------------------------------------------------------------------------AGGREGATE 170 875.8k 355.2k 59.4% 94.2% Muster reduced naive token cost by 59.4% across these scenarios.Deterministic — no model calls.$
Connect anywhere
A single message envelope fans out to wherever your people already are — same memory, same ledger, same approval gates in every surface. Add any channel in ~40 lines.
Any model, your keys
One routing layer over 20+ providers — bring your own keys, swap models per task, run fully offline. Governed fallback is recorded as evidence, never silent.
Connect any tool
Muster doesn't fake a hundred first-party integrations. Its reach is MCP — thousands of servers — plus typed capability packs and durable flows.
Connect any of the thousands of MCP servers with per-server isolation — each gets its own scoped lane and approval gate. No shared blast radius.
Typed, versioned tool bundles. The Frappe / ERPNext pack ships today; the pack format is the community surface for everything next.
Durable, replayable automations with approval gates between steps. A run that fails mid-flight resumes from where it stopped — not from zero.
Why it's different
Governance is the moat — five promises kept on every single run, not a roadmap.
Every run priced and recorded. Replay waste flagged with the exact ratio. muster tokens.
Five lanes — tenant, workspace, user, role, session — with promotion gates. Leakage is blocked by design.
Feedback becomes a replayable test before it becomes behaviour. A skill ships only when the suite passes.
Detects corrupt transcripts, duplicate runs, drift and poisoned context — four failure classes, every store.
Fits every turn to budget without losing the thread. The harness can't paint itself into a corner.
Proof
Deterministic measurement — no model is called. Across 5 realistic agent tasks (170 turns), a naive replay-everything harness sends 876k tokens; Muster sends 355k — a 59.4% reduction, up to 62.7% on the longest threads. Reproduce with muster benchmark.
| scenario | turns | naive | muster | reduction |
|---|---|---|---|---|
| codebase-refactor-20 | 21 | 84.6k | 42.7k | 49.6% |
| incident-triage-30 | 31 | 144.9k | 60.5k | 58.2% |
| erp-data-audit-40 | 41 | 205.5k | 79.5k | 61.3% |
| research-synthesis-25 | 26 | 160.0k | 67.7k | 57.7% |
| long-support-thread-50 | 51 | 280.8k | 104.9k | 62.7% |
| AGGREGATE | 170 | 875.8k | 355.2k | 59.4% |
Ecosystem
One monorepo, one version — presented as the products it already is, plus themed satellites we're building. Military-assembly naming throughout.
The governed agent harness — core & CLI.
One governed envelope for seven chat surfaces. @musterhq/gateway
Zero-dependency web client for any frontend. @musterhq/surface
The Token Waste Index — prove the savings. muster benchmark
ERPNext / Frappe capability pack — shipped.
Capability-pack registry — roster install <pack>.
Migrate from OpenClaw / Hermes into Muster, with verification.
Standalone eval-suite runner — harness + config + model, not just the LLM.
Local monitor / TUI over the gateway RPC — watch runs and the ledger live.
Everything is open source, MIT. Ship a capability pack and grow the surface.
pnpm dlx @musterhq/cli init && muster doctor