mission control for a federation of agents

The AI agent harness you can audit.

Agents waste 60–89% of what you spend on them. Muster puts every token on a ledger, every memory in a scoped lane, every learned skill behind an eval gate — across every chat surface, every model, any MCP server.

pnpm dlx @musterhq/cli init && muster demo


muster (v.) — to assemble troops into formation.

See it in your terminal

Run Muster. Watch every token.

No screenshots: this is the real CLI. First the banner, then a live muster demo run loop with scoped-memory recall and the token ledger, then the deterministic Token Waste Index from muster benchmark.


███╗   ███╗██╗   ██╗███████╗████████╗███████╗██████╗
████╗ ████║██║   ██║██╔════╝╚══██╔══╝██╔════╝██╔══██╗
██╔████╔██║██║   ██║███████╗   ██║   █████╗  ██████╔╝
██║╚██╔╝██║██║   ██║╚════██║   ██║   ██╔══╝  ██╔══██╗
██║ ╚═╝ ██║╚██████╔╝███████║   ██║   ███████╗██║  ██║
╚═╝     ╚═╝ ╚═════╝ ╚══════╝   ╚═╝   ╚══════╝╚═╝  ╚═╝
  the agent harness you can audit

$ muster demo
muster demo — provisioned an isolated workspace and a live stub model service.

> Where do we deploy?
  (recalled 1 scoped memory)
  Muster deploys to uat-erp.example.com (recalled from scoped memory).

> Summarize the day's work.
  Demo run complete. Every token above is real, recorded to the ledger.

run            model                          in       out     est  cost$    waste   session
------------------------------------------------------------------------------------------------
run_2f9c41a8   demo/demo-model                48.1k    412     ~    0.0061   6.2x !  oneshot
run_5b1e07d3   demo/demo-model                1.8k     128     ~    0.0004   -      oneshot

totals by model              runs   in         out        cost$      waste-runs
--------------------------------------------------------------------------------
demo/demo-model              2      49.9k      540        0.0065     1 !

integrity check: OK — 2 runs, 0 corrupt, 0 duplicate, 0 drift, 0 poisoned.
That was a real run loop: scoped memory recall, token ledger, integrity verification.
$ muster benchmark
scenario                          turns  naive    muster   reduction  replay-overhead
--------------------------------------------------------------------------------------
codebase-refactor-20              21     84.6k    42.7k    49.6%      90.5%
incident-triage-30                31     144.9k   60.5k    58.2%      93.6%
erp-data-audit-40                 41     205.5k   79.5k    61.3%      95.1%
research-synthesis-25             26     160.0k   67.7k    57.7%      92.3%
long-support-thread-50            51     280.8k   104.9k   62.7%      96.1%
--------------------------------------------------------------------------------------
AGGREGATE                         170    875.8k   355.2k   59.4%      94.2%

Muster reduced naive token cost by 59.4% across these scenarios.
Deterministic — no model calls.

Connect anywhere

01 One governed envelope. Every room.

A single message envelope fans out to wherever your people already are — same memory, same ledger, same approval gates in every surface. Add any channel in ~40 lines.

Telegram
Slack
Discord
WhatsApp
Google Chat
MS Teams
WebChat
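A minimal sketch of the fan-out idea, assuming an adapter is little more than a function that renders one shared envelope into a surface's native format. Envelope, ChannelAdapter and fanOut are hypothetical names, not Muster's published types; a real adapter would also handle inbound messages, retries and delivery receipts.

```typescript
// Hypothetical types: Muster's real envelope and adapter interfaces are not shown on this page.
interface Envelope {
  conversationId: string;
  text: string;
}

interface ChannelAdapter {
  name: string;
  // Translate the shared envelope into the surface's native message format.
  render(envelope: Envelope): string;
}

const slack: ChannelAdapter = {
  name: "slack",
  render: (e) => `[slack:${e.conversationId}] ${e.text}`,
};

const telegram: ChannelAdapter = {
  name: "telegram",
  render: (e) => `[telegram:${e.conversationId}] ${e.text}`,
};

// The envelope is built once and never mutated, so memory, ledger and approval checks
// can run upstream of every adapter.
function fanOut(envelope: Envelope, adapters: ChannelAdapter[]): Map<string, string> {
  const rendered = new Map<string, string>();
  for (const adapter of adapters) {
    rendered.set(adapter.name, adapter.render(envelope));
  }
  return rendered;
}

const delivered = fanOut({ conversationId: "c1", text: "Deploy approved." }, [slack, telegram]);
console.log(delivered.get("telegram")); // [telegram:c1] Deploy approved.
```

Under this design, adding a surface means writing one more adapter object; nothing upstream of the adapters changes.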

Any model, your keys

02 Cloud or local. Zero lock-in.

One routing layer over 20+ providers — bring your own keys, swap models per task, run fully offline. Governed fallback is recorded as evidence, never silent.

Claude (Fable 5)
OpenAI
Gemini
Grok
Kimi
DeepSeek
Mistral
GLM
Perplexity
Qwen
Groq
Cerebras
OpenRouter
Together
Fireworks
LM Studio
vLLM
SGLang
Codex CLI
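One way governed fallback could work, sketched under assumptions: providers are tried in priority order, and every failed attempt is kept in an evidence list returned alongside the result instead of being swallowed. Provider, routeWithFallback and the evidence shape are hypothetical, not Muster's routing API.

```typescript
// Hypothetical provider shape; a real client would be async and call a remote or local model.
type Provider = { name: string; complete: (prompt: string) => string };

interface FallbackEvidence {
  provider: string;
  error: string;
}

interface RouteResult {
  provider: string;
  output: string;
  evidence: FallbackEvidence[]; // every failed attempt, in the order it happened
}

// Try providers in priority order; a failure is recorded as evidence, never dropped silently.
function routeWithFallback(prompt: string, providers: Provider[]): RouteResult {
  const evidence: FallbackEvidence[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, output: p.complete(prompt), evidence };
    } catch (err) {
      evidence.push({ provider: p.name, error: err instanceof Error ? err.message : String(err) });
    }
  }
  throw new Error(`all providers failed: ${evidence.map((e) => e.provider).join(", ")}`);
}

const flaky: Provider = {
  name: "cloud-a",
  complete: () => {
    throw new Error("429 rate limited");
  },
};
const local: Provider = { name: "local-vllm", complete: (p) => `echo: ${p}` };

const result = routeWithFallback("hello", [flaky, local]);
console.log(result.provider, result.evidence);
```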

Connect any tool

03 The honest breadth multiplier.

Muster doesn't fake a hundred first-party integrations. Its reach is MCP — thousands of servers — plus typed capability packs and durable flows.

Any MCP server

Connect any of the thousands of MCP servers with per-server isolation — each gets its own scoped lane and approval gate. No shared blast radius.
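A sketch of per-server isolation, assuming each registered MCP server gets its own lane name and its own approval policy. McpRegistry, register and authorize are hypothetical names; the lane and policy shapes are not taken from the MCP specification or from Muster.

```typescript
// Hypothetical sketch: registry, lane naming and policy shape are assumptions.
type ApprovalPolicy = (tool: string) => boolean;

interface ServerLane {
  server: string;
  lane: string; // memory/ledger lane private to this server
  approve: ApprovalPolicy;
}

class McpRegistry {
  private lanes = new Map<string, ServerLane>();

  register(server: string, approve: ApprovalPolicy): void {
    this.lanes.set(server, { server, lane: `mcp:${server}`, approve });
  }

  // A call is checked only against the policy of the server that owns the tool,
  // so approving one server's tools never widens another server's permissions.
  authorize(server: string, tool: string): { allowed: boolean; lane?: string } {
    const entry = this.lanes.get(server);
    if (!entry) return { allowed: false }; // unregistered servers get nothing
    return entry.approve(tool) ? { allowed: true, lane: entry.lane } : { allowed: false };
  }
}

const registry = new McpRegistry();
registry.register("github", (tool) => tool.startsWith("read_"));
registry.register("filesystem", () => false); // deny every tool on this server

console.log(registry.authorize("github", "read_issue"));
console.log(registry.authorize("github", "delete_repo"));
```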

Capability packs

Typed, versioned tool bundles. The Frappe / ERPNext pack ships today; the pack format is the community surface for everything that comes next.

Flows

Durable, replayable automations with approval gates between steps. A run that fails mid-flight resumes from where it stopped — not from zero.
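The resume-from-where-it-stopped behaviour can be sketched as checkpointing: record each completed step id, and on rerun skip everything already recorded. Step, runFlow and the in-memory checkpoint set are hypothetical; a durable implementation would persist the checkpoint after every step, and approval gates between steps are not modelled here.

```typescript
// Hypothetical sketch of checkpointed flow execution; Muster's flow format is not shown here.
interface Step {
  id: string;
  run: () => void;
}

// The checkpoint is the set of completed step ids. Here it lives in memory; a durable
// runner would write it to storage after each step.
function runFlow(steps: Step[], completed: Set<string>): string[] {
  const executed: string[] = [];
  for (const step of steps) {
    if (completed.has(step.id)) continue; // finished in an earlier attempt: skip
    step.run(); // a throw here leaves the checkpoint at the last good step
    completed.add(step.id);
    executed.push(step.id);
  }
  return executed;
}

let failOnce = true;
const steps: Step[] = [
  { id: "extract", run: () => {} },
  { id: "approve", run: () => {} },
  {
    id: "load",
    run: () => {
      if (failOnce) {
        failOnce = false;
        throw new Error("timeout");
      }
    },
  },
];

const checkpoint = new Set<string>();
try {
  runFlow(steps, checkpoint); // fails at "load" after two steps are checkpointed
} catch {
  // the run stopped mid-flight; the checkpoint survives
}
const resumed = runFlow(steps, checkpoint);
console.log(resumed); // [ 'load' ]
```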

Why it's different

04 Every framework demos beautifully. Then production happens.

Governance is the moat — five promises kept on every single run, not a roadmap.

Token ledger

Every run is priced and recorded, with replay waste flagged at its exact ratio. Run muster tokens.
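A sketch of a ledger row with a waste flag. The page does not define the waste ratio, so this assumes it is the input tokens sent divided by the input tokens a compacted context would have needed; LedgerEntry, neededInputTokens and the 2x flag threshold are all hypothetical.

```typescript
// Hypothetical ledger shape; the waste definition below is an assumption.
interface LedgerEntry {
  runId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  neededInputTokens: number; // assumed baseline: what a compacted context would have sent
  costUsd: number;
}

const WASTE_FLAG_THRESHOLD = 2; // hypothetical: flag runs that sent more than 2x what was needed

function wasteRatio(e: LedgerEntry): number {
  return e.inputTokens / e.neededInputTokens;
}

// Render one row; flagged runs show the ratio followed by "!", others show "-".
function formatRow(e: LedgerEntry): string {
  const ratio = wasteRatio(e);
  const flag = ratio > WASTE_FLAG_THRESHOLD ? `${ratio.toFixed(1)}x !` : "-";
  return [e.runId, e.model, e.inputTokens, e.outputTokens, e.costUsd.toFixed(4), flag].join("  ");
}

const entry: LedgerEntry = {
  runId: "run_example",
  model: "demo/demo-model",
  inputTokens: 48_000,
  outputTokens: 400,
  neededInputTokens: 8_000,
  costUsd: 0.006,
};
console.log(formatRow(entry)); // ends with "6.0x !"
```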

Scoped memory

Five lanes (tenant, workspace, user, role, session) with promotion gates. Cross-lane leakage is blocked by design.
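Lane scoping can be sketched as an exact-match filter: a memory is recalled only when the caller's id for that memory's lane equals the scope it was written under, and moving a memory to another lane goes through an explicit gate. The lane names come from this page; the field names and the promote signature are assumptions, not Muster's memory API.

```typescript
// Hypothetical sketch: fields and the gate signature are assumptions.
type Lane = "tenant" | "workspace" | "user" | "role" | "session";

interface Memory {
  lane: Lane;
  scopeId: string; // id of the scope the memory was written under, e.g. a workspace id
  text: string;
}

// The caller's id in each of the five lanes.
type Context = Record<Lane, string>;

// Recall is an exact-match filter: nothing written under another scope can surface.
function recall(store: Memory[], ctx: Context): Memory[] {
  return store.filter((m) => ctx[m.lane] === m.scopeId);
}

// Moving a memory into another lane requires the gate to approve; otherwise nothing changes.
function promote(
  m: Memory,
  to: Lane,
  ctx: Context,
  gate: (m: Memory, to: Lane) => boolean,
): Memory | null {
  if (!gate(m, to)) return null;
  return { ...m, lane: to, scopeId: ctx[to] };
}

const sessionNote: Memory = { lane: "session", scopeId: "s-42", text: "User prefers short answers" };
const store: Memory[] = [
  { lane: "workspace", scopeId: "ws-erp", text: "Deploy to uat-erp.example.com" },
  { lane: "workspace", scopeId: "ws-other", text: "Another team's credentials" },
  sessionNote,
];
const ctx: Context = { tenant: "t-1", workspace: "ws-erp", user: "u-7", role: "ops", session: "s-42" };

console.log(recall(store, ctx).map((m) => m.text));
console.log(promote(sessionNote, "user", ctx, () => true)); // rescoped to user u-7
```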

Eval-gated skills

Feedback becomes a replayable test before it becomes behaviour. A skill ships only when the suite passes.
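A minimal eval gate, assuming a skill is a plain function and each piece of feedback is stored as an input/expected-output case. EvalCase, evalGate and the date-formatting example are hypothetical; the point is only that the ship decision is derived from replaying the whole suite.

```typescript
// Hypothetical sketch: a skill is a function; each feedback item is a replayable case.
interface EvalCase {
  input: string;
  expected: string;
}

type Skill = (input: string) => string;

interface GateReport {
  passed: number;
  failed: EvalCase[];
  ship: boolean;
}

// Replay every recorded case; one failure (or an empty suite) blocks the release.
function evalGate(skill: Skill, suite: EvalCase[]): GateReport {
  const failed = suite.filter((c) => skill(c.input) !== c.expected);
  return {
    passed: suite.length - failed.length,
    failed,
    ship: suite.length > 0 && failed.length === 0,
  };
}

// Feedback "dates should be ISO 8601" captured as tests before the skill changes.
const suite: EvalCase[] = [
  { input: "3/4/2025", expected: "2025-03-04" },
  { input: "12/31/2024", expected: "2024-12-31" },
];

// Candidate skill; assumes US month/day/year input.
const candidate: Skill = (s) => {
  const [m, d, y] = s.split("/");
  return `${y}-${m.padStart(2, "0")}-${d.padStart(2, "0")}`;
};

console.log(evalGate(candidate, suite).ship); // true
```

Refusing to ship on an empty suite is a deliberate choice in this sketch: a skill with no recorded feedback has nothing to pass.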

muster verify

Detects four failure classes in every store: corrupt transcripts, duplicate runs, drift, and poisoned context.
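A partial sketch of what a verifier pass over a JSONL transcript store might check, covering only two of the four classes: corrupt records (unparseable, or missing a run id) and duplicate records (identical content, hashed with SHA-256 via Node's crypto module). The record shape and verifyTranscripts are hypothetical; drift and poisoned-context detection are not attempted.

```typescript
import { createHash } from "node:crypto";

// Hypothetical report shape covering two of the four failure classes.
interface Report {
  runs: number;
  corrupt: number;
  duplicate: number;
}

function verifyTranscripts(jsonlLines: string[]): Report {
  const seen = new Set<string>();
  const report: Report = { runs: 0, corrupt: 0, duplicate: 0 };
  for (const line of jsonlLines) {
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      report.corrupt++; // not valid JSON: a truncated or mangled write
      continue;
    }
    const runId = typeof record === "object" && record !== null ? (record as { runId?: unknown }).runId : undefined;
    if (typeof runId !== "string") {
      report.corrupt++; // parses, but is not a run record
      continue;
    }
    // Hash the re-serialized record so an identical replayed write counts as a duplicate.
    const digest = createHash("sha256").update(JSON.stringify(record)).digest("hex");
    if (seen.has(digest)) {
      report.duplicate++;
      continue;
    }
    seen.add(digest);
    report.runs++;
  }
  return report;
}

const report = verifyTranscripts([
  '{"runId":"run_a","in":100}',
  '{"runId":"run_a","in":100}',
  '{"runId":"run_b","in":',
]);
console.log(report); // { runs: 1, corrupt: 1, duplicate: 1 }
```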

Never-wedge compactor

Fits every turn to budget without losing the thread. The harness can't paint itself into a corner.
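A sketch of a budget-fitting compactor, assuming tokens can be approximated as ceil(characters / 4) and that the system prompt alone always fits the budget. The system prompt and newest turn are pinned; older history is kept newest-first until the budget runs out, and if the newest turn alone overflows, its oldest text is trimmed instead of failing. A production compactor would presumably summarize what it drops rather than discard it. Turn and compact are hypothetical names.

```typescript
// Hypothetical sketch. Assumptions: tokens ~= ceil(chars / 4); the system prompt fits on its own.
interface Turn {
  role: "system" | "user" | "assistant";
  text: string;
}

const tokens = (t: Turn): number => Math.ceil(t.text.length / 4);

function compact(system: Turn, past: Turn[], latest: Turn, budget: number): Turn[] {
  // Never wedge: if the newest turn alone overflows, keep only its most recent text.
  const room = budget - tokens(system);
  let pinned = latest;
  if (tokens(latest) > room) {
    const keepChars = Math.max(0, room * 4);
    pinned = { ...latest, text: latest.text.slice(latest.text.length - keepChars) };
  }

  // Fill what is left with the newest history first, stopping at the first turn that
  // does not fit so the kept history stays contiguous.
  let remaining = budget - tokens(system) - tokens(pinned);
  const kept: Turn[] = [];
  for (let i = past.length - 1; i >= 0; i--) {
    const cost = tokens(past[i]);
    if (cost > remaining) break;
    kept.unshift(past[i]);
    remaining -= cost;
  }
  return [system, ...kept, pinned];
}

const system: Turn = { role: "system", text: "S".repeat(40) }; // 10 tokens
const past: Turn[] = Array.from({ length: 5 }, (_, i): Turn => ({
  role: i % 2 === 0 ? "user" : "assistant",
  text: `turn ${i} `.padEnd(40, "."), // 10 tokens each
}));
const latest: Turn = { role: "user", text: "L".repeat(40) }; // 10 tokens

const fitted = compact(system, past, latest, 45);
console.log(fitted.length, fitted.reduce((n, t) => n + tokens(t), 0)); // 4 40
```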

Proof

05 The Token Waste Index.

Deterministic measurement — no model is called. Across 5 realistic agent tasks (170 turns), a naive replay-everything harness sends 876k tokens; Muster sends 355k — a 59.4% reduction, up to 62.7% on the longest threads. Reproduce with muster benchmark.

scenario	turns	naive	muster	reduction
codebase-refactor-20	21	84.6k	42.7k	49.6%
incident-triage-30	31	144.9k	60.5k	58.2%
erp-data-audit-40	41	205.5k	79.5k	61.3%
research-synthesis-25	26	160.0k	67.7k	57.7%
long-support-thread-50	51	280.8k	104.9k	62.7%
AGGREGATE	170	875.8k	355.2k	59.4%
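The table's arithmetic can be rechecked from its own rows: reduction is 1 - muster/naive, and the AGGREGATE row is consistent with pooling tokens across scenarios rather than averaging the per-row percentages. Recomputing from the rounded k-figures can differ in the last digit (the rounded muster column sums to 355.3k, and long-support-thread-50 comes out at 62.6%), presumably because muster benchmark works from exact token counts.

```typescript
// Rows copied from the table above, in thousands of tokens (rounded as published).
interface Scenario {
  name: string;
  turns: number;
  naive: number;
  muster: number;
}

const scenarios: Scenario[] = [
  { name: "codebase-refactor-20", turns: 21, naive: 84.6, muster: 42.7 },
  { name: "incident-triage-30", turns: 31, naive: 144.9, muster: 60.5 },
  { name: "erp-data-audit-40", turns: 41, naive: 205.5, muster: 79.5 },
  { name: "research-synthesis-25", turns: 26, naive: 160.0, muster: 67.7 },
  { name: "long-support-thread-50", turns: 51, naive: 280.8, muster: 104.9 },
];

const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;

const totalTurns = scenarios.reduce((n, s) => n + s.turns, 0);
const sumNaive = scenarios.reduce((n, s) => n + s.naive, 0);
const sumMuster = scenarios.reduce((n, s) => n + s.muster, 0);

// Pooled reduction: total muster tokens over total naive tokens (agrees with AGGREGATE).
const pooled = 1 - sumMuster / sumNaive;
// Unweighted mean of the per-scenario reductions, for comparison.
const meanOfRows = scenarios.reduce((n, s) => n + (1 - s.muster / s.naive), 0) / scenarios.length;

console.log(totalTurns, pct(pooled), pct(meanOfRows)); // 170 59.4% 57.9%
```

The pooled figure gives more weight to the long threads, which have both the most tokens and the largest reductions; that is why it sits above the simple mean of the rows.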

Read the full methodology & reproduce →

Ecosystem

06 A platform, not a CLI.

One monorepo and one version, presented as the distinct products it already ships, plus themed satellites we're building. Military-assembly naming throughout.

shipped

Muster

The governed agent harness — core & CLI.

shipped

Garrison

One governed envelope for seven chat surfaces. @musterhq/gateway

shipped

Dispatch

Zero-dependency web client for any frontend. @musterhq/surface

shipped

Tally

The Token Waste Index — prove the savings. muster benchmark

shipped

Frappe pack

ERPNext / Frappe capability pack.

planned

Roster

Capability-pack registry — roster install <pack>.

planned

Defector

Migrate from OpenClaw / Hermes into Muster, with verification.

planned

Recon

Standalone eval-suite runner that tests the whole stack (harness + config + model), not just the LLM.

planned

Picket

Local monitor / TUI over the gateway RPC — watch runs and the ledger live.

Build with us.

Everything is open source under the MIT license. Ship a capability pack and grow the surface.

pnpm dlx @musterhq/cli init && muster doctor
