muster.

mission control for a federation of agents

The AI agent harness you can audit.

Agents waste 60–89% of what you spend on them. Muster puts every token on a ledger, every memory in a scoped lane, every learned skill behind an eval gate — across every chat surface, every model, any MCP server.

pnpm dlx @musterhq/cli init && muster demo

muster (v.) — to assemble troops into formation.

See it in your terminal

Run Muster. Watch every token.

No screenshots — this is the real CLI: the banner, a live muster demo run loop with scoped-memory recall and the token ledger, then the deterministic muster benchmark Token Waste Index.

Connect anywhere

01 One governed envelope. Every room.

A single message envelope fans out to wherever your people already are — same memory, same ledger, same approval gates in every surface. Add any channel in ~40 lines.

Any model, your keys

02 Cloud or local. Zero lock-in.

One routing layer over 20+ providers — bring your own keys, swap models per task, run fully offline. Governed fallback is recorded as evidence, never silent.

Connect any tool

03 The honest breadth multiplier.

Muster doesn't fake a hundred first-party integrations. Its reach is MCP — thousands of servers — plus typed capability packs and durable flows.

Any MCP server

Connect any of the thousands of MCP servers with per-server isolation — each gets its own scoped lane and approval gate. No shared blast radius.

Capability packs

Typed, versioned tool bundles. The Frappe / ERPNext pack ships today; the pack format is the community surface for everything next.

Flows

Durable, replayable automations with approval gates between steps. A run that fails mid-flight resumes from where it stopped — not from zero.

Why it's different

04 Every framework demos beautifully. Then production happens.

Governance is the moat — five promises kept on every single run, not a roadmap.

Token ledger

Every run priced and recorded. Replay waste flagged with the exact ratio. muster tokens.

Scoped memory

Five lanes — tenant, workspace, user, role, session — with promotion gates. Leakage is blocked by design.

Eval-gated skills

Feedback becomes a replayable test before it becomes behaviour. A skill ships only when the suite passes.

muster verify

Detects corrupt transcripts, duplicate runs, drift and poisoned context — four failure classes, every store.

Never-wedge compactor

Fits every turn to budget without losing the thread. The harness can't paint itself into a corner.

Proof

05 The Token Waste Index.

Deterministic measurement — no model is called. Across 5 realistic agent tasks (170 turns), a naive replay-everything harness sends 876k tokens; Muster sends 355k — a 59.4% reduction, up to 62.7% on the longest threads. Reproduce with muster benchmark.

scenarioturnsnaivemusterreduction
codebase-refactor-202184.6k42.7k49.6%
incident-triage-3031144.9k60.5k58.2%
erp-data-audit-4041205.5k79.5k61.3%
research-synthesis-2526160.0k67.7k57.7%
long-support-thread-5051280.8k104.9k62.7%
AGGREGATE170875.8k355.2k59.4%

Ecosystem

06 A platform, not a CLI.

One monorepo, one version — presented as the products it already is, plus themed satellites we're building. Military-assembly naming throughout.

shipped

Muster

The governed agent harness — core & CLI.

shipped

Garrison

One governed envelope for seven chat surfaces. @musterhq/gateway

shipped

Dispatch

Zero-dependency web client for any frontend. @musterhq/surface

shipped

Tally

The Token Waste Index — prove the savings. muster benchmark

v0

Frappe pack

ERPNext / Frappe capability pack — shipped.

planned

Roster

Capability-pack registry — roster install <pack>.

planned

Defector

Migrate from OpenClaw / Hermes into Muster, with verification.

planned

Recon

Standalone eval-suite runner — harness + config + model, not just the LLM.

planned

Picket

Local monitor / TUI over the gateway RPC — watch runs and the ledger live.

Build with us.

Everything is open source, MIT. Ship a capability pack and grow the surface.

pnpm dlx @musterhq/cli init && muster doctor