
Fuzzing, schema lint, and edge coverage

Three checks sit between the golden floor and the oracle-free hub: input fuzzing drives schema-derived malformed input and checks the server fails cleanly, the strict schema lint catches under-constrained schemas statically before they ever run, and tool-edge coverage gates an agent run against its declared tool edges. They are complementary: a tool that passes the lint is far less likely to crash under a fuzz sweep, and a restricted-edge attempt is also a security signal.

Input fuzzing

Status: implemented behind the preview schema flag.

Most tests check the happy path: given good arguments, expect a good result. A server also has to survive bad arguments. The MCP runtime-fault taxonomies (A Taxonomy of Runtime Faults in MCP Servers, arXiv:2606.05339; Real Faults in MCP Software, arXiv:2603.05637) found parameter and type-validation faults to be a recurring failure mode: a type mismatch, a missing required field, or a malformed structure that the server does not handle cleanly. The fuzzer exercises that surface.

[Figure: a terminal session in which mcptest fuzz drives schema-derived malformed input at every tool of the built-in evil mock, eight seeded cases each; every tool stays well-behaved.]

It derives malformed argument cases from a tool's inputSchema, issues each one, and checks the server fails cleanly. The case generator runs no model and is deterministic: the same schema and seed produce the same cases, so a fuzz run is reproducible in CI.
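To make the determinism claim concrete, here is a minimal sketch of a model-free case enumerator. It is not mcptest's implementation: the function name, the two case families (missing required field, type mismatch, both named in the taxonomy above), and the replacement values are assumptions chosen for illustration.

```python
# Hypothetical sketch: enumerate malformed argument sets from a JSON Schema.
# The case families and wrong-type values are illustrative assumptions,
# not the generator mcptest actually ships.

WRONG_TYPE_VALUE = {
    "string": 12345,
    "integer": "not-a-number",
    "number": "not-a-number",
    "boolean": "yes",
    "array": {"not": "a list"},
    "object": [1, 2, 3],
}


def enumerate_malformed_cases(schema: dict, base_args: dict) -> list[dict]:
    """Return malformed argument sets in a stable, schema-determined order."""
    cases = []
    # Missing-required cases: drop each required field in declaration order.
    for name in schema.get("required", []):
        cases.append({k: v for k, v in base_args.items() if k != name})
    # Type-mismatch cases: replace each typed property with a wrong-typed value.
    props = schema.get("properties", {})
    for name in sorted(props):
        declared = props[name].get("type")
        if declared in WRONG_TYPE_VALUE:
            cases.append({**base_args, name: WRONG_TYPE_VALUE[declared]})
    return cases
```

Because the function reads only its inputs and iterates in a fixed order, calling it twice with the same schema and base arguments yields identical lists, which is the property a reproducible CI run depends on.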

In a suite

Add a fuzz: block to a tool entry in mcptest.yaml:

tools:
  - name: search survives a fuzz sweep
    server: api
    tool: search
    args: { query: "anthropic", limit: 10 }
    fuzz:
      seed: 1729
      cases: 64
      max_call_ms: 2000

Then run the whole suite, one report and one exit code:

mcptest run --config mcptest.yaml

No suite? Run it standalone

To fuzz every tool a server exposes without writing a suite:

mcptest fuzz --server-command "node ./dist/server.js"
mcptest fuzz --server-url https://api.example.com/mcp --seed 7 --cases 128

The subcommand lists the server's tools, fuzzes each from its advertised schema, and exits non-zero if any tool crashes, hangs, violates the protocol, or leaks.

The cases

From the schema (and, when no schema is advertised, from the shape of the base arguments) the generator builds:

The cases are enumerated in a stable order. When there are more than the cases budget, a seed-dependent window is taken, so a smaller budget still varies with the seed.
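One way to picture a seed-dependent window over a stably ordered list is a contiguous slice whose start offset is derived from the seed. The sketch below assumes a modulo offset with wraparound; how mcptest actually derives the window from the seed is not stated here, and select_window is a hypothetical name.

```python
# Hypothetical sketch of a seed-dependent window. The modulo offset and the
# wraparound are assumptions made for illustration.

def select_window(cases: list, budget: int, seed: int) -> list:
    """Pick at most `budget` cases, starting at an offset derived from the seed."""
    if budget >= len(cases):
        # The budget covers everything, so the seed has no effect.
        return list(cases)
    start = seed % len(cases)
    # Walk forward from the offset, wrapping around the end of the list.
    return [cases[(start + i) % len(cases)] for i in range(budget)]
```

With ten enumerated cases and a budget of four, seed 7 would select the cases at positions 7, 8, 9, 0, while seed 1729 would select positions 9, 0, 1, 2: the same budget, a different slice.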

The oracle

The oracle is negative-path correctness, not a golden output. Each case is classified by how the call comes back:

Independently, a leak is flagged when an error response carries an internal detail (a stack trace, a source location, a secret-shaped token). The leak check is a conservative heuristic, so an ordinary "missing required argument" message does not trip it.
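A conservative leak heuristic of this kind can be approximated with a handful of regular expressions. The patterns below are illustrative assumptions, not the ones mcptest uses, and looks_like_leak is a hypothetical name.

```python
import re

# Hypothetical patterns for "internal detail" in an error message.
LEAK_PATTERNS = [
    # Header line of a Python traceback.
    re.compile(r"Traceback \(most recent call last\)"),
    # A JavaScript-style stack frame: "    at fn (file:line:col)".
    re.compile(r"^\s+at .+\(.+:\d+:\d+\)", re.MULTILINE),
    # A source location such as "server.js:42" or "src/handler.py:17".
    re.compile(r"[\w./\\-]+\.(?:py|js|ts|go|rs|java):\d+"),
    # A secret-shaped token: a known prefix followed by 16+ token characters.
    re.compile(r"\b(?:sk|ghp|AKIA)[-_A-Za-z0-9]{16,}\b"),
]


def looks_like_leak(error_message: str) -> bool:
    """Return True when the message matches any internal-detail pattern."""
    return any(p.search(error_message) for p in LEAK_PATTERNS)
```

Under these assumed patterns, a plain validation message such as "missing required argument: query" matches nothing, while a message that embeds a stack frame or a token-like string is flagged.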

Assertable targets and the gate

The check exposes six targets. The names are exact.

Target                     Meaning
fuzz.cases_run             Total cases dispatched.
fuzz.crashes               Cases that crashed the server or dropped the transport.
fuzz.hangs                 Cases that did not return within max_call_ms.
fuzz.protocol_violations   Cases that returned a malformed envelope.
fuzz.leaks                 Cases whose error response leaked an internal detail.
fuzz.gate_passed           1 when the report is clean, 0 otherwise.

The default gate (no expect:) fails on any crash, hang, protocol violation, or leak. Write an explicit expect: to assert a target directly.
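The relationship between the five counters and fuzz.gate_passed can be written down in a few lines. This is a sketch of the rule as described above, with a hypothetical FuzzReport class; the target names are the documented ones, but the data structure is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FuzzReport:
    """Hypothetical container for the counters behind the fuzz.* targets."""
    cases_run: int
    crashes: int
    hangs: int
    protocol_violations: int
    leaks: int

    @property
    def gate_passed(self) -> int:
        # Default gate: clean means no crash, hang, protocol violation, or leak.
        clean = (
            self.crashes == 0
            and self.hangs == 0
            and self.protocol_violations == 0
            and self.leaks == 0
        )
        return 1 if clean else 0

    def targets(self) -> dict[str, int]:
        """Map the documented target names to their values."""
        return {
            "fuzz.cases_run": self.cases_run,
            "fuzz.crashes": self.crashes,
            "fuzz.hangs": self.hangs,
            "fuzz.protocol_violations": self.protocol_violations,
            "fuzz.leaks": self.leaks,
            "fuzz.gate_passed": self.gate_passed,
        }
```

Note that fuzz.cases_run does not feed the gate: a report with zero failures passes regardless of how many cases were dispatched.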

The fuzzer checks that bad input fails cleanly, not that good input produces a correct result. It will not find a logic bug that returns a wrong-but-well-formed answer. Pair it with ordinary assertion tests for correctness and with the metamorphic relations for the oracle-free middle ground.

Strict input-schema lint

Status: implemented.

An under-constrained inputSchema lets malformed input reach the server, which the runtime-fault taxonomy (Real Faults in MCP Software, arXiv:2603.05637) ties to a class of parameter and type-validation faults. The fuzzer finds these at runtime; the schema lint catches them statically, which is cheaper.

The rules

Each rule inspects one tool's inputSchema and carries a stable id.

Rule     Severity  What it flags
SCH-001  warning   the object declares properties but no required list, so the server cannot rely on any argument being present
SCH-002  warning   additionalProperties is not false, so unexpected fields are accepted silently
SCH-003  critical  a property declares neither type nor enum, so any value is accepted
SCH-004  warning   a string property has no maxLength, or an array property has no maxItems, so input size is unbounded
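The four rules are simple enough to express as a single pass over a schema. The following is a minimal sketch, assuming a top-level object schema and single-string type values; it does not recurse into nested objects, and lint_input_schema is a hypothetical name rather than mcptest's API.

```python
# Hypothetical sketch of the four rules applied to one tool's inputSchema.
# Only top-level properties are inspected; nested schemas are out of scope here.

def lint_input_schema(schema: dict) -> list[tuple[str, str, str]]:
    """Return (rule id, severity, message) findings for one inputSchema."""
    findings = []
    props = schema.get("properties", {})
    if props and "required" not in schema:
        findings.append(("SCH-001", "warning", "properties declared without a required list"))
    if schema.get("additionalProperties") is not False:
        findings.append(("SCH-002", "warning", "additionalProperties is not false"))
    for name, prop in props.items():
        if "type" not in prop and "enum" not in prop:
            findings.append(("SCH-003", "critical", f"{name}: neither type nor enum"))
        if prop.get("type") == "string" and "maxLength" not in prop:
            findings.append(("SCH-004", "warning", f"{name}: string without maxLength"))
        if prop.get("type") == "array" and "maxItems" not in prop:
            findings.append(("SCH-004", "warning", f"{name}: array without maxItems"))
    return findings
```

Counting findings by severity gives the kind of totals a gate can assert on: a schema with one untyped property and no other issues would produce one critical and no warnings.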

In a suite

The findings surface through the tool_quality: block as two assertable targets, alongside the existing description-quality targets:

tool_quality:
  - name: tool schemas are well constrained
    server: local
    expect:
      - target: schema_criticals
        matcher: { schema: { maximum: 0 } }
      - target: schema_warnings
        matcher: { schema: { maximum: 3 } }

These do not change the default tool_quality: gate; declare them explicitly to opt in. Run the whole suite with mcptest run --config mcptest.yaml.

No suite? Run it standalone

Run the lint and the autofix from the command line over a captured tools/list snapshot:

mcptest schema-lint tools.json                 # report findings, exit 1 if any
mcptest schema-lint tools.json --fix           # print the tightened catalog
mcptest schema-lint tools.json --fix --write   # tighten the snapshot in place

mcptest schema-lint is the standalone surface; the same lint also runs inside a suite's tool_quality: check. See the CLI reference for every flag. (This is distinct from mcptest lint, which scans suites for deprecated MCP features.)

The autofix

The lint ships a mechanical fix. Given an under-constrained schema it returns a tightened copy that sets additionalProperties: false and adds a required list of every declared property, applied recursively to nested object schemas. It deliberately does not invent a type, a maxLength, or a maxItems, since the right value is the author's to choose, so SCH-003 and SCH-004 stay findings rather than guesses. The examples/tool-schema-lint directory pins a loose schema and its tightened output together with a byte-for-byte test.
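The mechanical part of the fix can be sketched as a recursive walk over a copy of the schema. This is an illustration under assumptions, not the shipped autofix: tighten is a hypothetical name, the sketch overwrites any existing required list, and it follows nesting only through properties (not through array items).

```python
import copy

# Hypothetical sketch of the mechanical fix described above.

def tighten(schema: dict) -> dict:
    """Return a tightened copy; the input schema is left untouched."""
    fixed = copy.deepcopy(schema)
    _tighten_in_place(fixed)
    return fixed


def _tighten_in_place(node: dict) -> None:
    props = node.get("properties")
    if node.get("type") != "object" and not isinstance(props, dict):
        return  # Not an object schema: nothing to tighten mechanically.
    node["additionalProperties"] = False
    if isinstance(props, dict):
        # Require every declared property, in declaration order.
        node["required"] = list(props)
        for child in props.values():
            if isinstance(child, dict):
                _tighten_in_place(child)
    # Deliberately no type, maxLength, or maxItems is invented here.
```

Because the walk never adds a type or a size bound, a property that tripped SCH-003 or SCH-004 before the fix still trips it afterwards, which matches the division of labor described above.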

The lint is structural: it checks that a schema constrains its inputs, not that the constraints are semantically right. A maxLength of one million still passes SCH-004. Pair it with the fuzzer, which exercises the actual runtime handling the schema describes.

Tool-edge coverage

Status: implemented behind the preview schema flag.

End-to-end task success hides whether a declared access rule was actually exercised. An agent can pass its task and still have called a tool it was never supposed to touch, or never have exercised the tool you most wanted covered. Testing Agentic Workflows with Structural Coverage Criteria (Kahani, Bagherzadeh, 2026, arXiv:2605.26521) derives coverage obligations over the workflow's tool edges. The tool_edges: gate brings that to an agent test: it folds the run trace against a declared edge set into three deterministic numbers, with no model in the scoring.

The edges

In a suite

The gate lives on an agent entry in mcptest.yaml and exposes four targets, each usable in expect: with the standard matcher: key:

Target                     Meaning
edges.allowed_pct          Percent of allowed edges exercised.
edges.restricted_attempts  Count of calls to a restricted tool.
edges.delegation_pct       Percent of delegation edges observed.
edges.gate_passed          1 when no restricted tool was called, 0 otherwise.

agents:
  - name: triage agent stays within its allowed tools
    model: claude-sonnet-4-5
    servers: [repo]
    prompt: Find the open issues and summarize them.
    tool_edges:
      allowed: [search, summarize]
      restricted: [delete_repo, force_push]
      delegation: [{ from: planner, to: worker }]
      expect:
        - target: edges.restricted_attempts
          matcher: { schema: { maximum: 0 } }
        - target: edges.allowed_pct
          matcher: { schema: { minimum: 80 } }

Omit expect: to apply the default gate, which fails on any call to a restricted tool (edges.restricted_attempts <= 0). A restricted-edge attempt is also a security signal: a destructive tool the agent was told to avoid but reached for anyway. Run the whole suite with mcptest run --config mcptest.yaml.
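Folding a trace against a declared edge set needs no model, only counting. The sketch below assumes the trace is available as a list of called tool names plus a list of observed (from, to) delegation pairs, and that an empty declared set counts as fully covered; both are assumptions, and edge_coverage is a hypothetical name.

```python
# Hypothetical sketch: fold a run trace against declared tool edges.
# The trace shape and the empty-set convention are illustrative assumptions.

def edge_coverage(
    calls: list[str],
    delegations: list[tuple[str, str]],
    allowed: list[str],
    restricted: list[str],
    delegation_edges: list[dict],
) -> dict[str, float]:
    called = set(calls)
    restricted_set = set(restricted)
    observed = set(delegations)
    declared = [(e["from"], e["to"]) for e in delegation_edges]

    # Share of allowed tools that were called at least once.
    allowed_hits = sum(1 for tool in allowed if tool in called)
    allowed_pct = 100.0 * allowed_hits / len(allowed) if allowed else 100.0

    # Every call to a restricted tool counts, including repeats.
    restricted_attempts = sum(1 for tool in calls if tool in restricted_set)

    # Share of declared delegation edges seen in the trace.
    delegation_hits = sum(1 for edge in declared if edge in observed)
    delegation_pct = 100.0 * delegation_hits / len(declared) if declared else 100.0

    return {
        "edges.allowed_pct": allowed_pct,
        "edges.restricted_attempts": restricted_attempts,
        "edges.delegation_pct": delegation_pct,
        "edges.gate_passed": 1 if restricted_attempts == 0 else 0,
    }
```

For the declaration in the example above, a trace that calls search twice and force_push once, with one planner-to-worker delegation, would score 50 percent allowed coverage, one restricted attempt, full delegation coverage, and a failed gate.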

The gate checks that the run stayed inside its declared edges, not that the declared edges are the right ones. It is structural coverage, not correctness. Pair it with ordinary agent assertions on the final answer, and with the narrative-vs-trace check so the agent's story matches the calls the coverage counted.