Input fuzzing

Status: implemented behind the preview schema flag. Tracked as epic WOR-1236 and child WOR-1238.

Most tests check the happy path: given good arguments, expect a good result. A server also has to survive bad arguments. The MCP runtime-fault taxonomies (A Taxonomy of Runtime Faults in MCP Servers, arXiv:2606.05339; Real Faults in MCP Software, arXiv:2603.05637) found parameter and type-validation faults to be a recurring failure mode: a type mismatch, a missing required field, or a malformed structure that the server does not handle cleanly. The fuzzer exercises that surface.

Figure: a terminal session in which mcptest fuzz drives schema-derived malformed input at every tool of the built-in evil mock (eight seeded cases each), and every tool stays well-behaved.

The fuzzer derives malformed argument cases from a tool's inputSchema, issues each one, and checks that the server fails cleanly. The case generator runs no model and is deterministic: the same schema and seed always produce the same cases, so a fuzz run is reproducible in CI.

The cases

From the schema (and, when no schema is advertised, from the shape of the base arguments) the generator builds:

omit: drop a required field, once for each field the schema marks required.
wrong type: set a field to a value of a different type than the one declared.
null: set a field to null.
oversize: a very long string or a very large array for a field of that type, or an extreme integer for a numeric field.
structural: a non-object as the whole arguments value, an empty object, and, when additionalProperties is false, an unexpected field.
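As an illustration only, here is one way such an enumeration could look for a flat object schema. The helpers generate_cases and infer_schema, the substitute values in WRONG_TYPE, OVERSIZE_LEN, EXTREME_INT, and the __unexpected__ field name are all hypothetical; this is not mcptest's generator. Walking the required list and then the sorted property names gives a stable order with no randomness involved.

```python
import copy
from typing import Any, Optional

# Hypothetical sketch, not mcptest's generator: enumerate malformed-argument
# cases for a flat "object" inputSchema in a stable, seed-free order.

# One representative wrong-typed value per declared JSON Schema type (assumed).
WRONG_TYPE = {
    "string": 12345,
    "integer": "not-a-number",
    "number": "not-a-number",
    "boolean": "yes",
    "array": {"not": "an array"},
    "object": ["not", "an", "object"],
}
OVERSIZE_LEN = 100_000  # assumed; the real size limits are not documented
EXTREME_INT = 2**63     # one past the signed 64-bit maximum


def infer_schema(base_args: dict) -> dict:
    """Fallback when no schema is advertised: infer types from the base args."""
    kinds = {str: "string", bool: "boolean", int: "integer",
             float: "number", list: "array", dict: "object"}
    props = {k: {"type": kinds.get(type(v))} for k, v in base_args.items()}
    return {"type": "object", "properties": props}


def generate_cases(schema: Optional[dict], base_args: dict) -> list[tuple[str, Any]]:
    """Return (label, arguments) pairs; the same inputs always give the same list."""
    schema = schema or infer_schema(base_args)
    props = schema.get("properties", {})
    cases: list[tuple[str, Any]] = []

    # omit: drop each required field in turn (deepcopy leaves base_args intact).
    for name in schema.get("required", []):
        args = copy.deepcopy(base_args)
        args.pop(name, None)
        cases.append((f"omit:{name}", args))

    # wrong type, null, oversize: sorted names keep the order stable.
    for name in sorted(props):
        declared = props[name].get("type")
        # Only single-string types get a wrong-type case; unions are skipped.
        if isinstance(declared, str) and declared in WRONG_TYPE:
            cases.append((f"wrong_type:{name}", {**base_args, name: WRONG_TYPE[declared]}))
        cases.append((f"null:{name}", {**base_args, name: None}))
        if declared == "string":
            cases.append((f"oversize:{name}", {**base_args, name: "x" * OVERSIZE_LEN}))
        elif declared == "array":
            cases.append((f"oversize:{name}", {**base_args, name: [0] * OVERSIZE_LEN}))
        elif declared in ("integer", "number"):
            cases.append((f"oversize:{name}", {**base_args, name: EXTREME_INT}))

    # structural: non-object arguments, empty object, unexpected field.
    cases.append(("structural:non_object", ["not", "an", "object"]))
    cases.append(("structural:empty", {}))
    if schema.get("additionalProperties") is False:
        cases.append(("structural:extra_field", {**base_args, "__unexpected__": 1}))
    return cases
```

Without a schema the sketch has no required list to work from, so it produces no omit cases and no unexpected-field case.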

The cases are enumerated in a stable order. When the generator produces more cases than the cases budget allows, it takes a seed-dependent window of them, so a smaller budget still varies with the seed.
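One plausible reading of "seed-dependent window", sketched under that assumption: draw a start offset from a PRNG seeded with the run's seed, then take budget consecutive cases, wrapping past the end of the stable list. The name select_window is hypothetical, and mcptest's actual windowing may differ.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_window(cases: Sequence[T], budget: int, seed: int) -> list[T]:
    """Pick `budget` consecutive cases starting at a seed-dependent offset.

    Hypothetical sketch: the offset comes from a PRNG seeded with `seed`, and
    the window wraps past the end of the stable enumeration.
    """
    if budget >= len(cases):
        return list(cases)  # everything fits, so no selection is needed
    start = random.Random(seed).randrange(len(cases))
    return [cases[(start + i) % len(cases)] for i in range(budget)]
```

A contiguous slice keeps neighbouring cases (for example, all the mutations of one field) together. A seeded shuffle followed by truncation would be an equally valid reading of the same rule.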

The oracle

The oracle is negative-path correctness, not a golden output. Each case is classified by how the call comes back:

clean: a well-formed JSON-RPC error (the server rejected bad input) or a valid result (the server accepted it). Both are fine.
crash: the transport dropped or the server died (a closed pipe, a panic).
hang: the call did not return within max_call_ms.
protocol_violation: the response was a malformed or unparseable JSON-RPC envelope.
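A sketch of how the four verdicts could be assigned to one recorded call. It assumes the caller has already captured the raw response text (None when nothing arrived), the elapsed time, and whether the transport dropped. Verdict and classify are hypothetical names. The envelope checks follow the JSON-RPC 2.0 response rules: jsonrpc must be "2.0", an id must be present, exactly one of result or error must appear, and an error object needs an integer code and a string message.

```python
import json
from enum import Enum
from typing import Optional


class Verdict(Enum):
    CLEAN = "clean"
    CRASH = "crash"
    HANG = "hang"
    PROTOCOL_VIOLATION = "protocol_violation"


def classify(raw: Optional[str], elapsed_ms: float, max_call_ms: float,
             transport_dropped: bool) -> Verdict:
    """Hypothetical oracle for one call; `raw` is None if no response arrived."""
    if transport_dropped:
        return Verdict.CRASH  # closed pipe, dead process, panic
    if raw is None or elapsed_ms > max_call_ms:
        return Verdict.HANG
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict.PROTOCOL_VIOLATION  # not JSON at all
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0" or "id" not in msg:
        return Verdict.PROTOCOL_VIOLATION
    if ("result" in msg) == ("error" in msg):
        return Verdict.PROTOCOL_VIOLATION  # need exactly one of result / error
    if "error" in msg:
        err = msg["error"]
        # type(...) is int rejects booleans, which isinstance(..., int) would accept.
        if not (isinstance(err, dict) and type(err.get("code")) is int
                and isinstance(err.get("message"), str)):
            return Verdict.PROTOCOL_VIOLATION
    # A well-formed error (input rejected) or a valid result (input accepted).
    return Verdict.CLEAN
```

An MCP tool result that sets isError is still a valid result, so this sketch classifies it as clean, in line with the rule that either outcome is fine.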

Independently of that classification, a leak is flagged when an error response carries an internal detail (a stack trace, a source location, a secret-shaped token). The leak check is a conservative heuristic, so an ordinary "missing required argument" message does not trip it.
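A conservative leak check can be a short list of patterns that rarely appear in deliberate validation messages. The patterns below are illustrative assumptions, not the heuristic mcptest ships: a Python traceback header, a Node.js-style stack frame, a source file followed by a line number, a few common secret prefixes (sk-/sk_, GitHub's ghp_, Slack's xoxb-/xoxa-/xoxp-), and the AWS access key ID shape (AKIA plus 16 characters). Because each pattern needs a fairly specific shape, a plain "missing required argument" message does not match.

```python
import re

# Illustrative patterns (assumptions), each tied to a fairly specific shape.
LEAK_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),         # Python stack trace
    re.compile(r"^\s+at .+\(.+:\d+:\d+\)", re.MULTILINE),       # Node.js stack frame
    re.compile(r"[\w./\\-]+\.(?:py|js|ts|go|rs|java):\d+"),      # source file:line
    re.compile(r"\b(?:sk|ghp|xox[bap])[-_][A-Za-z0-9_-]{16,}"),  # secret-shaped token
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                         # AWS access key ID shape
]


def leaks_internal_detail(error_message: str) -> bool:
    """True if an error message matches any of the leak patterns."""
    return any(p.search(error_message) for p in LEAK_PATTERNS)
```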

Assertable targets and the gate

The fuzz check exposes six assertable targets. The names below are exact and must be used as written.

Target	Meaning
`fuzz.cases_run`	Total cases dispatched.
`fuzz.crashes`	Cases that crashed the server or dropped the transport.
`fuzz.hangs`	Cases that did not return within `max_call_ms`.
`fuzz.protocol_violations`	Cases that returned a malformed envelope.
`fuzz.leaks`	Cases whose error response leaked an internal detail.
`fuzz.gate_passed`	1 when the report is clean, 0 otherwise.

The default gate (no expect:) fails the test on any crash, hang, protocol violation, or leak. To assert on a target directly, add an explicit expect: block. The test below has none, so the default gate applies:

tools:
  - name: search survives a fuzz sweep
    server: api
    tool: search
    args: { query: "anthropic", limit: 10 }
    fuzz:
      seed: 1729
      cases: 64
      max_call_ms: 2000
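The default gate is a predicate over the report's counts. As a hypothetical sketch (FuzzReport and targets() are illustrative names, not mcptest's API), a report could expose the six target names and derive fuzz.gate_passed from the four failure counts:

```python
from dataclasses import dataclass


@dataclass
class FuzzReport:
    """Hypothetical per-check report; field names mirror the fuzz.* targets."""
    cases_run: int = 0
    crashes: int = 0
    hangs: int = 0
    protocol_violations: int = 0
    leaks: int = 0

    @property
    def gate_passed(self) -> int:
        # Default gate: any crash, hang, protocol violation, or leak fails it.
        failures = self.crashes + self.hangs + self.protocol_violations + self.leaks
        return 1 if failures == 0 else 0

    def targets(self) -> dict[str, int]:
        """Expose the counts under the exact target names."""
        return {
            "fuzz.cases_run": self.cases_run,
            "fuzz.crashes": self.crashes,
            "fuzz.hangs": self.hangs,
            "fuzz.protocol_violations": self.protocol_violations,
            "fuzz.leaks": self.leaks,
            "fuzz.gate_passed": self.gate_passed,
        }
```

Note that fuzz.cases_run does not affect the gate. It records how much was exercised, not whether anything failed.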

The subcommand

To fuzz every tool a server exposes without writing a suite:

mcptest fuzz --server-command "node ./dist/server.js"
mcptest fuzz --server-url https://api.example.com/mcp --seed 7 --cases 128

The subcommand lists the server's tools, fuzzes each from its advertised schema, and exits non-zero if any tool crashes, hangs, violates the protocol, or leaks.
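The subcommand's control flow reduces to a loop over the listed tools. In this sketch, the tools mapping stands in for the server's tools/list response (tool name to inputSchema), and fuzz_tool stands in for one tool's sweep, returning its failure counts. Both of these, and fuzz_all itself, are hypothetical. The return value is meant for sys.exit.

```python
from typing import Callable, Mapping

# Counts that make a tool fail, mirroring the default gate.
FAILURE_KEYS = ("crashes", "hangs", "protocol_violations", "leaks")


def fuzz_all(tools: Mapping[str, dict],
             fuzz_tool: Callable[[str, dict], Mapping[str, int]]) -> int:
    """Fuzz every tool from its advertised schema; return a process exit code."""
    failing = []
    for name in sorted(tools):  # sorted for a stable report order
        counts = fuzz_tool(name, tools[name])
        if any(counts.get(key, 0) > 0 for key in FAILURE_KEYS):
            failing.append(name)
    for name in failing:
        print(f"FAIL {name}")
    return 1 if failing else 0  # non-zero if any tool misbehaved
```

This sketch keeps going after a failing tool, so one run reports every offender rather than only the first. That is an assumption about the design, not documented behaviour.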

What it does not do

The fuzzer checks that bad input fails cleanly, not that good input produces a correct result. It will not find a logic bug that returns a wrong-but-well-formed answer. Pair it with ordinary assertion tests for correctness and with the metamorphic relations for the oracle-free middle ground.