Input fuzzing
Status: implemented behind the preview schema flag. Tracked as epic WOR-1236 and child WOR-1238.
Most tests check the happy path: given good arguments, expect a good result. A server also has to survive bad arguments. The MCP runtime-fault taxonomies (A Taxonomy of Runtime Faults in MCP Servers, arXiv:2606.05339; Real Faults in MCP Software, arXiv:2603.05637) found parameter and type-validation faults to be a recurring failure mode: a type mismatch, a missing required field, or a malformed structure that the server does not handle cleanly. The fuzzer exercises that surface.
It derives malformed argument cases from a tool's inputSchema, issues each one, and checks that the server fails cleanly. The case generator runs no model and is deterministic: the same schema and seed produce the same cases, so a fuzz run is reproducible in CI.
The cases
From the schema (and, when no schema is advertised, from the shape of the base arguments) the generator builds:
- omit a required field, for each field the schema marks required.
- wrong type, setting a field to a value of a different type than declared.
- null, setting a field to null.
- oversize, a very long string or a very large array for a field of that type, and an extreme integer for a numeric field.
- structural, a non-object as the whole arguments value, an empty object, and an unexpected field when additionalProperties is false.
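The case families above can be sketched roughly as follows. This is an illustration only, not the tool's actual generator: the function name `generate_cases`, the label format, the `WRONG_TYPE` table, and the concrete oversize values are all assumptions, and the sketch covers only the schema-driven path (not the fallback to the shape of the base arguments).

```python
from typing import Any

# Hypothetical replacement values, keyed by the declared JSON Schema type.
WRONG_TYPE = {
    "string": 12345,
    "integer": "not-a-number",
    "number": "not-a-number",
    "boolean": "yes",
    "array": {"k": "v"},
    "object": [1, 2, 3],
}


def generate_cases(schema: dict, base_args: dict) -> list[tuple[str, Any]]:
    """Return (label, arguments) pairs in a stable order. No randomness is used."""
    props = schema.get("properties", {})
    cases: list[tuple[str, Any]] = []

    # omit: drop each field the schema marks required.
    for name in schema.get("required", []):
        cases.append((f"omit:{name}", {k: v for k, v in base_args.items() if k != name}))

    # Per-field cases, visited in sorted order so the enumeration is stable.
    for name in sorted(props):
        declared = props[name].get("type")
        if declared in WRONG_TYPE:
            cases.append((f"wrong_type:{name}", {**base_args, name: WRONG_TYPE[declared]}))
        cases.append((f"null:{name}", {**base_args, name: None}))
        if declared == "string":
            cases.append((f"oversize:{name}", {**base_args, name: "A" * 100_000}))
        elif declared == "array":
            cases.append((f"oversize:{name}", {**base_args, name: [0] * 100_000}))
        elif declared in ("integer", "number"):
            cases.append((f"oversize:{name}", {**base_args, name: 2**63}))

    # structural: the arguments value as a whole is malformed.
    cases.append(("structural:non_object", "not-an-object"))
    cases.append(("structural:empty_object", {}))
    if schema.get("additionalProperties") is False:
        cases.append(("structural:unexpected_field", {**base_args, "__unexpected__": 1}))
    return cases
```

Because the function reads only the schema and the base arguments, calling it twice with the same inputs yields the same list, which is the property the reproducibility claim relies on.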
The cases are enumerated in a stable order. When the enumeration holds more cases than the cases budget allows, a seed-dependent window of it is taken, so a run with a smaller budget still varies with the seed.
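One way to picture a seed-dependent window is a contiguous slice of the stable enumeration whose starting offset is derived from the seed. The sketch below is an assumption about how such a window could work, not a description of the tool's selection logic; `select_window` is a hypothetical name.

```python
def select_window(cases: list, budget: int, seed: int) -> list:
    """Pick at most `budget` cases from a stably ordered list.

    Assumed scheme: start at an offset derived from the seed and wrap around.
    """
    if budget <= 0:
        return []
    if len(cases) <= budget:
        # Everything fits, so the seed has no effect.
        return list(cases)
    start = seed % len(cases)
    return [cases[(start + i) % len(cases)] for i in range(budget)]
```

With ten cases and a budget of four, seed 7 selects positions 7, 8, 9, 0, while seed 1729 selects positions 9, 0, 1, 2. The same seed always selects the same window.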
The oracle
The oracle is negative-path correctness, not a golden output. Each case is classified by how the call comes back:
- clean: a well-formed JSON-RPC error (the server rejected bad input) or a valid result (the server accepted it). Both are fine.
- crash: the transport dropped or the server died (a closed pipe, a panic).
- hang: the call did not return within max_call_ms.
- protocol_violation: the response was a malformed or unparseable JSON-RPC envelope.
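The four outcomes can be sketched as a small classifier. This is a minimal sketch under assumptions: the function name `classify` and its three inputs (the raw response text, a timeout flag, a transport-closed flag) are hypothetical, and the envelope checks follow the JSON-RPC 2.0 response rules (a `jsonrpc` member equal to "2.0", exactly one of `result` or `error`, and an error object with an integer `code` and a string `message`).

```python
import json
from typing import Optional


def classify(raw: Optional[str], timed_out: bool, transport_closed: bool) -> str:
    """Map one call outcome to clean, crash, hang, or protocol_violation."""
    if transport_closed:
        return "crash"
    if timed_out:
        return "hang"
    try:
        msg = json.loads(raw)
    except (TypeError, ValueError):
        # No text at all, or text that is not JSON.
        return "protocol_violation"
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        return "protocol_violation"
    has_result = "result" in msg
    has_error = "error" in msg
    if has_result == has_error:
        # A response must carry exactly one of result or error.
        return "protocol_violation"
    if has_error:
        err = msg["error"]
        code = err.get("code") if isinstance(err, dict) else None
        message = err.get("message") if isinstance(err, dict) else None
        code_ok = isinstance(code, int) and not isinstance(code, bool)
        if not (code_ok and isinstance(message, str)):
            return "protocol_violation"
    # A well-formed error and a well-formed result are both acceptable.
    return "clean"
```

Note that a rejection and an acceptance both land in `clean`: the oracle asks only whether the server answered in a well-formed way.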
Independently, a leak is flagged when an error response carries an internal detail (a stack trace, a source location, a secret-shaped token). The leak check is a conservative heuristic, so an ordinary "missing required argument" message does not trip it.
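A conservative heuristic of this kind might look for a few unmistakable shapes and ignore everything else. The patterns below are illustrative assumptions, not the tool's actual rule set: the token prefixes, the file extensions, and the name `looks_like_leak` are all hypothetical.

```python
import re

# Hypothetical patterns; a real rule set would likely be broader and tuned.
LEAK_PATTERNS = [
    # Header line of a Python stack trace.
    re.compile(r"Traceback \(most recent call last\)"),
    # Source location such as /app/dist/server.js:42
    re.compile(r"[\w./\\-]+\.(?:py|js|ts|go|rs|java):\d+"),
    # Secret-shaped token: a known-looking prefix followed by a long opaque run.
    re.compile(r"\b(?:sk|ghp|AKIA)[-_A-Za-z0-9]{16,}\b"),
]


def looks_like_leak(error_message: str) -> bool:
    """Return True when the message matches any internal-detail pattern."""
    return any(p.search(error_message) for p in LEAK_PATTERNS)
```

An ordinary validation message such as "missing required argument: query" matches none of these patterns, so it is not flagged.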
Assertable targets and the gate
The check exposes six targets. The names are exact.
| Target | Meaning |
|---|---|
| fuzz.cases_run | Total cases dispatched. |
| fuzz.crashes | Cases that crashed the server or dropped the transport. |
| fuzz.hangs | Cases that did not return within max_call_ms. |
| fuzz.protocol_violations | Cases that returned a malformed envelope. |
| fuzz.leaks | Cases whose error response leaked an internal detail. |
| fuzz.gate_passed | 1 when the report is clean, 0 otherwise. |
The default gate (no expect:) fails on any crash, hang, protocol violation, or leak. Write an explicit expect: to assert a target directly.
tools:
  - name: search survives a fuzz sweep
    server: api
    tool: search
    args: { query: "anthropic", limit: 10 }
    fuzz:
      seed: 1729
      cases: 64
      max_call_ms: 2000
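The relationship between the six targets and the default gate can be sketched as a small report type. The target names come from the table above; the `FuzzReport` class and its `targets` method are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FuzzReport:
    """Hypothetical container for the counts of one fuzz run."""
    cases_run: int = 0
    crashes: int = 0
    hangs: int = 0
    protocol_violations: int = 0
    leaks: int = 0

    def targets(self) -> dict[str, int]:
        # Default gate: any crash, hang, protocol violation, or leak fails it.
        clean = not (self.crashes or self.hangs or self.protocol_violations or self.leaks)
        return {
            "fuzz.cases_run": self.cases_run,
            "fuzz.crashes": self.crashes,
            "fuzz.hangs": self.hangs,
            "fuzz.protocol_violations": self.protocol_violations,
            "fuzz.leaks": self.leaks,
            "fuzz.gate_passed": 1 if clean else 0,
        }
```

Under this reading, a run of 64 cases with no findings reports fuzz.gate_passed as 1, and a single leak is enough to bring it to 0.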
The subcommand
To fuzz every tool a server exposes without writing a suite:
mcptest fuzz --server-command "node ./dist/server.js"
mcptest fuzz --server-url https://api.example.com/mcp --seed 7 --cases 128
The subcommand lists the server's tools, fuzzes each from its advertised schema, and exits non-zero if any tool crashes, hangs, violates the protocol, or leaks.
What it does not do
The fuzzer checks that bad input fails cleanly, not that good input produces a correct result. It will not find a logic bug that returns a wrong-but-well-formed answer. Pair it with ordinary assertion tests for correctness and with the metamorphic relations for the oracle-free middle ground.