Scenario 8: catch schema drift
A consumer depends on an MCP server's tool catalog. The server owner ships a release, and a tool quietly disappears, an argument that used to be optional becomes required, or an enum value gets dropped. Each of those breaks callers the moment they hit the new surface, but none of them show up as a failed unit test on the server side. They are catalog changes, not behavior bugs.
mcptest diff is built for exactly this. You capture the catalog you depend on as a baseline (a saved tools/list snapshot), then diff a new catalog against it. The command classifies every change as breaking or non-breaking and sets its exit code so CI fails loudly on a regression.
This walkthrough uses the hosted test server at https://test.mcptest.sh. Its conformant endpoint is POST https://test.mcptest.sh/mcp. A second endpoint, POST https://test.mcptest.sh/mcp?catalog=v1, serves the prior catalog, so you can diff the two and watch the breaking changes surface.
Capture and diff
The diff command compares two saved tools/list JSON snapshots. The snapshot shape is a single object with a tools array, the same shape the server returns from tools/list. The committed pair examples/diff-tools-baseline.json and examples/diff-tools-current.json is a ready-made example you can diff with no network at all:
mcptest diff examples/diff-tools-baseline.json examples/diff-tools-current.json
For the hosted-server walkthrough, capture each endpoint's catalog into its own snapshot file. mcptest discover runs the handshake and tools/list for you; save the tools/list result for each endpoint as <name>.json. The two snapshots below stand in for "the prior release" and "the current release":
- prior.json: the catalog from https://test.mcptest.sh/mcp?catalog=v1. It still has the archive_item tool, search.query is optional, and the kelvin units enum value is present.
- current.json: the catalog from https://test.mcptest.sh/mcp. The current conformant tools are greet(name), search(query), get_forecast(city), list_items(cursor?), slow_op(delay_ms?), fail(code?), and delete_record(id). archive_item is gone, search.query is now required, and kelvin is no longer a valid units value.
A snapshot is just the tools/list object. A trimmed prior.json looks like:
{
  "tools": [
    { "name": "archive_item", "description": "Archive an item by id.",
      "inputSchema": { "type": "object",
        "properties": { "id": { "type": "string" } },
        "required": ["id"] } },
    { "name": "search", "description": "Search the catalog.",
      "inputSchema": { "type": "object",
        "properties": { "query": { "type": "string" } },
        "required": [] } },
    { "name": "get_forecast", "description": "Forecast for a city.",
      "inputSchema": { "type": "object",
        "properties": {
          "city": { "type": "string" },
          "units": { "type": "string", "enum": ["celsius", "fahrenheit", "kelvin"] }
        },
        "required": ["city"] } }
  ]
}
The matching current.json drops archive_item, moves query into search's required array, and removes kelvin from the units enum.
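To make those three changes concrete, here is a minimal Python sketch that applies them to the trimmed prior.json. It is illustrative only: derive_current is a hypothetical helper, not part of mcptest, and the real current.json is captured from the server (and lists more tools than this trimmed view).

```python
import copy

# Trimmed prior snapshot, copied from the example above.
PRIOR = {
    "tools": [
        {"name": "archive_item", "description": "Archive an item by id.",
         "inputSchema": {"type": "object",
                         "properties": {"id": {"type": "string"}},
                         "required": ["id"]}},
        {"name": "search", "description": "Search the catalog.",
         "inputSchema": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": []}},
        {"name": "get_forecast", "description": "Forecast for a city.",
         "inputSchema": {"type": "object",
                         "properties": {
                             "city": {"type": "string"},
                             "units": {"type": "string",
                                       "enum": ["celsius", "fahrenheit", "kelvin"]}},
                         "required": ["city"]}},
    ]
}


def derive_current(prior):
    """Hypothetical helper: apply the three changes to a copy of prior."""
    current = copy.deepcopy(prior)
    # 1. Drop the archive_item tool.
    current["tools"] = [t for t in current["tools"] if t["name"] != "archive_item"]
    by_name = {t["name"]: t for t in current["tools"]}
    # 2. Move query into search's required array.
    by_name["search"]["inputSchema"]["required"].append("query")
    # 3. Remove kelvin from the units enum.
    by_name["get_forecast"]["inputSchema"]["properties"]["units"]["enum"].remove("kelvin")
    return current
```

Printing json.dumps(derive_current(PRIOR), indent=2) gives a trimmed stand-in for current.json that is handy for trying the diff offline.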
Diff the prior catalog (old) against the current catalog (new):
mcptest diff prior.json current.json
The first argument is the baseline (old), the second is the candidate (new). Order matters: the diff describes how to get from old to new, so passing them backwards reports an added tool and a relaxed argument instead of the breakage you are looking for.
To gate CI, leave --fail-on-breaking at its default (true) so any breaking change exits non-zero. Add --scorecard for a release letter grade, and pick an output format with --format when something other than a terminal consumes the output (a PR comment or a downstream tool):
# CI gate: non-zero exit on any breaking change (the default).
mcptest diff prior.json current.json --fail-on-breaking true
# Advisory PR comment that never fails the job.
mcptest diff prior.json current.json --format markdown --fail-on-breaking false > pr-comment.md
# Append a release scorecard (A+ / A / B / C / D / F).
mcptest diff prior.json current.json --scorecard
What is happening here:
- The snapshot is the tools/list object ({ "tools": [...] }). Capture one per endpoint and keep the baseline you depend on under source control so it moves only through deliberate review.
- mcptest diff <OLD> <NEW> classifies each change by severity. A removed tool, an optional argument becoming required, and a removed enum value are all breaking, because each one breaks a caller that worked against the old catalog.
- --fail-on-breaking true (the default) makes the command exit 1 when at least one change is breaking. That is the signal CI keys on. Set it to false for advisory output that reports the diff without failing the job.
- --scorecard rolls the diff into one letter grade. Any tool removed between old and new grades the release F, since a removed tool is the most disruptive change a server can ship.
- --format selects the reporter. pretty (default) is for terminals; markdown is for a PR comment; json is for downstream tooling.
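The three breaking cases named above can be sketched in a few lines of Python. This is not mcptest's implementation, only an illustration of the classification idea; the function names are hypothetical, and the real severity table in docs/schema-evolution-diff.md covers more cases than these three.

```python
def index_tools(snapshot):
    """Map tool name -> inputSchema for a tools/list snapshot."""
    return {t["name"]: t.get("inputSchema", {}) for t in snapshot.get("tools", [])}


def breaking_changes(old, new):
    """List breaking changes going from old to new (three cases only)."""
    old_tools, new_tools = index_tools(old), index_tools(new)
    changes = []
    for name in sorted(old_tools):
        # Case 1: a tool callers relied on no longer exists.
        if name not in new_tools:
            changes.append(f"{name}: tool removed")
            continue
        old_schema, new_schema = old_tools[name], new_tools[name]
        old_props = old_schema.get("properties", {})
        new_props = new_schema.get("properties", {})
        old_required = set(old_schema.get("required", []))
        # Case 2: an argument that was optional is now required.
        for arg in sorted(new_schema.get("required", [])):
            if arg in old_props and arg not in old_required:
                changes.append(f"{name}.{arg}: optional -> required")
        # Case 3: an enum value callers could send is no longer accepted.
        for arg in sorted(old_props):
            new_enum = new_props.get(arg, {}).get("enum")
            if new_enum is None:
                continue  # argument gone or enum lifted; out of scope here
            for value in old_props[arg].get("enum", []):
                if value not in new_enum:
                    changes.append(f"{name}.{arg}: enum value {value!r} removed")
    return changes
```

Calling breaking_changes(prior, current) on the two snapshots from this walkthrough yields one entry per breaking change, while breaking_changes(current, prior) yields none, which is why argument order matters.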
Expected output
Diffing the prior catalog against the current one reports three breaking changes and exits non-zero:
$ mcptest diff prior.json current.json
Tool catalog diff: prior.json -> current.json
Tools removed (1):
- archive_item (BREAKING)
last seen with: args.id (string, required)
Tools changed (2):
search
args.query: optional -> required (BREAKING)
get_forecast
args.units: enum value `kelvin` removed (BREAKING)
Summary: 3 BREAKING, 0 NON-BREAKING.
Exit code: 1
With --scorecard appended, the diff gains a grade line. A removed tool grades the release F:
Release scorecard: F
removed: archive_item
regressed: search (query now required), get_forecast (units enum narrowed)
The exit code is the load-bearing CI signal. 0 means no breaking changes (or --fail-on-breaking false); 1 means at least one breaking change was found, or a snapshot file was missing or malformed. A CI step that runs mcptest diff against the committed baseline fails the build the moment a breaking catalog change lands.
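Because exit 1 covers both a breaking change and a bad snapshot file, a CI wrapper can check the files first so that a failure from the diff itself is unambiguous. The following is a minimal sketch under stated assumptions: the function names and the wrapper's own exit code 2 are hypothetical conventions, and the shape check only mirrors the { "tools": [...] } description in this walkthrough, not mcptest's actual validation.

```python
import json
import subprocess
import sys
from pathlib import Path


def snapshot_problem(path):
    """Return a reason string if path is not a usable snapshot, else None."""
    p = Path(path)
    if not p.is_file():
        return f"{path}: file not found"
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return f"{path}: not valid JSON ({exc})"
    if not isinstance(data, dict) or not isinstance(data.get("tools"), list):
        return f'{path}: expected an object with a "tools" array'
    return None


def gate(old, new, diff_cmd=("mcptest", "diff")):
    """Return 0 (clean), 1 (diff exited non-zero), or 2 (bad snapshot).

    The value 2 is this wrapper's own convention, not an mcptest exit code.
    """
    for path in (old, new):
        problem = snapshot_problem(path)
        if problem:
            print(problem, file=sys.stderr)
            return 2
    # Snapshots look sane, so a non-zero exit now points at the diff itself.
    result = subprocess.run([*diff_cmd, old, new])
    return 0 if result.returncode == 0 else 1


# Usage in a CI step (paths are placeholders):
#   sys.exit(gate("prior.json", "current.json"))
```

The diff output is passed through untouched, so the report still lands in the CI log.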
Troubleshooting
- Diff reports an added tool and a relaxed argument instead of breakage. The arguments are reversed. The first path is the old baseline, the second is the new candidate. Swap them so the diff reads old to new.
- mcptest diff exits 1 but the output shows no diff. Exit 1 also covers a missing or malformed snapshot file. Confirm both paths exist and that each file is a valid tools/list object ({ "tools": [ ... ] }).
- The diff flags changes you did not make. A snapshot captured from a live server can carry environment-specific or reordered fields. Capture both snapshots the same way (same mcptest discover invocation against each endpoint) so only real catalog changes remain.
- You want the diff to report but not fail the job. Pass --fail-on-breaking false. The diff still prints every change; the command exits 0 so a PR-comment step does not block the merge.
- The hosted endpoint is unreachable from CI. The ?catalog=v1 and base endpoints both live behind https://test.mcptest.sh. If your CI network blocks outbound HTTPS, capture both snapshots once, commit them, and diff the committed files (no network at diff time).
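If reordered fields keep showing up as noise in committed snapshots, one option is to normalize each snapshot before committing it. The sketch below is an assumption-laden illustration, not an mcptest feature: the function names are hypothetical, and it assumes that tool order, the order of names in required, and object key order carry no meaning for your consumers.

```python
import json


def normalize_snapshot(snapshot):
    """Return a copy with tools sorted by name and required lists sorted."""
    tools = []
    for tool in sorted(snapshot.get("tools", []), key=lambda t: t.get("name", "")):
        tool = json.loads(json.dumps(tool))  # deep copy of plain JSON data
        schema = tool.get("inputSchema")
        if isinstance(schema, dict) and isinstance(schema.get("required"), list):
            schema["required"] = sorted(schema["required"])
        tools.append(tool)
    normalized = dict(snapshot)  # keep any other top-level keys as they are
    normalized["tools"] = tools
    return normalized


def dump_normalized(snapshot):
    """Serialize with sorted keys so equivalent captures match byte for byte."""
    return json.dumps(normalize_snapshot(snapshot), indent=2, sort_keys=True) + "\n"
```

Writing dump_normalized(...) output to the committed file keeps review diffs limited to real catalog changes. Enum order is deliberately left alone, since some consumers treat it as meaningful.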
See also
- docs/schema-evolution-diff.md, the full severity classification table and the JSON / markdown output shapes.
- Previous: LLM-judge preview.
- Next: Scan for attacks.