Model-compatibility baseline format
Reference for the on-disk shape of baselines/<model>.json. This is the JSON that the mcptest model-baseline capture command writes and the mcptest model-baseline diff command reads. Other tools can consume the file directly; within a schema version, changes are additive only.
Top-level fields
{
  "schema": "https://mcptest.sh/schema/model-compat/v1.json",
  "version": 1,
  "model_id": "anthropic:claude-sonnet-4.5",
  "model_version": "20251022",
  "mcptest_version": "0.1.0",
  "server_version": "1.0.0",
  "server_config_hash": "sha256:b3e1...c7a4",
  "config_fingerprint": {
    "invariants": ["tool_called", "tool_order", "tool_args", "response_shape", "finish_reason"],
    "variances": ["text_content", "field_order", "whitespace", "case", "additive_fields"]
  },
  "assertions": [ ... ],
  "invariants": [ ... ]
}
| Field | Type | Description |
|---|---|---|
| schema | string | JSON Schema URL the file validates against. |
| version | integer | Schema major version. Two files with different version values cannot be diffed. |
| model_id | string | Provider-qualified identifier (anthropic:claude-sonnet-4.5). |
| model_version | string | Vendor-reported point release. Distinct from model_id for rollouts that pin to a date. |
| mcptest_version | string | Version of the binary that produced the capture. |
| server_version | string | MCP server version from initialize. |
| server_config_hash | string | SHA-256 over the resolved server configuration. Invalidates the baseline when the config changes. |
| config_fingerprint | object | Invariant and variance rule names active at capture time. |
| assertions | array | Captured exchanges, one per assertion id, sorted by id. |
| invariants | array | Optional invariant declarations the diff engine will enforce. |
The five identity fields (model_id, model_version, mcptest_version, server_version, server_config_hash) are the identity tuple. Two captures with the same identity tuple and the same captured exchanges produce a byte-identical file.
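A minimal sketch of how a consumer might read the identity tuple and check whether two baselines share it. The helper names `identity_tuple` and `same_identity` are hypothetical and not part of mcptest; raising on a missing field is an assumption made for illustration.

```python
from typing import Any

# Field names as listed in the identity tuple description above.
IDENTITY_FIELDS = (
    "model_id",
    "model_version",
    "mcptest_version",
    "server_version",
    "server_config_hash",
)


def identity_tuple(baseline: dict[str, Any]) -> tuple[str, ...]:
    """Return the five identity fields in a fixed order.

    Raises KeyError if a field is absent (an assumption of this sketch).
    """
    return tuple(baseline[name] for name in IDENTITY_FIELDS)


def same_identity(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """True when both baselines carry the same identity tuple."""
    return identity_tuple(a) == identity_tuple(b)


if __name__ == "__main__":
    # Values copied from the example document; the hash is elided there too.
    captured = {
        "model_id": "anthropic:claude-sonnet-4.5",
        "model_version": "20251022",
        "mcptest_version": "0.1.0",
        "server_version": "1.0.0",
        "server_config_hash": "sha256:b3e1...c7a4",
    }
    upgraded = dict(captured, server_version="1.1.0")
    print(same_identity(captured, dict(captured)))
    print(same_identity(captured, upgraded))
```

Comparing the tuple before comparing exchanges lets a consumer tell "the environment changed" apart from "the model's behavior changed".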
assertions[]
Each captured exchange:
{
  "id": "tests/triage.yml::find-then-file::step-1",
  "content": [
    { "type": "text", "text": "Looking up the account." }
  ],
  "tool_calls": [
    { "name": "lookup_account", "arguments": { "id": "acct-7" } },
    { "name": "create_issue", "arguments": { "repo": "search-svc", "title": "Triage queue overflow" } }
  ],
  "finish_reason": "tool_use",
  "mcp_messages": []
}
| Field | Type | Description |
|---|---|---|
| id | string | Stable identifier of the form <file>::<test>::<step>. Diff pairs entries on this id, never on positional index. |
| content | array | Response content blocks. type: text is the only special-cased variant; other types pass through as Other. |
| tool_calls | array | Ordered list of tool calls the model emitted. |
| finish_reason | enum | One of stop, tool_use, max_tokens, other. |
| mcp_messages | array | Optional raw MCP JSON-RPC envelopes captured for audit. Omitted (or empty) when not relevant. |
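The id-based pairing described in the table can be sketched as follows. This is an illustration, not the diff engine's code: `split_assertion_id` and `pair_by_id` are hypothetical names, and the sketch assumes that none of the three id parts itself contains `::`.

```python
from typing import Any, Optional

Entry = dict[str, Any]


def split_assertion_id(assertion_id: str) -> tuple[str, str, str]:
    """Split '<file>::<test>::<step>' into its three parts.

    Assumes no part contains '::' (an assumption of this sketch).
    """
    parts = assertion_id.split("::")
    if len(parts) != 3:
        raise ValueError(f"expected <file>::<test>::<step>, got {assertion_id!r}")
    return parts[0], parts[1], parts[2]


def pair_by_id(
    old: list[Entry], new: list[Entry]
) -> dict[str, tuple[Optional[Entry], Optional[Entry]]]:
    """Pair entries on their id, never on list position.

    An id present on only one side is paired with None on the other.
    """
    old_by_id = {entry["id"]: entry for entry in old}
    new_by_id = {entry["id"]: entry for entry in new}
    all_ids = sorted(old_by_id.keys() | new_by_id.keys())
    return {i: (old_by_id.get(i), new_by_id.get(i)) for i in all_ids}
```

Pairing on id means that adding or removing a test step shifts no other comparison, which positional pairing would not guarantee.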
invariants[]
Each invariant declaration:
{
  "name": "must-call-lookup",
  "kind": "tool_called",
  "condition": { "tool": "lookup_account" }
}
Five kinds are supported in v1: tool_called, arg_present, response_field_present, response_semantic_match, latency_under_ms. The condition payload shape matches the kind:
| Kind | Condition |
|---|---|
| tool_called | { "tool": "<name>" } |
| arg_present | { "tool": "<name>", "arg": "<key>" } |
| response_field_present | { "text": "<substring>" } |
| response_semantic_match | { "text": "<reference>" } |
| latency_under_ms | { "ms": <integer> } |
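The kind-to-condition mapping in the table can be checked mechanically. The sketch below is a hypothetical consumer-side check, not mcptest's validator; `check_invariant` and its message wording are invented for illustration. It deliberately ignores extra keys in a condition, on the assumption that additive fields may appear there.

```python
from typing import Any

# Required condition keys per kind, transcribed from the table above.
REQUIRED_CONDITION_KEYS: dict[str, set[str]] = {
    "tool_called": {"tool"},
    "arg_present": {"tool", "arg"},
    "response_field_present": {"text"},
    "response_semantic_match": {"text"},
    "latency_under_ms": {"ms"},
}


def check_invariant(decl: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    name = decl.get("name", "<unnamed>")
    kind = decl.get("kind")
    required = REQUIRED_CONDITION_KEYS.get(kind)
    if required is None:
        return [f"{name}: unknown kind {kind!r}"]

    condition = decl.get("condition") or {}
    problems = [
        f"{name}: condition is missing {key!r}"
        for key in sorted(required - set(condition))
    ]

    # bool is a subclass of int in Python, so reject it explicitly.
    ms = condition.get("ms")
    if kind == "latency_under_ms" and "ms" in condition:
        if isinstance(ms, bool) or not isinstance(ms, int):
            problems.append(f"{name}: 'ms' must be an integer")
    return problems
```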
See docs/guides/model-compatibility.md for the user-facing prose introduction and worked examples.
Canonical serialization
Capture writes the file as compact JSON with sorted object keys. A re-capture with the same inputs produces byte-identical output, so a CI check against the captured artifact catches both regressions and accidental in-place edits.
Object key order in arguments and other free-form JSON does not affect equality at the model level (the engine compares values, not strings). Tools that consume the file directly should not depend on physical key ordering inside argument bags.
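The two ideas above, compact sorted-key output and value-level equality, can be illustrated with Python's standard `json` module. This is a sketch only: the exact bytes mcptest emits (non-ASCII escaping, trailing newline, number formatting) are not specified here, so `to_compact_sorted` is a hypothetical helper and is not guaranteed to reproduce a captured file byte for byte.

```python
import json
from typing import Any


def to_compact_sorted(doc: Any) -> str:
    """Serialize with sorted object keys and no insignificant whitespace.

    ensure_ascii=False is an assumption of this sketch, not documented behavior.
    """
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Two argument bags that differ only in physical key order.
a = json.loads('{"repo": "search-svc", "title": "Triage queue overflow"}')
b = json.loads('{"title": "Triage queue overflow", "repo": "search-svc"}')

# Parsed values compare equal regardless of key order...
assert a == b
# ...and sorted-key serialization maps both to the same string.
assert to_compact_sorted(a) == to_compact_sorted(b)

print(to_compact_sorted(b))
# prints {"repo":"search-svc","title":"Triage queue overflow"}
```

A consumer that compares parsed values rather than raw strings stays correct even if it reads a file that was reformatted by another tool.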
Backward compatibility
The schema is additive within a version. New optional fields land in v1; type changes and field removals require v2, and the diff engine refuses to compare files whose version values differ.
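For a third-party consumer, these rules suggest two habits: refuse to compare files whose version differs, and ignore fields it does not recognize. The sketch below illustrates both under those assumptions; `VersionMismatch`, `ensure_comparable`, and `known_fields` are hypothetical names, not mcptest APIs.

```python
from typing import Any


class VersionMismatch(ValueError):
    """Raised when two baselines declare different schema versions."""


def ensure_comparable(old: dict[str, Any], new: dict[str, Any]) -> int:
    """Return the shared schema version, or raise if the versions differ."""
    if old["version"] != new["version"]:
        raise VersionMismatch(
            f"cannot compare version {old['version']} with version {new['version']}"
        )
    return old["version"]


# Assertion fields documented for v1; anything else is treated as additive.
KNOWN_ASSERTION_FIELDS = ("id", "content", "tool_calls", "finish_reason", "mcp_messages")


def known_fields(entry: dict[str, Any]) -> dict[str, Any]:
    """Keep only the documented fields so newer optional fields are ignored."""
    return {key: entry[key] for key in KNOWN_ASSERTION_FIELDS if key in entry}
```

Projecting onto known fields keeps an older consumer working when a newer capture adds optional fields within the same version.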
References
- docs/guides/model-compatibility.md: prose introduction.
- tests/fixtures/model-compat/: 18-fixture corpus.