Compliance baseline (`compliance-baseline.yml`)

mcptest borrows the expected-failures pattern from the @modelcontextprotocol/conformance suite. A baseline file is a list of compliance rule IDs your server is known to fail today. CI passes when only those rules fail and flips red when anything new fails or when a previously failing rule starts passing without the baseline being updated.

Run this example. examples/compliance-baseline.yml declares the rules a server is known to fail, in both the short and long forms. CI stays green while only those rules fail.

mcptest compliance run --from-suite tests/compliance.yaml --baseline examples/compliance-baseline.yml

This page explains the workflow. The matcher lives in mcptest-core::compliance::baseline; the CLI wiring lands in a follow-up ticket (mcptest compliance --update-baseline is in flight).

Why baselines exist

Compliance suites grow faster than servers do. If you must pass every check on day one, you either delay shipping or vendor in a fork of the suite that quietly skips the failures. Neither is good. The baseline lets you adopt incrementally: declare what you do not pass yet, ship green CI, and chip away at the list. Each removal is a measurable improvement.

The pattern also catches drift. A baselined rule that suddenly passes is either real progress (great, remove it) or a bug somewhere that lets the check return a false positive. CI calling out the stale entry forces a review either way.

When to use a baseline

Use one when:

You are wiring mcptest compliance into CI for the first time and the suite reports failures you cannot fix today.
A new compliance rule lands upstream and you need a grace period.
You are refactoring an area that temporarily regresses a rule (rare, document the timeline in the entry's reason).

Do not use one when:

The server passes every rule. Just delete the file. An empty baseline is a silent invitation for failures to accumulate.
You disagree with a rule. Argue for the rule's removal upstream, do not hide it under your baseline.
A single failure is flaky. Fix the flake; do not paper over it.

File format

compliance-baseline.yml lives next to your mcptest.yml. Two equivalent shapes are accepted in the same list so you can adopt the short form first and grow to the long form when you want metadata.

Long form:

# compliance-baseline.yml
server:
  - rule_id: LC-INIT-007
    reason: server.json field missing in a future release
    tracking: https://github.com/our-org/server/issues/42
  - rule_id: TOOLS-CALL-003
    reason: known race condition
    tracking: https://github.com/our-org/server/issues/421

Short form:

server:
  - LC-INIT-007
  - TOOLS-CALL-003

The short form expands to { rule_id: "...", reason: null, tracking: null }. Mixing is allowed, so you can record metadata for the rules where you have something to say and leave the rest bare.

Editors that read JSON Schema can point at the published schema for inline validation:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json#/$defs/ComplianceBaseline
server:
  - LC-INIT-007

Exit code semantics

The runner walks every rule, decides per rule, then aggregates:

Per-rule decision	Run-level effect
Normal pass	No effect, exit 0.
Expected failure	Counted, exit 0.
New regression	Flips CI red, exit 1.
Stale baseline entry	Flips CI red, exit 1.

Concretely: the run exits 0 only when the set of failing rules matches the baseline exactly. Anything else is a signal worth surfacing.

Regenerating the baseline

A future ticket wires mcptest compliance --update-baseline so the CLI can rewrite the file for you. The flow:

Run mcptest compliance --update-baseline against your server.
The runner replaces compliance-baseline.yml with the current set of failing rules in short form. Existing reason and tracking fields are not preserved (the runner does not know them).
Review the diff. Promote entries to long form by hand and link them to tracker issues.

The library function update_baseline is already in place; the CLI hookup is tracked separately.

References

The @modelcontextprotocol/conformance project, which inspired the pattern. Pin a specific commit when you reproduce its behavior so the contract is stable across upstream changes.

Compliance baseline (compliance-baseline.yml)