Setup, teardown, and per-test fixtures

Status: maturing; schema committed. The schema additions documented here land in schemas/v1.json today, so configs that opt in early are valid and will not need editing once the runtime catches up. For shell-style setup and teardown that runs now, use the beforeAll: / afterAll: hooks. The richer setup: / teardown: / setup_per_test: fixture surface on this page parses correctly, but the runner currently treats those blocks as a no-op.

Many MCP servers wrap stateful systems (databases, queues, file systems, external APIs). Tests for those servers need to seed state before the suite runs and clean up after. The fixture surface declares that orchestration inside the YAML file so the test definition lives in one place.

At a glance

setup: [SetupStep] runs once before any test in the file.
setup_per_test: [SetupStep] runs before every test.
teardown: [SetupStep] runs once after all tests finish.

Each SetupStep performs one of:

run: [argv]: a command to spawn, given as an argv array, or
call: { tool, args? }: an MCP tool invocation against the file's default server.

Optional knobs per step: background, wait_for, timeout, capture_failure, always (teardown only).
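As a rough sketch of how the three blocks and the two step forms fit together in one file: the script paths and the reset_state tool name below are placeholders chosen for illustration, not names the schema requires.

```yaml
# Illustrative fragment only; script paths and tool names are hypothetical.
setup:                        # runs once, before any test in the file
  - run: ["./scripts/start-deps.sh"]
    timeout: "30s"
    capture_failure: "Could not start dependencies"

setup_per_test:               # runs before every test
  - call:
      tool: reset_state       # invoked against the file's default server

teardown:                     # runs once, after all tests finish
  - run: ["./scripts/stop-deps.sh"]
    always: true              # teardown-only knob
```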

Example 1: database seeding

The server wraps a Postgres-backed issue tracker. Tests need five issues pre-loaded; teardown wipes them.

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  issues:
    url: "http://localhost:8080/mcp"

setup:
  - run: ["./scripts/seed-issues.sh", "5"]
    capture_failure: "Seed script failed; cannot run tests"
    timeout: "30s"

teardown:
  - run: ["./scripts/cleanup-issues.sh"]
    always: true  # runs even if tests fail

tools:
  - name: "list returns seeded issues"
    server: issues
    tool: list_issues
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 5

capture_failure: hides the underlying shell error behind an operator-friendly message in the reporter output. always: true on the teardown step guarantees cleanup even when tests fail.

Example 2: background API stub

The server depends on a remote API. Tests start a local stub server, point the MCP server at it, then run.

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  proxy:
    command: ["node", "./mcp-proxy.js"]
    env:
      API_URL: "http://localhost:9999"

setup:
  - run: ["node", "./test/stub-server.js"]
    background: true
    wait_for: "tcp://localhost:9999"
    timeout: "30s"

teardown:
  - run: ["pkill", "-f", "stub-server.js"]
    always: true

tools:
  - name: "fetches stub response"
    server: proxy
    tool: fetch

background: true launches the process without waiting for exit. The runner tracks the PID and cleans it up on exit, but the explicit teardown gives the operator a chance to surface failures from the shutdown. wait_for: polls a probe (TCP URL or HTTP path) until it succeeds before the next step runs.
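Because wait_for: gates the next step, a follow-up step can assume the stub is reachable. The fragment below is a sketch under that reading; load-stub-fixtures.sh is a hypothetical script, not part of the example project.

```yaml
setup:
  # Step 1: start the stub in the background and poll the TCP probe.
  - run: ["node", "./test/stub-server.js"]
    background: true
    wait_for: "tcp://localhost:9999"
    timeout: "30s"
    capture_failure: "Stub server failed to start"

  # Step 2: runs only after the probe above has succeeded.
  - run: ["./scripts/load-stub-fixtures.sh"]   # hypothetical script
    timeout: "10s"
```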

Example 3: fresh state per test

Some tests must run against a freshly-initialized server. Today that takes manual style: stepwise ceremony; setup_per_test: makes it declarative.

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  workspace:
    command: ["./workspace-mcp"]

setup_per_test:
  - call:
      tool: reset_state
    capture_failure: "reset_state failed; aborting test"

tools:
  - name: "first write succeeds"
    server: workspace
    tool: write_file
    args:
      path: "/tmp/foo"
      contents: "hello"
  - name: "second write goes through"
    server: workspace
    tool: write_file
    args:
      path: "/tmp/foo"
      contents: "world"

Each test sees a freshly reset workspace because setup_per_test: runs before each tool test. This pairs with run_options.restart_policy for process-level isolation.
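If the reset tool takes parameters, call.args carries them. This variant is a sketch that assumes reset_state accepts a root argument; that argument is invented here purely to show the shape of call.args.

```yaml
setup_per_test:
  - call:
      tool: reset_state
      args:
        root: "/tmp"          # hypothetical argument, for illustration
    timeout: "5s"
    capture_failure: "reset_state failed; aborting test"
```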

Failure semantics

The runtime will implement these committed failure semantics:

Setup fails: tests are skipped, exit code 2 (configuration error), teardown still runs.
Test fails: the failure is reported normally, teardown still runs.
Teardown fails: warning logged, exit code unchanged (we never mask a real test result behind a cleanup glitch).
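To make those rules concrete, here is the seeding configuration from Example 1 again, annotated with the outcome each failure would produce under the semantics listed above. The comments restate this page and do not describe current runner behavior.

```yaml
setup:
  - run: ["./scripts/seed-issues.sh", "5"]
    capture_failure: "Seed script failed; cannot run tests"
    # If this step fails: tests are skipped, the exit code is 2
    # (configuration error), and teardown still runs.

teardown:
  - run: ["./scripts/cleanup-issues.sh"]
    always: true
    # If this step fails: a warning is logged and the exit code
    # produced by the tests is left unchanged.
```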

Field reference

Field	Type	Where	Meaning
`run`	array of strings	step	Argv to spawn. Mutually exclusive with `call`.
`call.tool`	string	step	MCP tool name. Mutually exclusive with `run`.
`call.args`	object	step	JSON-serializable arguments.
`background`	bool	step	Launch without waiting for the process to exit; the runner cleans it up when the run ends.
`wait_for`	string	step	Readiness probe (TCP URL, HTTP path).
`timeout`	duration string	step	Overall timeout for the step (for example `30s`).
`capture_failure`	string	step	Reporter-friendly error message on failure.
`always`	bool	step (teardown)	Run even when tests failed.

The schema enforces exactly one of run: or call: per step. Setting both, or neither, raises a deserialization error.
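A short illustration of the rule, reusing step bodies from the examples on this page. The rejected shapes are left commented out so the fragment itself stays loadable.

```yaml
setup:
  # Valid: each step has exactly one of run: or call:.
  - run: ["./scripts/seed-issues.sh", "5"]
  - call:
      tool: reset_state

  # Invalid: both run: and call: on the same step.
  # - run: ["./scripts/seed-issues.sh", "5"]
  #   call:
  #     tool: reset_state

  # Invalid: neither run: nor call:.
  # - timeout: "30s"
```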

Today: what the runner does

Hook-based setup and teardown (the beforeAll: / afterAll: / beforeEach: / afterEach: blocks) work today. The richer setup: / teardown: / setup_per_test: surface on this page parses correctly, but the runner currently treats those blocks as a no-op. Executing them is tracked for the runtime milestone.

The schema is committed, so YAML you author today against this page remains valid through the future runtime release.

Roadmap

Runtime work still pending:

Subprocess management for run: steps with proper signal handling.
Background-process tracking and shutdown.
Readiness-probe polling for wait_for:.
Cassettes that do not record setup or teardown shell commands (only protocol exchanges).
Reporter integration showing setup and teardown as separate phases.

These are planned for a future release.

References

docs/test-isolation.md, the run_options.restart_policy story that composes with setup_per_test:
Docker compose pattern, for containerized fixtures