Scenario 5: three-stage CI quality gate
A single "run every test on every push" CI job works until it doesn't. As the suite grows, developers get used to broken pre-push runs and stop reading the output; the build queue gets backed up behind the slow tests; the team loses the feedback loop that made the suite worth writing.
The three-stage pattern uses a per-test tags: list, plus the --tag and --skip-tag flags, to run the smallest meaningful set of tests at each stage of the dev loop. The fastest tests run earliest; the most expensive tests run least often.
The three stages
| Stage | Tag | Triggered by | Wall budget |
|---|---|---|---|
| Pre-commit smoke | quick | local pre-commit hook, on every commit | under 5 seconds |
| PR standard | standard | every push to a PR branch | under 30 seconds |
| Build full | full | merges to main, nightly | as long as it needs |
quick catches typos and structural breakage. standard catches the regressions that a reasonable developer might miss but that a reviewer would catch. full catches the slow integration paths, the eval tests, and the edge cases that are too expensive to run on every push.
The YAML
Give each test a tags: list naming the stage it belongs to. At each stage, mcptest run --tag <name> keeps only the tests carrying that tag, and --skip-tag <name> drops a tag from a wider run.
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  local:
    command: ["./target/debug/my-mcp-server"]
tools:
  # Stage 1: selected by --tag quick in the pre-commit hook.
  - name: "smoke: initialize and list_tools"
    server: local
    tool: "list_tools"
    tags: ["quick"]
    expect:
      - target: "result.tools"
        matcher: { schema: { type: array, minItems: 1 } }
  # Stage 2: runs on every PR push.
  - name: "standard: lookup happy path"
    server: local
    tool: "lookup_record"
    tags: ["standard"]
    args: { id: "rec_abc123" }
    expect:
      - target: "result.content[0].text"
        matcher: { contains: "rec_abc123" }
  - name: "standard: lookup error path"
    server: local
    tool: "lookup_record"
    tags: ["standard"]
    args: { id: "rec_missing" }
    expect:
      - target: "result.isError"
        matcher: { exact: true }
  # Stage 3: dropped from PR runs by --skip-tag full.
  - name: "full: large dataset under load"
    server: local
    tool: "stress_test"
    tags: ["full"]
    args: { records: 10000 }
    expect:
      assertions:
        - target: "result.isError"
          matcher: { exact: false }
      max_duration_ms: 30000
The CI workflow
# .github/workflows/mcptest.yml
name: mcptest
on:
  push:
    branches: [main]
  pull_request:
jobs:
  pr-standard:
    # Every push to a PR branch: everything except the full-tagged tests.
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: .mcptest-cache/
          key: mcptest-${{ runner.os }}-${{ hashFiles('mcptest.yml', 'tests/**') }}
      - run: cargo install mcptest --locked
      - run: mcptest run --skip-tag full --cache-dir .mcptest-cache/
  build-full:
    # Merges to main: the whole suite, with no tag filter.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: .mcptest-cache/
          key: mcptest-${{ runner.os }}-${{ hashFiles('mcptest.yml', 'tests/**') }}
      - run: cargo install mcptest --locked
      - run: mcptest run --cache-dir .mcptest-cache/
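Two jobs, one trigger each. PR pushes hit standard; merges to main hit full. Both jobs use the same cache path and key, but GitHub Actions scopes caches by branch: a PR run can restore a cache saved by a run on main, while a run on main cannot restore one saved by a PR. In practice, the full run on main primes the entries that later PR runs reuse. The nightly full run listed in the table needs a schedule: trigger and a matching if: condition, both omitted here.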
The pre-commit hook
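#!/usr/bin/env bash
# .git/hooks/pre-commit (must be executable: chmod +x .git/hooks/pre-commit)
set -euo pipefail
# Run only the quick-tagged smoke tests, bypassing the cache.
mcptest run --tag quick --no-cache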
--no-cache on pre-commit is deliberate. The hook is short enough that the cache lookup is not the bottleneck, and bypassing the cache means the developer sees a real failure instead of a cached pass for a test they have just edited and broken locally. Once they push, the PR job benefits from the cache.
If your team uses a managed pre-commit framework, configure it to run the same command.
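As a minimal sketch, assuming the framework in question is pre-commit (pre-commit.com), a local hook can wrap the same command. The hook id and name below are illustrative and not defined by mcptest; only the entry line comes from the hook above.

```yaml
# .pre-commit-config.yaml
# Sketch only: assumes the pre-commit framework and an mcptest binary on PATH.
repos:
  - repo: local
    hooks:
      - id: mcptest-quick            # hypothetical id, pick any name
        name: mcptest quick smoke
        entry: mcptest run --tag quick --no-cache
        language: system             # use the installed binary, no managed env
        pass_filenames: false        # tests are selected by tag, not by file
        always_run: true             # run even when no staged file matches
```

With pass_filenames disabled, the framework does not append staged file paths to the command, so the hook behaves the same as the plain shell version.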
Expected output
Pre-commit (quick):
$ git commit -m "..."
running mcptest quick smoke...
PASS smoke: initialize and list_tools (87ms)
1 passed, 0 failed in 92ms
PR (standard):
$ mcptest run --skip-tag full
PASS standard: lookup happy path (cached)
PASS standard: lookup error path (cached)
PASS smoke: initialize and list_tools (87ms)
3 passed, 0 failed in 91ms (2 cache hits)
main merge (full):
$ mcptest run
PASS smoke: initialize and list_tools (cached)
PASS standard: lookup happy path (cached)
PASS standard: lookup error path (cached)
PASS full: large dataset under load (28.4s)
4 passed, 0 failed in 28.5s (3 cache hits)
Tuning the stages
The three tags are conventions, not law. If your suite has a distinct "eval" cluster that costs money to run, add a fourth tag and trigger it nightly via a schedule: workflow. If your team treats every push as a release candidate, collapse standard and full into one job.
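A nightly fourth stage could look roughly like the sketch below. The workflow file name, the eval tag name, the cron time, and the timeout are all assumptions chosen for illustration; only the --tag flag and the install step come from the examples above.

```yaml
# .github/workflows/mcptest-nightly.yml (hypothetical file name)
name: mcptest-nightly
on:
  schedule:
    - cron: "0 3 * * *"      # 03:00 UTC every day; any off-peak time works
  workflow_dispatch:          # also allow manual runs from the Actions tab
jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 30       # assumed budget, tune to the suite
    steps:
      - uses: actions/checkout@v4
      - run: cargo install mcptest --locked
      - run: mcptest run --tag eval   # "eval" is an assumed tag name
```

If you add a tag like this, the PR and main jobs also need to exclude it, otherwise an unfiltered run on main would still pick up the paid tests.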
The principle: run the cheapest tests most often, the expensive tests least often, and never let the expensive tests block a developer's fast feedback loop.
See also
- docs/guides/ci-integration.md, the cross-vendor CI integration guide.
- docs/cache.md, the cache details that make this pattern fast.
- Previous: Compliance baseline.
- Next: URL target staging.