CI integration patterns
This guide shows how to run mcptest in continuous integration. It covers three patterns (stdio, HTTP service container, deployed environment) across three platforms (GitHub Actions, GitLab CI, CircleCI), so nine worked examples in total. Every snippet is copy-pasteable. Adjust the version pins, paths, and secret names for your own repository.
Where a flag or feature is still in flight (for example the full --wait-for-ready readiness-polling behavior), the snippet shows the intended call site and notes the status.
The guide assumes you already have at least one passing test file locally. If mcptest run tests/smoke.yaml works on your laptop, the snippets below take it from there.
1. How to read this guide (decision tree)
Start at the top. The first answer that fits routes you to the right snippet.
                  ┌───────────────────────────────┐
                  │ How does your MCP server run? │
                  └───────────────┬───────────────┘
                                  │
         ┌────────────────────────┼────────────────────────┐
         │                        │                        │
         ▼                        ▼                        ▼
  Local subprocess         HTTP listener            Already deployed
  (stdio, command,         you can boot             (staging URL,
  one binary)              inside the               auth in env)
         │                 CI job                          │
         │                        │                        │
         ▼                        ▼                        ▼
  Pattern 1: stdio         Pattern 2: HTTP          Pattern 3: deployed
                           service container        environment
         │                        │                        │
         ▼                        ▼                        ▼
  GitHub Actions:          GitHub Actions:          GitHub Actions:
  section 2.1              section 3.1              section 4.1
  GitLab CI:               GitLab CI:               GitLab CI:
  section 2.2              section 3.2              section 4.2
  CircleCI:                CircleCI:                CircleCI:
  section 2.3              section 3.3              section 4.3
If you want both fast feedback on every commit and one end-to-end run against a real environment, skip to section 5 (combining patterns).
Jump list, in case the ASCII tree above is too cramped:
- The server is a local binary you launch with a command. Go to Pattern 1 (stdio) in section 2.
- The server is an HTTP service. You will start it as a sidecar inside the CI job. Go to Pattern 2 (HTTP service container) in section 3.
- The server is already running somewhere (staging, preview, a VM). Go to Pattern 3 (deployed environment) in section 4.
- You want a tight smoke loop on every push and one slow integration run on pull requests or nightly. Go to section 5 for the combined recipe.
- You want to make any of the above faster. Go to section 6 (caching).
- Something is failing in CI and you cannot reproduce it locally. Go to section 8 (debugging) before guessing.
The 30-second rule: if you cannot find the snippet you need in half a minute, the decision tree is broken. File a docs issue and reference this paragraph.
2. Pattern 1: stdio servers
A stdio server is a binary that speaks MCP over standard input and output. mcptest launches the binary as a child process for the duration of the test run. This is the simplest pattern and usually the fastest, because nothing listens on a port and there is no readiness race.
The test file looks like this:
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  local:
    command: ["./target/release/my-mcp-server"]
tools:
  - name: "lists tools without error"
    server: local
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1
The snippets below build the server, then run mcptest. They all:
- pin a specific mcptest version (do not float on latest in CI),
- cache the platform's native artifact store,
- write JUnit output for the platform's test report UI,
- write a Code Quality JSON file when the platform supports it,
- and pass --wait-for-ready to mcptest. For stdio, ready-detection is cheap, but the flag stays uniform across patterns so a template author does not have to remember which pattern omits it.
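As a minimal sketch of the version-pinning rule, a guard like the one below could run before the install step and refuse a floating version. The function name is_pinned_version is hypothetical and not part of mcptest, and it assumes you pin with plain MAJOR.MINOR.PATCH strings (pre-release suffixes are rejected).

```shell
# Hypothetical pre-install guard (not an mcptest feature): accept only
# plain MAJOR.MINOR.PATCH style strings such as 1.0.0.
is_pinned_version() {
  case "$1" in
    "" | latest | *[!0-9.]*) return 1 ;;  # empty, floating, or non-numeric
    *.*.*) return 0 ;;                    # digits and dots with at least two dots
    *) return 1 ;;
  esac
}

# Show how the guard classifies a few candidate values.
for candidate in 1.0.0 latest ""; do
  if is_pinned_version "$candidate"; then
    echo "pinned: $candidate"
  else
    echo "rejected: '$candidate'"
  fi
done
```

The check is deliberately loose: it only confirms the shape of the string, not that the release exists.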
2.1 GitHub Actions (stdio)
name: mcptest-stdio
on:
  push:
    branches: [main]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
      - name: Cache cargo registry and build output
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-
      - name: Build server in release mode
        run: cargo build --release --bin my-mcp-server
      - name: Install mcptest
        run: curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
      - name: Run mcptest
        run: |
          mcptest run tests/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose
      - name: Render the JUnit report
        if: always()
        run: mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
      - name: Upload JUnit report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mcptest-junit
          path: target/mcptest-junit.xml
      - name: Publish test summary
        if: always()
        uses: mikepenz/action-junit-report@v4
        with:
          report_paths: "target/mcptest-junit.xml"
Notes for this snippet:
- --reporter picks the format (pretty, json, junit, md, html, sarif, gitlab, ndjson, tap, or quiet) and --output names the sink (a file, or stdout). Capturing json once lets the later mcptest report step re-render any format without re-running the suite. To skip the re-render and write a format directly, use --reporter <FORMAT> --output <PATH>, for example --reporter junit --output target/mcptest-junit.xml.
- actions/cache@v4 keys on Cargo.lock so a dependency change invalidates cleanly. See section 6 for the pitfall when test fixtures change but Cargo.lock does not.
- The JUnit report writes one <testsuite> per server and one <testcase> per tool. GitHub's checks UI renders failures inline on the PR.
- --wait-for-ready for stdio waits for the server to respond to initialize before the first tool call. For a binary that prints banners on startup, this avoids racing the handshake.
2.2 GitLab CI (stdio)
default:
  image: rust:1.81
stages:
  - build
  - test
variables:
  CARGO_HOME: "${CI_PROJECT_DIR}/.cargo"
  CARGO_TARGET_DIR: "${CI_PROJECT_DIR}/target"
cache:
  key:
    files:
      - Cargo.lock
  paths:
    - .cargo/registry
    - .cargo/git
    - target
build-server:
  stage: build
  script:
    - cargo build --release --bin my-mcp-server
  artifacts:
    paths:
      - target/release/my-mcp-server
    expire_in: 1 day
mcptest-stdio:
  stage: test
  needs: ["build-server"]
  script:
    - curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
    - export PATH="$HOME/.local/bin:$PATH"
    - mcptest run tests/
        --wait-for-ready
        --reporter json --output target/mcptest-run.json
        --verbose
  after_script:
    # after_script runs even when the test run fails, so the reports still render.
    # It starts in a fresh shell, so the binary is called by its install path.
    - $HOME/.local/bin/mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
    - $HOME/.local/bin/mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
  artifacts:
    when: always
    reports:
      junit: target/mcptest-junit.xml
      codequality: target/mcptest-codequality.json
    paths:
      - target/mcptest-junit.xml
      - target/mcptest-codequality.json
    expire_in: 1 week
Notes:
- reports.junit makes GitLab render per-test results on the merge request.
- reports.codequality lights up the Code Quality widget on the MR diff. The gitlab report format emits one entry per failing assertion with fingerprint, severity, and location filled in so duplicates collapse across runs.
- Capture json once during the run, then re-render JUnit and GitLab Code Quality from the same file with mcptest report. No need to re-run the suite per format.
- The shared cache.key.files invalidates the cache when Cargo.lock changes. Test fixtures live outside the cache key, so see section 6 for the recommended workaround.
2.3 CircleCI (stdio)
version: 2.1
orbs:
  rust: circleci/rust@1.6.1
jobs:
  mcptest-stdio:
    docker:
      - image: cimg/rust:1.81
    resource_class: medium
    steps:
      - checkout
      - rust/install
      - restore_cache:
          keys:
            - v1-cargo-{{ checksum "Cargo.lock" }}
            - v1-cargo-
      - run:
          name: Build server
          command: cargo build --release --bin my-mcp-server
      - save_cache:
          key: v1-cargo-{{ checksum "Cargo.lock" }}
          paths:
            - ~/.cargo/registry
            - ~/.cargo/git
            - target
      - run:
          name: Install mcptest
          command: |
            curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
            echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$BASH_ENV"
      - run:
          name: Run mcptest
          command: |
            mcptest run tests/ \
              --wait-for-ready \
              --reporter json --output target/mcptest-run.json \
              --verbose
      - run:
          name: Render reports
          when: always
          command: |
            mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
            mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - store_test_results:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-codequality.json
workflows:
  test:
    jobs:
      - mcptest-stdio
Notes:
- store_test_results is the CircleCI primitive that turns JUnit into the Tests tab. store_artifacts keeps the raw file for download.
- CircleCI does not have a first-class Code Quality widget, so the GitLab Code Quality JSON lives as an artifact and is consumed by downstream review bots.
3. Pattern 2: HTTP service container
When the server runs as an HTTP service, the CI job needs to start it alongside the test step. Every major platform has a service-container feature for this. The pattern is always the same:
- Pull (or build) a server image.
- Declare it as a service on the job.
- Point mcptest at the service hostname.
- Use --wait-for-ready so the test waits for /health before the first tool call.
The test file references the server by URL:
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  remote:
    # Each workflow below sets MCP_SERVER_URL, for example http://mcp-server:8080/mcp.
    url: "${MCP_SERVER_URL}"
tools:
  - name: "lists tools without error"
    server: remote
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1
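The hostname in the URL differs per platform, which is why each workflow below sets MCP_SERVER_URL. On GitHub the job-level service name (mcp-server) resolves only when the job itself runs in a container; a job that runs directly on the runner reaches the service at localhost through the mapped port. On GitLab it is the alias. On CircleCI it is the secondary image's network name (default localhost).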
3.1 GitHub Actions (HTTP service container)
name: mcptest-http
on:
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      mcp-server:
        image: ghcr.io/example/my-mcp-server:0.7.3
        ports:
          - 8080:8080
        options: >-
          --health-cmd="curl -fsS http://localhost:8080/health || exit 1"
          --health-interval=5s
          --health-timeout=2s
          --health-retries=10
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Cache mcptest install
        uses: actions/cache@v4
        with:
          path: ~/.local/bin/mcptest
          key: mcptest-${{ runner.os }}-1.0.0
      - name: Install mcptest
        run: |
          if [ ! -x "$HOME/.local/bin/mcptest" ]; then
            curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
          fi
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
      - name: Run mcptest
        env:
          # The job runs directly on the runner, so the service is reached
          # through the mapped port on localhost, not by its service name.
          MCP_SERVER_URL: "http://localhost:8080/mcp"
        run: |
          mkdir -p target
          mcptest run tests/http/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose
      - name: Render reports
        if: always()
        run: |
          mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
          mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - name: Upload reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mcptest-reports
          path: target/mcptest-*
Notes:
- The job-level services block starts the server as a sidecar container. This job runs directly on the runner (there is no container: key), so the steps reach the service at http://localhost:8080 through the mapped port; the hostname mcp-server would resolve only if the job itself ran in a container. GitHub waits for the service healthcheck to pass before any step starts, so the service is at least listening by the time mcptest launches. --wait-for-ready then handles MCP-level readiness (server responds to initialize).
- The cache key for the mcptest binary is independent of Cargo.lock. The binary version is the only thing that matters, so the key is mcptest-${{ runner.os }}-1.0.0.
3.2 GitLab CI (HTTP service container)
default:
  image: alpine:3.20
stages:
  - test
variables:
  MCPTEST_VERSION: "1.0.0"
  MCP_SERVER_URL: "http://mcp-server:8080/mcp"
mcptest-http:
  stage: test
  services:
    - name: ghcr.io/example/my-mcp-server:0.7.3
      alias: mcp-server
      command: ["serve", "--port", "8080"]
  before_script:
    - apk add --no-cache curl bash
    - |
      # GitLab caches only paths inside the project directory, so keep a copy of the binary there.
      if [ ! -x .cache/mcptest/mcptest ]; then
        curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION="$MCPTEST_VERSION" sh
        mkdir -p .cache/mcptest
        cp "$HOME/.local/bin/mcptest" .cache/mcptest/mcptest
      fi
    - export PATH="$CI_PROJECT_DIR/.cache/mcptest:$PATH"
  script:
    - mcptest run tests/http/
        --wait-for-ready
        --reporter json --output mcptest-run.json
        --verbose
  after_script:
    # after_script runs even when the test run fails, in a fresh shell, so call the binary by path.
    - .cache/mcptest/mcptest report mcptest-run.json --format junit --output mcptest-junit.xml
    - .cache/mcptest/mcptest report mcptest-run.json --format gitlab --output mcptest-codequality.json
  artifacts:
    when: always
    reports:
      junit: mcptest-junit.xml
      codequality: mcptest-codequality.json
    paths:
      - mcptest-junit.xml
      - mcptest-codequality.json
    expire_in: 1 week
  cache:
    key: "mcptest-${MCPTEST_VERSION}"
    paths:
      - .cache/mcptest/
Notes:
- The alias is the hostname the service is reachable at from the main job container. Use that hostname (not localhost) in the test file.
- GitLab's runner injects a shared Docker network for each job. The healthcheck-style --wait-for-ready flag covers the application-layer readiness because GitLab does not run container healthchecks before the job script starts.
3.3 CircleCI (HTTP service container)
version: 2.1
jobs:
  mcptest-http:
    docker:
      - image: cimg/base:stable
      - image: ghcr.io/example/my-mcp-server:0.7.3
        name: mcp-server
        command: ["serve", "--port", "8080"]
    resource_class: medium
    environment:
      MCP_SERVER_URL: "http://localhost:8080/mcp"
    steps:
      - checkout
      - restore_cache:
          keys:
            - v1-mcptest-1.0.0
      - run:
          name: Install mcptest
          command: |
            if [ ! -x "$HOME/.local/bin/mcptest" ]; then
              curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
            fi
            echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$BASH_ENV"
      - save_cache:
          key: v1-mcptest-1.0.0
          paths:
            - ~/.local/bin/mcptest
      - run:
          name: Run mcptest
          command: |
            # No build step creates target/ in this job, so create it for the reports.
            mkdir -p target
            mcptest run tests/http/ \
              --wait-for-ready \
              --reporter json --output target/mcptest-run.json \
              --verbose
      - run:
          name: Render reports
          when: always
          command: |
            mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
            mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - store_test_results:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-codequality.json
workflows:
  test:
    jobs:
      - mcptest-http
Notes:
- On CircleCI, secondary images are exposed on localhost by default, so the snippet sets MCP_SERVER_URL to http://localhost:8080/mcp.
- CircleCI documents the name: field as the hostname of a secondary container, with localhost as the default. If localhost does not reach the server once name: is set, either drop name: or point MCP_SERVER_URL at http://mcp-server:8080/mcp.
4. Pattern 3: deployed environment
When the server is already running (staging, a preview environment, a long-lived VM), the CI job does not boot anything. It just authenticates and runs tests against the live URL. The pattern is the same on every platform: read the URL and token from environment variables, pass --wait-for-ready so a deploying server has a moment to become healthy, and store the reports.
The test file looks identical to Pattern 2, except the URL points at the deployed environment and includes an auth header:
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  staging:
    url: "${MCP_STAGING_URL}"
    headers:
      Authorization: "Bearer ${MCP_STAGING_TOKEN}"
tools:
  - name: "responds to list_directory in staging"
    server: staging
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1
The two environment variables come from each platform's secrets store. Never embed a token literal in YAML. See section 7 pitfall 2.
4.1 GitHub Actions (deployed environment)
name: mcptest-staging
on:
  workflow_dispatch:
  schedule:
    - cron: "0 6 * * *"
jobs:
  test:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Cache mcptest install
        uses: actions/cache@v4
        with:
          path: ~/.local/bin/mcptest
          key: mcptest-${{ runner.os }}-1.0.0
      - name: Install mcptest
        run: |
          if [ ! -x "$HOME/.local/bin/mcptest" ]; then
            curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
          fi
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
      - name: Run mcptest against staging
        env:
          MCP_STAGING_URL: ${{ vars.MCP_STAGING_URL }}
          MCP_STAGING_TOKEN: ${{ secrets.MCP_STAGING_TOKEN }}
        run: |
          # No build step creates target/ in this job, so create it for the reports.
          mkdir -p target
          mcptest run tests/staging/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose
      - name: Render reports
        if: always()
        run: |
          mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
          mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - name: Upload reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mcptest-staging-reports
          path: target/mcptest-*
Notes:
- The environment: staging line binds the job to a GitHub Environment, so the secret picker reads from the staging-only scope. Production secrets do not leak into a staging job.
- A nightly cron plus a manual workflow_dispatch trigger is the right default for deployed tests. Running on every PR puts load on staging and couples your PR signal to staging's health, which is rarely what you want.
4.2 GitLab CI (deployed environment)
default:
  image: alpine:3.20
stages:
  - test
variables:
  MCPTEST_VERSION: "1.0.0"
mcptest-staging:
  stage: test
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_PIPELINE_SOURCE == "web"
  environment:
    name: staging
    url: $MCP_STAGING_URL
  before_script:
    - apk add --no-cache curl bash
    - |
      # GitLab caches only paths inside the project directory, so keep a copy of the binary there.
      if [ ! -x .cache/mcptest/mcptest ]; then
        curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION="$MCPTEST_VERSION" sh
        mkdir -p .cache/mcptest
        cp "$HOME/.local/bin/mcptest" .cache/mcptest/mcptest
      fi
    - export PATH="$CI_PROJECT_DIR/.cache/mcptest:$PATH"
  script:
    - mcptest run tests/staging/
        --wait-for-ready
        --reporter json --output mcptest-run.json
        --verbose
  after_script:
    # after_script runs even when the test run fails, in a fresh shell, so call the binary by path.
    - .cache/mcptest/mcptest report mcptest-run.json --format junit --output mcptest-junit.xml
    - .cache/mcptest/mcptest report mcptest-run.json --format gitlab --output mcptest-codequality.json
  artifacts:
    when: always
    reports:
      junit: mcptest-junit.xml
      codequality: mcptest-codequality.json
    paths:
      - mcptest-junit.xml
      - mcptest-codequality.json
    expire_in: 1 week
  cache:
    key: "mcptest-${MCPTEST_VERSION}"
    paths:
      - .cache/mcptest/
Notes:
- environment.name: staging binds the job to GitLab's Environments feature so deploy and test runs show up on the same dashboard. The URL link in the UI uses $MCP_STAGING_URL.
- MCP_STAGING_TOKEN is configured in the GitLab project settings under CI/CD variables, scoped to the staging environment, masked, and protected. The runner injects it automatically.
4.3 CircleCI (deployed environment)
version: 2.1
parameters:
  staging-only:
    type: boolean
    default: false
jobs:
  mcptest-staging:
    docker:
      - image: cimg/base:stable
    resource_class: small
    steps:
      - checkout
      - restore_cache:
          keys:
            - v1-mcptest-1.0.0
      - run:
          name: Install mcptest
          command: |
            if [ ! -x "$HOME/.local/bin/mcptest" ]; then
              curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
            fi
            echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$BASH_ENV"
      - save_cache:
          key: v1-mcptest-1.0.0
          paths:
            - ~/.local/bin/mcptest
      - run:
          name: Run mcptest against staging
          command: |
            # No build step creates target/ in this job, so create it for the reports.
            mkdir -p target
            mcptest run tests/staging/ \
              --wait-for-ready \
              --reporter json --output target/mcptest-run.json \
              --verbose
      - run:
          name: Render reports
          when: always
          command: |
            mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
            mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - store_test_results:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-codequality.json
workflows:
  scheduled:
    when:
      or:
        # Run for the scheduled pipeline named "nightly"...
        - equal: [<< pipeline.schedule.name >>, "nightly"]
        # ...or when a manual trigger sets the staging-only parameter to true.
        - << pipeline.parameters.staging-only >>
    jobs:
      - mcptest-staging:
          context: mcptest-staging
Notes:
- The context: mcptest-staging line pulls MCP_STAGING_URL and MCP_STAGING_TOKEN from a CircleCI Context. Contexts are the right scope for environment-specific secrets, because a Context can be restricted to particular workflows and to particular OIDC subjects.
- The parameters + when block keeps the job off the per-commit pipeline. It runs from a scheduled pipeline named nightly, or when you start a pipeline with the Trigger Pipeline button in the CircleCI UI and set the staging-only parameter to true.
5. Combining patterns
A common shape is: fast stdio smoke on every push, plus a deployed-env integration run on pull requests or nightly. The smoke run gives a sub-minute red/green signal. The deployed run catches issues that only show up against a real network and real auth.
The example below uses GitHub Actions. The same shape works on the other two platforms with the obvious renames (workflow becomes pipeline, etc.).
name: mcptest
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: "0 6 * * *"
jobs:
  smoke-stdio:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('Cargo.lock') }}
      - run: cargo build --release --bin my-mcp-server
      - run: curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
      - run: |
          mcptest run tests/smoke/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-smoke-run.json \
            --verbose
      - if: always()
        run: mcptest report target/mcptest-smoke-run.json --format junit --output target/mcptest-smoke-junit.xml
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: smoke-junit
          path: target/mcptest-smoke-junit.xml
  integration-staging:
    if: github.event_name == 'pull_request' || github.event_name == 'schedule'
    needs: smoke-stdio
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
      - env:
          MCP_STAGING_URL: ${{ vars.MCP_STAGING_URL }}
          MCP_STAGING_TOKEN: ${{ secrets.MCP_STAGING_TOKEN }}
        run: |
          # No build step creates target/ in this job, so create it for the reports.
          mkdir -p target
          mcptest run tests/integration/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-integration-run.json \
            --verbose
      - if: always()
        run: |
          mcptest report target/mcptest-integration-run.json --format junit --output target/mcptest-integration-junit.xml
          mcptest report target/mcptest-integration-run.json --format gitlab --output target/mcptest-codequality.json
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: integration-reports
          path: target/mcptest-*
The split has three benefits:
- The smoke job fails fast if the server cannot even start. You do not spend a slot on staging when the binary is broken.
- The integration job is gated by needs: smoke-stdio, so staging only sees commits that already passed the local-process tests.
- Staging-only flakiness no longer blocks every push, because the gating happens on PRs and on the nightly cron, not on every push to a feature branch.
The smoke and integration test sets should not overlap. Put readiness checks, schema-shape assertions, and tool surface coverage in tests/smoke/. Put authentication, network egress, real data, and slower flows in tests/integration/.
6. Caching strategy
Every platform has at least one cache layer. Picking the right one for each pattern can be the difference between a run that finishes in a minute or two and one that takes several times longer.
| Pattern | Best cache | Key | Effect (expected, not yet measured) |
|---|---|---|---|
| Pattern 1 (stdio) | Cargo registry + target/ | Cargo.lock hash | Skips dependency rebuild on every commit. Expected to save the bulk of CI time on a clean Rust project. |
| Pattern 2 (HTTP) | Docker image layer cache + mcptest binary | image tag + mcptest version | Skips image pull, skips re-downloading the mcptest release. |
| Pattern 3 (deployed) | mcptest binary only | mcptest version | Skips the install step. The job is otherwise network-bound. |
The numbers above are deliberately labeled "expected." Measure on your own project before claiming a specific speedup in a release note.
What to put in the cache key
- Cargo.lock hash for the Rust build cache. Restoring across versions produces inconsistent builds.
- mcptest version for the binary cache. A binary pinned to 1.0.0 should cache under mcptest-${{ runner.os }}-1.0.0, not under mcptest-${{ runner.os }}-latest.
- Image digest for service-container caches when the platform supports it. Tags can be moved; digests cannot.
What not to put in the cache key
- Branch name. Causes one cache per branch, which defeats the purpose.
- Timestamp. Same problem.
- A wildcard over **/*.yaml. See section 7.5 below for the recommended split.
Cross-job restore
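Restore behavior differs by platform. GitHub Actions tries the exact key first, then each restore-keys prefix in the order listed, and picks the most recently created cache among the entries that match a prefix. GitLab CI restores only the exact key unless you list cache:fallback_keys. CircleCI tries the restore_cache keys in the order listed and stops at the first hit. On every platform, order the keys from most specific to most general so a hit on the exact Cargo.lock hash wins over a hit on the prefix.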
7. Common pitfalls
Six failures show up over and over. Each one has a one-line symptom, a one-line cause, and a fix.
7.1 Missing --wait-for-ready against an HTTP target
Symptom: the first tool call returns connection refused or 404 on the first run after the server image changed, then passes on a retry.
Cause: the service container is listed as a job-level service, but the healthcheck either is not configured or only checks the TCP port, not the MCP initialize handshake. The test runs before the server is fully ready.
Fix: pass --wait-for-ready to every mcptest run that targets an HTTP server. The flag is meant to poll the configured readiness probe and gate the first tool call; while that polling is still being wired (see the open follow-up items in the appendix), rely on platform-level healthchecks as the gate. Configure them on the service block so the runner does not even start the step until the container reports healthy.
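Where a platform healthcheck is not available, a small polling loop in the job script can serve as a stopgap gate, since the follow-up items in the appendix note that the flag's readiness polling is still being wired. The sketch below is an assumption and not part of mcptest: retry_until_ok is a hypothetical helper that reruns any probe command until it succeeds or the attempt budget runs out.

```shell
# Hypothetical stopgap gate (not an mcptest feature).
# Usage: retry_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

In a job step this might read retry_until_ok 30 2 curl -fsS http://localhost:8080/health ahead of mcptest run, with the hostname swapped for the one your platform uses.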
7.2 Secrets in YAML instead of env vars
Symptom: a test file like
servers:
  staging:
    url: "https://staging.example.com/mcp"
    headers:
      Authorization: "Bearer sk_live_abcd1234..."
ends up committed to a public repo, GitHub flags a secret scan alert, and the on-call gets paged.
Cause: tokens were pasted into the YAML instead of interpolated from environment variables.
Fix: always use ${VAR} interpolation and store the token in the platform's secret store (GitHub Secrets, GitLab CI/CD variables, CircleCI Contexts). Rotate any token that has ever appeared in a tracked file. The schema accepts plain strings in the Authorization header for local convenience, but the linter prints a warning when it sees one that looks like a real token. Treat the warning as an error in CI.
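A cheap extra guard, separate from the mcptest linter, is to scan the test files for Bearer values that do not start with a variable reference. The helper below is a sketch under that assumption: find_literal_bearer is a hypothetical name, and the pattern only covers the Authorization style shown in this guide.

```shell
# Hypothetical check (not the mcptest linter): print every line whose
# Bearer value starts with something other than "$", i.e. a likely literal.
# Exit status follows grep: 0 when a literal is found, 1 when none is.
find_literal_bearer() {
  grep -nE 'Bearer[[:space:]]+[^$[:space:]"]' "$@"
}
```

A CI step could call it as: if find_literal_bearer tests/*.yaml; then exit 1; fi, so that any hit fails the job.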
7.3 Transport mismatch (cassette recorded against stdio, replayed against URL)
Symptom: cassette replay fails with no matching interaction or method mismatch for requests that look almost identical to the recorded ones.
Cause: a cassette captures the wire-level traffic including the transport. A cassette recorded against a stdio server contains JSON-RPC frames over the stdio framing convention. A cassette recorded against an HTTP server contains HTTP requests with headers. The two are not interchangeable.
Fix: record one cassette per transport, name them as such (fixtures/list_dir.stdio.cassette.json, fixtures/list_dir.http.cassette.json), and reference the matching one in each test file. If your CI runs both patterns, replay against the cassette that matches the transport of the run. The cassette format records the transport in its header so mcptest can refuse to replay across transports.
7.4 Exit-code interpretation
Symptom: a CI step shows green even though tests failed, or red even though all tests passed.
Cause: the shell wrapper around mcptest run swallowed the exit code. Common offenders are bash -c "mcptest run ... | tee log.txt" (uses the exit code of tee, not mcptest) and set +e left over from debugging.
Fix: invoke mcptest directly as the last command of the step, with no pipe. If you must tee, use set -o pipefail first, or use the platform's log capture (every example above pipes nothing). The exit codes mcptest returns are documented in the troubleshooting guide. Treat anything non-zero as a failure unless you have a specific reason not to.
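The effect of the pipe is easy to see in isolation. The bash sketch below uses run_suite as a stand-in for mcptest run; the exit code 3 is made up for the demonstration and says nothing about mcptest's real exit codes.

```shell
#!/usr/bin/env bash
# run_suite is a stand-in for `mcptest run`; the exit code 3 is made up.
run_suite() { echo "1 test failed"; return 3; }

# Without pipefail the pipeline reports tee's status, so the failure is lost.
run_suite | tee /dev/null && plain_status=0 || plain_status=$?

# With pipefail the pipeline reports the failing command's status.
set -o pipefail
run_suite | tee /dev/null && strict_status=0 || strict_status=$?
set +o pipefail

echo "without pipefail: $plain_status, with pipefail: $strict_status"
```

Run under bash, the first status is 0 and the second is 3, which is the difference between a green step and a red one.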
7.5 Cache key not invalidating when tests change
Symptom: a test file change does not change the result, because CI is running an older test set out of the cache.
Cause: the cache key covers source dependencies (e.g., Cargo.lock) but not the test directory. The binary is rebuilt, but the test fixtures are restored from cache and overwrite the new ones.
Fix: do not put test fixtures inside the cached path. Cache target/, the Cargo registry, and the mcptest binary, but not tests/ or examples/. If you must cache derived test artifacts (snapshots, golden files), key the cache on the hash of the source that produced them, for example hashFiles('tests/**').
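On platforms without a glob-hashing function, the same idea can be approximated by writing a digest of the directory to a file and keying the cache on that file. The helper below is a sketch, not an mcptest feature; hash_tree is a hypothetical name and it assumes sha256sum or shasum is available on the runner.

```shell
# Hypothetical helper: print one digest that changes whenever the content
# or the relative path of any file under DIR changes.
hash_tree() {
  dir=$1
  if command -v sha256sum >/dev/null 2>&1; then
    hasher="sha256sum"
  else
    hasher="shasum -a 256"   # fallback, for example on macOS
  fi
  (
    cd "$dir" || exit 1
    # Hash files in a stable order so the result is reproducible.
    find . -type f | LC_ALL=C sort | while IFS= read -r file; do
      $hasher "$file"
    done
  ) | $hasher | cut -d ' ' -f 1
}
```

For example, hash_tree tests > tests.sha256 followed by a CircleCI key such as v1-snapshots-{{ checksum "tests.sha256" }} would invalidate the cache whenever a test file changes.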
7.6 Different runner OS surfacing different test results
Symptom: the test suite is green on Ubuntu, red on macOS. The failure is a path comparison or a line-ending mismatch.
Cause: Linux filesystems such as ext4 are case-sensitive, while the default macOS filesystem (APFS, and HFS+ before it) is case-insensitive. Linux and macOS end lines with \n; Windows ends them with \r\n. A matcher that asserts exact equality on a path or on a stdout buffer will disagree across runners.
Fix: avoid asserting exact equality on values that differ by platform. For a path or a multi-line string, use regex (anchor only the parts you care about, and write \r?\n where a line ending appears) or contains instead of exact. For environment-dependent values (temp dir, hostname), interpolate the actual environment with ${VAR} from the test runtime rather than hard-coding a literal.
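The same principle applies to any helper script that compares captured output outside of mcptest. As a small sketch (same_ignoring_cr is a hypothetical helper, not an mcptest matcher), stripping carriage returns before comparing makes the check indifferent to CRLF versus LF:

```shell
# Hypothetical helper: succeed when two strings are equal once carriage
# returns are removed, so CRLF and LF output compare as the same text.
same_ignoring_cr() {
  left=$(printf '%s' "$1" | tr -d '\r')
  right=$(printf '%s' "$2" | tr -d '\r')
  [ "$left" = "$right" ]
}
```

Inside a test file, prefer the regex or contains matchers described above; this helper is only for ad hoc scripts around the run.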
8. Debugging failing CI runs
The first step is always the same: read the JUnit output. Every snippet above writes it to a known path and uploads it as an artifact. Open the file locally, find the failing case, copy the symptom, and search the troubleshooting guide.
If that does not resolve it, the steps below escalate in order.
8.1 Re-run with --verbose
Every snippet above already passes --verbose. Pull the job log and search for level=DEBUG. The verbose output includes:
- the resolved server command line or URL,
- every request and response header,
- the readiness wait duration,
- the matcher decision tree for every failed assertion.
If --verbose is not enough, add --debug to a one-off CI run. --debug prints raw wire bytes (with secrets redacted) and is far too noisy for default CI but is the right setting for a forensic run.
8.2 Pull the report artifacts
Every snippet above writes the reports it asks for under target/, except the GitLab HTTP and staging jobs, which write the same file names to the project root:
target/mcptest-run.json # --reporter json --output ...
target/mcptest-junit.xml # mcptest report ... --format junit
target/mcptest-codequality.json # mcptest report ... --format gitlab
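The JSON run file is the source of truth: it carries the full run envelope, so you can re-render any reporter format from it after the fact without re-running the suite. The GitHub snippets in sections 3.1, 4.1, and 5 upload the target/mcptest-* glob, so all three files are one click away. The other snippets store only the rendered reports, so add the JSON run file to their artifact paths if you want it downloadable.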
For a forensic run, add --debug to the failing job. --debug prints the resolved config and raw wire bytes (with secrets redacted) to the job log. Capture the JSON run file from a passing run and from the failing run, then diff them. The first divergence usually points at the bug.
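A plain diff is usually enough to find that first divergence. The sketch below assumes both run files are pretty-printed with one field per line (a minified file would need a JSON formatter first); first_divergence is a hypothetical helper name.

```shell
# Hypothetical helper: show where two run files first differ.
# diff -U0 prints two file-header lines, then hunks without context,
# so lines 3-5 are the first hunk header and the start of its body.
first_divergence() {
  diff -U0 "$1" "$2" | sed -n '3,5p'
}
```

Called as first_divergence passing-run.json failing-run.json, it prints the first hunk only, which keeps the job log short.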
8.3 Reproduce locally with the same env vars
The reason CI fails and your laptop passes is almost always the environment. Reproduce by copying the env block out of the workflow:
export MCP_STAGING_URL="https://staging.example.com/mcp"
export MCP_STAGING_TOKEN="$(pass show mcptest/staging)"
export CI=true
export RUST_LOG=mcptest=debug
mcptest run tests/staging/ --wait-for-ready --verbose
Three rules for this loop:
- Match the runner OS. If CI runs on Ubuntu and you run on macOS, use a container: docker run --rm -it -v "$PWD:/app" -w /app rust:1.81 bash.
- Match the mcptest version. If CI is pinned to 1.0.0, install 1.0.0 locally with curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh, not whatever Homebrew has.
- Match the test file. Do not run tests/; run the exact subdirectory the failing job runs.
If the local run passes with the same versions, the same OS, and the same env, the next suspect is the network. Run with --debug (or RUST_LOG=mcptest=trace) to log every connection attempt and the raw wire bytes with secrets redacted.
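Before starting a local reproduction, it helps to confirm that the variables copied from the workflow are actually set in the shell. The helper below is a sketch; require_env is a hypothetical name and the variable names in the usage line are the ones this guide uses.

```shell
# Hypothetical helper: report every named variable that is unset or empty
# and return non-zero if any is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that also works in plain POSIX sh.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the staging loop above, the call would be require_env MCP_STAGING_URL MCP_STAGING_TOKEN before mcptest run.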
8.4 When to file a bug
Open an issue against the mcptest project when:
- the failure repeats on three consecutive runs and you can attach the JSON run file plus a --debug log,
- the JUnit output and the verbose log disagree on which assertion failed,
- mcptest itself crashes (non-zero exit code with no report output at all).
For everything else, the troubleshooting guide entry plus the verbose log is usually enough.
Appendix: snippet index
If you got here from another page, this is the shortest path to each worked example.
Pattern 1 (stdio)
- GitHub Actions: section 2.1
- GitLab CI: section 2.2
- CircleCI: section 2.3
Pattern 2 (HTTP service container)
- GitHub Actions: section 3.1
- GitLab CI: section 3.2
- CircleCI: section 3.3
Pattern 3 (deployed environment)
- GitHub Actions: section 4.1
- GitLab CI: section 4.2
- CircleCI: section 4.3
- Combined smoke + integration: section 5
- Caching strategy: section 6
- Common pitfalls: section 7
- Debugging: section 8
Open follow-up items, as of this writing:
- A soapbucket/mcptest-action GitHub Action is staged as an example under examples/mcptest-action/ but is not yet published, so do not reference it in a real workflow. The curl ... install.sh snippets in this guide do not depend on it and work today.
- --wait-for-ready is accepted on every subcommand and is referenced by all HTTP and deployed-environment snippets. The flag parses and validates its budget today; the full readiness-polling behavior is still being wired, so until it ships the platform healthcheck is what gates the service.
9. Jenkins
Jenkins is common in enterprise shops, where a Jenkinsfile already lives next to the repo and the build server is on-prem. The patterns mirror the platforms above: stdio subprocess, HTTP service container, deployed environment. Each pattern fits into both the declarative and the scripted pipeline syntax.
The Jenkinsfile snippets assume:
- A Docker-capable agent (the snippet uses agent { docker { image '...' } } in declarative form). Air-gapped shops pull from an internal registry; see section 12.
- The JUnit publisher plugin (junit step) for the test results UI.
- The Warnings Next Generation plugin for SARIF surfacing, where relevant. mcptest renders SARIF with mcptest report --format sarif.
9.1 Declarative Jenkinsfile (stdio)
pipeline {
agent {
docker {
image 'rust:1.81'
args '-v $HOME/.cargo:/root/.cargo'
}
}
environment {
MCPTEST_VERSION = '1.0.0'
PATH = "$HOME/.local/bin:$PATH"
}
stages {
stage('Build server') {
steps {
sh 'cargo build --release --bin my-mcp-server'
}
}
stage('Install mcptest') {
steps {
sh 'curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$MCPTEST_VERSION sh'
}
}
stage('Run mcptest') {
steps {
sh '''
mcptest run tests/ \
--wait-for-ready \
--reporter json --output target/mcptest-run.json \
--verbose
mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
'''
}
}
}
post {
always {
junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
}
}
}
Notes:
- The junit post step publishes results to the Jenkins Tests tab and feeds the build-trends graph. allowEmptyResults: false fails the build if no results were produced (catches the "mcptest crashed before writing the report" case).
- archiveArtifacts keeps the JSON run file, the JUnit XML, and the GitLab Code Quality JSON for download. Downstream review bots consume the JSON.
9.2 Declarative Jenkinsfile (HTTP localhost)
pipeline {
agent {
docker {
image 'docker:24'
args '--privileged -v /var/run/docker.sock:/var/run/docker.sock'
}
}
environment {
MCPTEST_VERSION = '1.0.0'
MCP_SERVER_URL = 'http://mcp-server:8080/mcp'
}
stages {
stage('Boot server') {
steps {
sh '''
docker network create mcptest-net || true
docker run -d --rm \
--network mcptest-net \
--name mcp-server \
-p 8080:8080 \
ghcr.io/example/my-mcp-server:0.7.3
'''
}
}
stage('Run mcptest') {
steps {
sh '''
docker run --rm \
--network mcptest-net \
-e MCP_SERVER_URL=$MCP_SERVER_URL \
-v $WORKSPACE:/workspace \
-w /workspace \
--entrypoint sh \
soapbucket/mcptest:$MCPTEST_VERSION -c '
mcptest run tests/http/ \
--wait-for-ready \
--reporter json --output target/mcptest-run.json \
--verbose
mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
'
'''
}
}
}
post {
always {
sh 'docker stop mcp-server || true'
sh 'docker network rm mcptest-net || true'
junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
}
}
}
Notes:
- The shared Docker network lets the mcptest container reach the server by the alias mcp-server. The hostname matches what the test YAML expects.
- The post { always } cleanup runs whether the test passes or fails so a flaky job does not leak containers between runs.
9.3 Declarative Jenkinsfile (deployed URL)
pipeline {
agent any
environment {
MCPTEST_VERSION = '1.0.0'
MCP_STAGING_URL = credentials('mcp-staging-url')
MCP_STAGING_TOKEN = credentials('mcp-staging-token')
}
triggers {
cron('H 6 * * *')
}
stages {
stage('Install mcptest') {
steps {
sh 'curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$MCPTEST_VERSION sh'
}
}
stage('Run mcptest against staging') {
steps {
sh '''
mcptest run tests/staging/ \
--wait-for-ready \
--reporter json --output target/mcptest-run.json \
--verbose
mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
'''
}
}
}
post {
always {
junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
}
}
}
Notes:
- credentials('mcp-staging-token') pulls from the Jenkins Credentials store. The credential ID matches the name; in Jenkins, secrets are bound to the job via the Credentials Binding plugin.
- The cron('H 6 * * *') trigger uses Jenkins's hash spreading so multiple jobs scheduled at "6 AM" do not all fire at the same minute.
- Per pitfall 7.2, the token never appears in YAML. The credentials() call masks the value in the job log.
9.4 Scripted Jenkinsfile
For legacy Jenkins installations that still use scripted pipelines, the same stdio pattern looks like this:
node('docker') {
def mcptestVersion = '1.0.0'
docker.image('rust:1.81').inside {
stage('Checkout') {
checkout scm
}
stage('Build server') {
sh 'cargo build --release --bin my-mcp-server'
}
stage('Install mcptest') {
sh "curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=${mcptestVersion} sh"
}
stage('Run mcptest') {
try {
sh """
mcptest run tests/ \\
--wait-for-ready \\
--reporter json --output target/mcptest-run.json \\
--verbose
mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
"""
} finally {
junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
}
}
}
}
The try { ... } finally { ... } block is the scripted equivalent of post { always }. It guarantees the JUnit publisher runs even when the test step fails.
9.5 SARIF via Warnings Next Generation
Surface findings in Jenkins through the Warnings Next Generation plugin. Render SARIF from the JSON run file, then add a post step:
stage('Render SARIF') {
steps {
sh 'mcptest report target/mcptest-run.json --format sarif --output target/mcptest.sarif'
}
}
// ... in the post block:
post {
always {
junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
recordIssues(
enabledForFailure: true,
tools: [
sarif(pattern: 'target/mcptest.sarif')
]
)
}
}
The plugin's UI groups findings by rule ID and surfaces them on the build page. Quality-gate rules (fail the build if more than N high-severity findings appear) live in the plugin's configuration, not in the Jenkinsfile.
9.6 Shared library: mcptestStage()
Larger Jenkins shops with many repos converge on a shared library that exposes reusable steps. Once examples/ci-templates/ exists, we will ship a vars/mcptestStage.groovy there. The intended call site:
@Library('soapbucket-shared') _
pipeline {
agent any
stages {
stage('Build') { steps { sh 'cargo build --release --bin my-mcp-server' } }
stage('mcptest') {
steps {
mcptestStage(
version: '1.0.0',
testDir: 'tests/',
formats: ['junit', 'gitlab']
)
}
}
}
}
The helper resolves to the three stages in section 9.1 with the inputs parameterized. It hides the curl install and the post-step plumbing so each consuming pipeline reads as one line.
The shared library source is not yet published; the snippet above is the intended call site for documentation purposes.
10. Buildkite
Buildkite pipelines are YAML files at .buildkite/pipeline.yml. The agent queue routes each step to a matching agent pool, so the same pipeline can run a Rust build on a build agent and a deployed-environment test on a network-egress agent.
10.1 Buildkite (stdio)
steps:
- label: ":rust: Build server"
key: build
agents:
queue: builders
plugins:
- docker#v5.10.0:
image: rust:1.81
mount-checkout: true
environment:
- CARGO_HOME=/workdir/.cargo
commands:
- cargo build --release --bin my-mcp-server
artifact_paths:
- "target/release/my-mcp-server"
- label: ":test_tube: mcptest stdio"
key: mcptest
depends_on: build
agents:
queue: builders
plugins:
- artifacts#v1.9.4:
download: "target/release/my-mcp-server"
- docker#v5.10.0:
image: rust:1.81
mount-checkout: true
commands:
- chmod +x target/release/my-mcp-server
- curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
- export PATH="$HOME/.local/bin:$PATH"
- |
mcptest run tests/ \
--wait-for-ready \
--reporter json --output target/mcptest-run.json \
--verbose
- mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
- mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
artifact_paths:
- "target/mcptest-*"
Notes:
- agents.queue: builders routes the step to the builder pool. Use a different queue (agents.queue: egress) for steps that need outbound network access to a staging environment.
- The docker plugin runs the step inside the named image. mount-checkout: true mounts the working directory, so artifacts written under target/ end up on the host.
- The Buildkite Test Analytics product consumes the JUnit XML when the Test Engine plugin is configured; see section 10.4 for the annotation path that works without the paid product.
10.2 Buildkite (HTTP localhost via docker-compose)
steps:
- label: ":test_tube: mcptest http"
agents:
queue: builders
plugins:
- docker-compose#v5.10.0:
run: mcptest
config: .buildkite/docker-compose.yml
artifact_paths:
- "target/mcptest-*"
With .buildkite/docker-compose.yml:
services:
mcp-server:
image: ghcr.io/example/my-mcp-server:0.7.3
ports:
- "8080:8080"
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
interval: 5s
timeout: 2s
retries: 10
mcptest:
image: soapbucket/mcptest:1.0.0
depends_on:
mcp-server:
condition: service_healthy
environment:
MCP_SERVER_URL: http://mcp-server:8080/mcp
volumes:
- .:/workspace
working_dir: /workspace
entrypoint: ["sh", "-c"]
command:
- |
mcptest run tests/http/ \
--wait-for-ready \
--reporter json --output target/mcptest-run.json \
--verbose
mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
Notes:
- depends_on.mcp-server.condition: service_healthy waits for the healthcheck to pass before mcptest starts. Once the --wait-for-ready readiness polling ships, the two together cover both HTTP-level and MCP-level readiness; until then, the healthcheck is what gates the service.
- The docker-compose plugin is the right shape when more than one sidecar is involved (database + server + mcptest, for instance). For a single sidecar, the bare docker plugin with an inline network is lighter.
10.3 Buildkite (deployed URL)
steps:
- label: ":test_tube: mcptest staging"
if: build.source == "schedule" || build.message =~ /\[staging\]/
agents:
queue: egress
plugins:
- docker#v5.10.0:
image: soapbucket/mcptest:1.0.0
entrypoint: sh
environment:
- MCP_STAGING_URL
- MCP_STAGING_TOKEN
mount-checkout: true
commands:
- |
mcptest run tests/staging/ \
--wait-for-ready \
--reporter json --output target/mcptest-run.json \
--verbose
- mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
- mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
retry:
automatic:
- exit_status: -1
limit: 2
artifact_paths:
- "target/mcptest-*"
Notes:
- agents.queue: egress routes to an agent pool with outbound network access. The builder pool typically blocks egress for supply-chain reasons.
- If a worker can legitimately draw an empty test selection (a sharded matrix, for instance), pass --pass-with-no-tests so an empty run exits 0 instead of 7. mcptest does not emit a separate "environment unavailable" exit code; the stable set is 0, 1, 2, 5, 6, 7.
- retry.automatic.exit_status: -1 retries on infrastructure failures (network drop, agent loss) up to twice. It does not retry on test failures, only on operational ones.
- MCP_STAGING_URL and MCP_STAGING_TOKEN come from Buildkite's secret store, attached to the pipeline as environment variables. The Docker plugin's environment list passes them through without ever exposing them in the YAML.
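The exit-code notes above can be turned into a small guard in a wrapper script. This is a sketch under stated assumptions: this guide gives a meaning only for 0 (success) and 7 (empty test selection), so the hypothetical classify_exit helper groups 1, 2, 5, and 6 as "documented non-zero" without interpreting them, and flags anything outside the stable set as unexpected.

```shell
# Hypothetical helper: bucket an mcptest exit status using the stable set
# listed in this guide (0, 1, 2, 5, 6, 7).
classify_exit() {
  case "$1" in
    0)       echo "pass" ;;
    7)       echo "no-tests" ;;             # empty selection, see --pass-with-no-tests
    1|2|5|6) echo "documented-nonzero" ;;   # meanings are not spelled out here
    *)       echo "unexpected" ;;           # outside the stable set
  esac
}
```

A wrapper could run mcptest, capture $?, and attach extra diagnostics only when classify_exit prints unexpected.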
10.4 Annotate the build with JUnit summary
Buildkite's annotation API attaches Markdown to the build page. To surface mcptest results inline (without paying for Test Analytics):
- label: ":memo: Annotate mcptest results"
depends_on: mcptest
allow_dependency_failure: true
agents:
queue: builders
commands:
- buildkite-agent artifact download "target/mcptest-junit.xml" .
- |
if grep -q 'failures="0"' target/mcptest-junit.xml; then
buildkite-agent annotate --style success "mcptest passed."
else
FAIL_COUNT=$(grep -oE 'failures="[0-9]+"' target/mcptest-junit.xml | head -n1 | grep -oE '[0-9]+')
buildkite-agent annotate --style error "mcptest failed ($FAIL_COUNT test(s)). See artifacts."
fi
allow_dependency_failure: true runs the annotation step even when mcptest exited non-zero, so failed builds still get the inline summary.
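The grep check in the annotation step succeeds as soon as any line contains failures="0", so a report with one clean suite and one failing suite would be annotated as passing. This guide does not say whether mcptest's JUnit output can contain more than one testsuite element; if it can, summing the counts is safer. A sketch follows. The name junit_failures is hypothetical, and the helper assumes each opening testsuite tag sits on a single line.

```shell
# Hypothetical helper: sum the failures="N" attribute over every
# <testsuite ...> opening tag in a JUnit XML file and print the total.
junit_failures() {
  awk '
    {
      line = $0
      # Walk every <testsuite ...> tag on this line; the trailing space in the
      # pattern keeps the aggregate <testsuites> root from being counted.
      while (match(line, /<testsuite [^>]*>/)) {
        tag  = substr(line, RSTART, RLENGTH)
        line = substr(line, RSTART + RLENGTH)
        if (match(tag, /failures="[0-9]+"/)) {
          # Skip the 10-character prefix failures=" and the closing quote.
          total += substr(tag, RSTART + 10, RLENGTH - 11)
        }
      }
    }
    END { print total + 0 }
  ' "$1"
}
```

In the annotation step, the condition would then read if [ "$(junit_failures target/mcptest-junit.xml)" -eq 0 ]; then ... and the same call supplies the count for the failure message.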
10.5 Agent queue routing
Three queues are typically enough:
| Queue | Used for |
|---|---|
| builders | Compile-heavy work (cargo build, npm install). |
| egress | Steps that need outbound network to staging or prod URLs. |
| mcptest | Steps that need the mcptest binary preinstalled. |
The mcptest queue is optional; the snippets above install mcptest into the step on demand. A dedicated queue saves the install step on every run at the cost of an extra agent pool to maintain. For low-volume pipelines the install-on-demand pattern is simpler.
11. Azure DevOps
Azure DevOps pipelines live at azure-pipelines.yml. The platform's test-results UI consumes JUnit through PublishTestResults@2 and the SARIF surface through PublishCodeAnalysisResults@1.
11.1 Azure DevOps (stdio)
trigger:
branches:
include: [main]
pool:
vmImage: ubuntu-latest
variables:
MCPTEST_VERSION: 1.0.0
RUST_VERSION: 1.81
steps:
- checkout: self
- task: Cache@2
inputs:
key: 'cargo | "$(Agent.OS)" | Cargo.lock'
path: |
$(HOME)/.cargo
target
restoreKeys: |
cargo | "$(Agent.OS)"
- script: |
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain $(RUST_VERSION)
echo "##vso[task.prependpath]$HOME/.cargo/bin"
displayName: Install Rust toolchain
- script: cargo build --release --bin my-mcp-server
displayName: Build server
- script: |
curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$(MCPTEST_VERSION) sh
echo "##vso[task.prependpath]$HOME/.local/bin"
displayName: Install mcptest
- script: |
mcptest run tests/ \
--wait-for-ready \
--reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
--verbose
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format gitlab --output $(Build.ArtifactStagingDirectory)/mcptest-codequality.json
displayName: Run mcptest
- task: PublishTestResults@2
condition: succeededOrFailed()
inputs:
testRunner: JUnit
testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit.xml"
testRunTitle: mcptest stdio
failTaskOnFailedTests: true
- task: PublishBuildArtifacts@1
condition: succeededOrFailed()
inputs:
pathToPublish: $(Build.ArtifactStagingDirectory)
artifactName: mcptest-reports
Notes:
- condition: succeededOrFailed() plays the role that if: always() plays in GitHub Actions: the publish step still runs after a test failure. Unlike always(), it does not run when the job is cancelled. Without it, the publish step is skipped on test failure and the UI loses the report.
- failTaskOnFailedTests: true makes the PublishTestResults task fail the pipeline when JUnit reports failures. Without this flag the test results show up in the UI but the pipeline reports success.
- ##vso[task.prependpath]... is Azure's logging-command form for modifying PATH for subsequent steps.
11.2 Azure DevOps (HTTP localhost via container resource)
resources:
containers:
- container: mcp-server
image: ghcr.io/example/my-mcp-server:0.7.3
ports:
- 8080:8080
services:
mcp-server: mcp-server
variables:
MCPTEST_VERSION: 1.0.0
MCP_SERVER_URL: http://mcp-server:8080/mcp
pool:
vmImage: ubuntu-latest
steps:
- checkout: self
- script: |
curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$(MCPTEST_VERSION) sh
echo "##vso[task.prependpath]$HOME/.local/bin"
displayName: Install mcptest
- script: |
mcptest run tests/http/ \
--wait-for-ready \
--reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
--verbose
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format gitlab --output $(Build.ArtifactStagingDirectory)/mcptest-codequality.json
displayName: Run mcptest
- task: PublishTestResults@2
condition: succeededOrFailed()
inputs:
testRunner: JUnit
testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit.xml"
testRunTitle: mcptest http
failTaskOnFailedTests: true
Notes:
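- The resources.containers block declares the image; the services block attaches it to the job. The container is reachable at mcp-server:8080 when the job itself runs in a container. When the job runs directly on the agent host, as in this snippet, Azure exposes service containers through the mapped port on localhost, so point MCP_SERVER_URL at http://localhost:8080/mcp in that case.
- For images in a private registry, add a service connection (Pipelines > Service connections > Docker Registry) and reference it in the container resource via endpoint:.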
11.3 Azure DevOps (deployed URL with service connection)
schedules:
- cron: "0 6 * * *"
displayName: Nightly staging tests
branches:
include: [main]
always: true
pool:
vmImage: ubuntu-latest
variables:
- name: MCPTEST_VERSION
  value: 1.0.0
- group: mcptest-staging # Variable Group with MCP_STAGING_URL, MCP_STAGING_TOKEN
steps:
- checkout: self
- script: |
curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$(MCPTEST_VERSION) sh
echo "##vso[task.prependpath]$HOME/.local/bin"
displayName: Install mcptest
- script: |
mcptest run tests/staging/ \
--wait-for-ready \
--reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
--verbose
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format gitlab --output $(Build.ArtifactStagingDirectory)/mcptest-codequality.json
displayName: Run mcptest against staging
env:
MCP_STAGING_URL: $(MCP_STAGING_URL)
MCP_STAGING_TOKEN: $(MCP_STAGING_TOKEN)
- task: PublishTestResults@2
condition: succeededOrFailed()
inputs:
testRunner: JUnit
testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit.xml"
testRunTitle: mcptest staging
failTaskOnFailedTests: true
Notes:
- The Variable Group mcptest-staging is defined in Library and bound to the pipeline. Secrets in a Variable Group are encrypted at rest. Azure does not expose secret variables to scripts automatically, which is why the run step maps them explicitly through env:.
- For OAuth-authenticated MCP servers, add a Generic Service Connection (Service Connections > New > Generic), reference it with serviceConnection: 'mcp-staging-oauth', and pull tokens with the OAuth1 or Bearer Token connection types.
11.4 SARIF via PublishCodeAnalysisResults
Render SARIF from the JSON run file, then publish it:
- script: |
mcptest run tests/ \
--reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
--verbose
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format sarif --output $(Build.ArtifactStagingDirectory)/mcptest.sarif
mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
displayName: Run mcptest
- task: PublishCodeAnalysisResults@1
condition: succeededOrFailed()
inputs:
codeAnalysisResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest.sarif"
codeAnalysisResultsType: SARIF
11.5 YAML templates for reuse
For orgs with many repositories, factor the mcptest steps into a YAML template. Add a mcptest.yml template at the org's shared-templates repo:
# templates/mcptest.yml
parameters:
- name: testDir
type: string
default: tests/
- name: mcptestVersion
type: string
default: 1.0.0
- name: formats
type: object
default:
- junit
- gitlab
steps:
- script: |
curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=${{ parameters.mcptestVersion }} sh
echo "##vso[task.prependpath]$HOME/.local/bin"
displayName: Install mcptest
- script: |
    mcptest run ${{ parameters.testDir }} \
      --wait-for-ready \
      --reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
      --verbose
  displayName: Run mcptest
# "each" expands YAML structure, not text inside a script body, so each format gets its own step
- ${{ each f in parameters.formats }}:
  - script: mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format ${{ f }} --output $(Build.ArtifactStagingDirectory)/mcptest-${{ f }}-report
    displayName: Render ${{ f }} report
    condition: succeededOrFailed()
- task: PublishTestResults@2
condition: succeededOrFailed()
inputs:
testRunner: JUnit
testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit-report"
failTaskOnFailedTests: true
Consumed from a downstream pipeline:
resources:
repositories:
- repository: templates
type: git
name: shared/templates
ref: refs/tags/v1.0.0
steps:
- template: mcptest.yml@templates
parameters:
testDir: tests/integration/
mcptestVersion: 1.0.0
Pin the template repo to a tag, not to main. A template change silently rolls out to every consuming pipeline if the consumer points at a branch.
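A lightweight lint can catch a consumer that drifts onto a branch. The sketch below assumes the pipeline file declares repository refs on lines of the form ref: ..., as in the snippet above; unpinned_template_refs is a hypothetical helper, not part of mcptest or Azure DevOps.

```shell
# Hypothetical helper: print (with line numbers) every "ref:" line in a
# pipeline file that does not point at refs/tags/...
# "|| true" keeps the exit status 0 when there is nothing to report.
unpinned_template_refs() {
  grep -nE '^[[:space:]]*ref:' "$1" | grep -v 'refs/tags/' || true
}
```

Running unpinned_template_refs azure-pipelines.yml in a pre-merge check and failing when the output is non-empty keeps branch refs out of consuming pipelines.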
12. Self-hosted and air-gapped environments
Enterprise installations often run CI on isolated networks with no outbound HTTP. Every artifact (Docker image, mcptest binary, SARIF schema) has to live inside the perimeter. The snippets below adapt the patterns above for that environment.
The shape is the same on every CI platform; the difference is sourcing.
12.1 Offline install: docker save and tarball
For mcptest itself, save the Docker image on an internet-connected host and ship the tarball through the same channel you use for other controlled artifacts (S3 with bucket policies, an internal artifact store, sneakernet via removable media for the strictest shops):
# On an internet-connected host
docker pull soapbucket/mcptest:1.0.0
docker save soapbucket/mcptest:1.0.0 -o mcptest-1.0.0.tar
# Compute a checksum the receiving side can verify
sha256sum mcptest-1.0.0.tar > mcptest-1.0.0.tar.sha256
# On the air-gapped CI agent
sha256sum -c mcptest-1.0.0.tar.sha256
docker load -i mcptest-1.0.0.tar
docker tag soapbucket/mcptest:1.0.0 internal-registry.example.com/mcptest:1.0.0
docker push internal-registry.example.com/mcptest:1.0.0
For the standalone binary, mirror the GitHub release artifact to an internal artifact store and adjust the install command:
# Replaces the curl https://download.mcptest.sh/install.sh path
INTERNAL_BASE="https://artifacts.example.com/mcptest/1.0.0"
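curl -fsSL "$INTERNAL_BASE/mcptest-linux-x86_64.tar.gz" -o mcptest-linux-x86_64.tar.gz
# Keep the release file name so it matches its SHA256SUMS entry; --ignore-missing skips entries for files not downloaded here
curl -fsSL "$INTERNAL_BASE/SHA256SUMS" | sha256sum -c --ignore-missing -
tar -xzf mcptest-linux-x86_64.tar.gz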
sudo install mcptest /usr/local/bin/
The official install script (install.sh) accepts an MCPTEST_DOWNLOAD_BASE environment variable that points at the internal mirror. The install flow is otherwise identical.
12.2 Internal registry mirroring
For server images, the same docker save / docker load pattern applies. Most enterprise registries (Harbor, Artifactory, Nexus, ECR behind a private endpoint) accept the saved tarball directly:
docker pull ghcr.io/example/my-mcp-server:0.7.3
docker save ghcr.io/example/my-mcp-server:0.7.3 -o my-mcp-server-0.7.3.tar
# Transfer through the controlled channel, then on the air-gapped side:
docker load -i my-mcp-server-0.7.3.tar
docker tag ghcr.io/example/my-mcp-server:0.7.3 \
internal-registry.example.com/mcptest/my-mcp-server:0.7.3
docker push internal-registry.example.com/mcptest/my-mcp-server:0.7.3
Update the CI snippets to reference the internal registry hostname:
# GitHub Actions / Jenkins / Buildkite / Azure pattern
services:
mcp-server:
image: internal-registry.example.com/mcptest/my-mcp-server:0.7.3
The image digest is more robust than the tag, because tags are mutable and registries do not always enforce immutability:
services:
mcp-server:
image: internal-registry.example.com/mcptest/my-mcp-server@sha256:abcdef...
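When a script writes the pinned reference into CI config, a malformed digest is easy to miss. The sketch below builds the repo@digest form and rejects anything that is not sha256: followed by 64 lowercase hex characters. The function name pinned_ref is hypothetical.

```shell
# Hypothetical helper: build a digest-pinned image reference.
# Usage: pinned_ref <repository> <sha256:digest>
pinned_ref() {
  repo=$1
  digest=$2
  hex=${digest#sha256:}
  # If stripping the prefix changed nothing, the prefix was missing.
  [ "$hex" != "$digest" ] || return 1
  # A SHA-256 digest is exactly 64 hex characters.
  [ "${#hex}" -eq 64 ] || return 1
  case "$hex" in
    *[!0-9a-f]*) return 1 ;;
  esac
  printf '%s@%s\n' "$repo" "$digest"
}
```

The digest itself can be read after the push with docker image inspect --format '{{json .RepoDigests}}' on the internal image, which lists the digests Docker has recorded for it.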
12.3 No outbound HTTP
mcptest's default behavior already aligns with air-gapped environments:
- No update check. mcptest never calls home to check for a newer version. There is no update-check flag to set; the absence of an update check is the contract.
- No telemetry. The OSS build emits no telemetry. There is no --no-telemetry flag to set; the absence of telemetry is the contract.
- No live cassette pulls. Cassettes live next to the test files in the repo. mcptest never fetches a cassette from a remote URL at test time.
- Schema URLs are advisory. The JSON Schema URL (https://mcptest.sh/schema/v1.json) appears in YAML files as a yaml-language-server: hint for editor tooling. The runner does not fetch the schema at runtime; it ships embedded in the binary.
If your CI agent enforces a strict deny-by-default egress policy, the only outbound calls a normal mcptest run makes are to the deployed MCP server URL (in Pattern 3) or to nothing at all (in Patterns 1 and 2, which run against local processes or sidecars).
12.4 HTTPS_PROXY and HTTP_PROXY support
For environments where outbound HTTP is allowed only through a corporate proxy, mcptest's HTTP transport honors the standard environment variables:
| Variable | Effect |
|---|---|
| HTTPS_PROXY | Routes HTTPS requests through the named proxy. |
| HTTP_PROXY | Routes HTTP requests through the named proxy. |
| NO_PROXY | Comma-separated list of hostnames to bypass. |
Example for a deployed-URL pattern behind a corporate proxy:
# GitHub Actions / Jenkins / Buildkite / Azure pattern
env:
HTTPS_PROXY: http://proxy.example.com:3128
NO_PROXY: localhost,127.0.0.1,internal-registry.example.com
MCP_STAGING_URL: https://staging.example.com/mcp
MCP_STAGING_TOKEN: ${{ secrets.MCP_STAGING_TOKEN }}
The reqwest-based HTTP transport reads these variables at startup. mcptest also exposes explicit proxy flags (--proxy, --http-proxy, --https-proxy, --no-proxy, and --noproxy HOSTLIST) that override the environment when a single run needs different routing.
12.5 Internal certificate authorities
When the deployed environment uses a private certificate authority, mount the CA bundle into the agent's trust store. The standard Linux path is /etc/ssl/certs/ca-certificates.crt. mcptest's HTTP client respects SSL_CERT_FILE and SSL_CERT_DIR, so the simplest path is:
env:
SSL_CERT_FILE: /etc/internal-ca/bundle.pem
For the Docker image, bake the CA bundle into the base image:
FROM soapbucket/mcptest:1.0.0
COPY internal-ca.pem /usr/local/share/ca-certificates/internal-ca.crt
RUN update-ca-certificates
Republish the resulting image to the internal registry and use it in place of soapbucket/mcptest:1.0.0.
13. TeamCity (stub)
Full TeamCity integration is not yet documented. For now, TeamCity is supported via the generic Docker image and the standard JUnit publisher.
A minimal TeamCity build step (Command Line runner):
docker run --rm \
-v %teamcity.build.checkoutDir%:/workspace \
-w /workspace \
--entrypoint sh \
soapbucket/mcptest:1.0.0 -c '
mcptest run tests/ \
--wait-for-ready \
--reporter json --output target/mcptest-run.json \
--verbose
mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
'
Then add an XML Report Processing build feature with:
- Report type: Ant JUnit
- Monitoring rules: target/mcptest-junit.xml
The TeamCity tests tab consumes the JUnit report and surfaces failures on the build page. The GitLab Code Quality JSON is archived as an artifact through the Artifact Paths setting on the build configuration.
What is missing from this stub:
- A dedicated TeamCity meta-runner (a reusable build step type with parameter prompts) for mcptest.
- First-class support for the SARIF surface (TeamCity has its own inspection model that does not map cleanly to SARIF).
- A Kotlin DSL representation in .teamcity/settings.kts for shops that version-control their build configurations.
These items land in v1.2 if there is demand. The current Docker + JUnit path is enough to run mcptest in a TeamCity pipeline today.
14. Reusable templates
Reusable templates under examples/ci-templates/ are planned but not yet published. Once that directory exists, it will ship:
- examples/ci-templates/jenkins/Jenkinsfile.{stdio,http,deployed}
- examples/ci-templates/jenkins/vars/mcptestStage.groovy
- examples/ci-templates/buildkite/pipeline.{stdio,http,deployed}.yml
- examples/ci-templates/buildkite/docker-compose.yml
- examples/ci-templates/azure-devops/azure-pipelines.{stdio,http,deployed}.yml
- examples/ci-templates/azure-devops/templates/mcptest.yml
- examples/ci-templates/teamcity/build.cmd (the stub above)
Until the directory exists, treat the snippets in this guide as the canonical source.