CI integration patterns

This guide shows how to run mcptest in continuous integration. It covers three patterns (stdio, HTTP service container, deployed environment) across three platforms (GitHub Actions, GitLab CI, CircleCI), so nine worked examples in total. Every snippet is copy-pasteable. Adjust the version pins, paths, and secret names for your own repository.

Where a flag or feature is still in flight (for example the full --wait-for-ready readiness-polling behavior), the snippet shows the intended call site and notes the status.

The guide assumes you already have at least one passing test file locally. If mcptest run tests/smoke.yaml works on your laptop, the snippets below take it from there.

1. How to read this guide (decision tree)

Start at the top. The first answer that fits routes you to the right snippet.

                  ┌──────────────────────────────┐
                  │ How does your MCP server run? │
                  └──────────────┬───────────────┘
                                 │
        ┌────────────────────────┼────────────────────────┐
        │                        │                        │
        ▼                        ▼                        ▼
  Local subprocess          HTTP listener          Already deployed
  (stdio, command,           you can boot           (staging URL,
   one binary)                inside the              auth in env)
        │                     CI job                       │
        │                        │                         │
        ▼                        ▼                         ▼
  Pattern 1: stdio        Pattern 2: HTTP            Pattern 3: deployed
                          service container          environment
        │                        │                         │
        ▼                        ▼                         ▼
  GitHub Actions:         GitHub Actions:           GitHub Actions:
    section 2.1             section 3.1               section 4.1
  GitLab CI:              GitLab CI:                GitLab CI:
    section 2.2             section 3.2               section 4.2
  CircleCI:               CircleCI:                 CircleCI:
    section 2.3             section 3.3               section 4.3

If you want both fast feedback on every commit and one end-to-end run against a real environment, skip to section 5 (combining patterns).

Numbered jump list, in case the ASCII tree above is too cramped:

1. The server is a local binary you launch with a command. Go to Pattern 1 (stdio) in section 2.
2. The server is an HTTP service. You will start it as a sidecar inside the CI job. Go to Pattern 2 (HTTP service container) in section 3.
3. The server is already running somewhere (staging, preview, a VM). Go to Pattern 3 (deployed environment) in section 4.
4. You want a tight smoke loop on every push and one slow integration run on pull requests or nightly. Go to section 5 for the combined recipe.
5. You want to make any of the above faster. Go to section 6 (caching).
6. Something is failing in CI and you cannot reproduce it locally. Go to section 8 (debugging) before guessing.

The 30-second rule: if you cannot find the snippet you need in half a minute, the decision tree is broken. File a docs issue and reference this paragraph.

2. Pattern 1: stdio servers

A stdio server is a binary that speaks MCP over standard input and output. mcptest launches the binary as a child process for the duration of the test run. This is the simplest pattern and usually the fastest, because nothing listens on a port and there is no readiness race.

The test file looks like this:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  local:
    command: ["./target/release/my-mcp-server"]

tools:
  - name: "lists tools without error"
    server: local
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1

The snippets below build the server, then run mcptest. They all:

pin a specific mcptest version (do not float on latest in CI),
cache build output and downloads in the platform's native cache,
write JUnit output for the platform's test report UI,
write a Code Quality JSON file when the platform supports it,
and pass --wait-for-ready to mcptest. For stdio, ready-detection is cheap, but the flag stays uniform across patterns so a template author does not have to remember which pattern omits it.

2.1 GitHub Actions (stdio)

name: mcptest-stdio
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable

      - name: Cache cargo registry and build output
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-

      - name: Build server in release mode
        run: cargo build --release --bin my-mcp-server

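      - name: Install mcptest
        run: |
          curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
          # Make the installed binary visible to the later steps.
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"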

      - name: Run mcptest
        run: |
          mcptest run tests/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose

      - name: Render the JUnit report
        if: always()
        run: mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml

      - name: Upload JUnit report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mcptest-junit
          path: target/mcptest-junit.xml

      - name: Publish test summary
        if: always()
        uses: mikepenz/action-junit-report@v4
        with:
          report_paths: "target/mcptest-junit.xml"

Notes for this snippet:

--reporter picks the format (pretty, json, junit, md, html, sarif, gitlab, ndjson, tap, or quiet) and --output names the sink (a file, or stdout). Capturing json once lets the later mcptest report step re-render any format without re-running the suite. To skip the re-render and write a format directly, use --reporter <FORMAT> --output <PATH>, for example --reporter junit --output target/mcptest-junit.xml.
actions/cache@v4 keys on Cargo.lock so a dependency change invalidates cleanly. See section 7.5 for the pitfall when test fixtures change but Cargo.lock does not.
The JUnit report writes one <testsuite> per server and one <testcase> per tool. GitHub's checks UI renders failures inline on the PR.
--wait-for-ready for stdio waits for the server to respond to initialize before the first tool call. For a binary that prints banners on startup, this avoids racing the handshake.

2.2 GitLab CI (stdio)

default:
  image: rust:1.81

stages:
  - build
  - test

variables:
  CARGO_HOME: "${CI_PROJECT_DIR}/.cargo"
  CARGO_TARGET_DIR: "${CI_PROJECT_DIR}/target"

cache:
  key:
    files:
      - Cargo.lock
  paths:
    - .cargo/registry
    - .cargo/git
    - target

build-server:
  stage: build
  script:
    - cargo build --release --bin my-mcp-server
  artifacts:
    paths:
      - target/release/my-mcp-server
    expire_in: 1 day

mcptest-stdio:
  stage: test
  needs: ["build-server"]
  script:
    - curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
    - export PATH="$HOME/.local/bin:$PATH"
    - mcptest run tests/
        --wait-for-ready
        --reporter json --output target/mcptest-run.json
        --verbose
  # after_script runs even when the test run fails, so the reports are still rendered.
  # It starts a fresh shell, so PATH has to be exported again.
  after_script:
    - export PATH="$HOME/.local/bin:$PATH"
    - mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
    - mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
  artifacts:
    when: always
    reports:
      junit: target/mcptest-junit.xml
      codequality: target/mcptest-codequality.json
    paths:
      - target/mcptest-junit.xml
      - target/mcptest-codequality.json
    expire_in: 1 week

Notes:

reports.junit makes GitLab render per-test results on the merge request.
reports.codequality lights up the Code Quality widget on the MR diff. The gitlab report format emits one entry per failing assertion with fingerprint, severity, and location filled in so duplicates collapse across runs.
Capture json once during the run, then re-render JUnit and GitLab Code Quality from the same file with mcptest report. No need to re-run the suite per format.
The shared cache.key.files invalidates the cache when Cargo.lock changes. Test fixtures live outside the cache key, so see section 7.5 for the recommended workaround.

2.3 CircleCI (stdio)

version: 2.1

orbs:
  rust: circleci/rust@1.6.1

jobs:
  mcptest-stdio:
    docker:
      - image: cimg/rust:1.81
    resource_class: medium
    steps:
      - checkout
      - rust/install
      - restore_cache:
          keys:
            - v1-cargo-{{ checksum "Cargo.lock" }}
            - v1-cargo-
      - run:
          name: Build server
          command: cargo build --release --bin my-mcp-server
      - save_cache:
          key: v1-cargo-{{ checksum "Cargo.lock" }}
          paths:
            - ~/.cargo/registry
            - ~/.cargo/git
            - target
      - run:
          name: Install mcptest
          command: |
            curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
            echo 'export PATH="$HOME/.local/bin:$PATH"' >> $BASH_ENV
      - run:
          name: Run mcptest
          command: |
            mcptest run tests/ \
              --wait-for-ready \
              --reporter json --output target/mcptest-run.json \
              --verbose
      - run:
          name: Render reports
          when: always
          command: |
            mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
            mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - store_test_results:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-codequality.json

workflows:
  test:
    jobs:
      - mcptest-stdio

Notes:

store_test_results is the CircleCI primitive that turns JUnit into the Tests tab. store_artifacts keeps the raw file for download.
CircleCI does not have a first-class Code Quality widget, so the GitLab Code Quality JSON is stored as a plain artifact, where downstream review bots can pick it up.

3. Pattern 2: HTTP service container

When the server runs as an HTTP service, the CI job needs to start it alongside the test step. Every major platform has a service-container feature for this. The pattern is always the same:

Pull (or build) a server image.
Declare it as a service on the job.
Point mcptest at the service hostname.
Use --wait-for-ready so the test waits for /health before the first tool call.

The test file references the server by URL. Reading the URL from an environment variable keeps one test file valid on every platform:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  remote:
    url: "${MCP_SERVER_URL}"

tools:
  - name: "lists tools without error"
    server: remote
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1

The hostname in MCP_SERVER_URL depends on the platform. On GitHub Actions, a job that runs directly on the runner reaches the service at localhost on the mapped port; the service name (mcp-server) resolves as a hostname only when the job itself runs inside a container. On GitLab it is the service alias. On CircleCI, secondary containers are reachable on localhost by default.

3.1 GitHub Actions (HTTP service container)

name: mcptest-http
on:
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      mcp-server:
        image: ghcr.io/example/my-mcp-server:0.7.3
        ports:
          - 8080:8080
        options: >-
          --health-cmd="curl -fsS http://localhost:8080/health || exit 1"
          --health-interval=5s
          --health-timeout=2s
          --health-retries=10

    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Cache mcptest install
        uses: actions/cache@v4
        with:
          path: ~/.local/bin/mcptest
          key: mcptest-${{ runner.os }}-1.0.0

      - name: Install mcptest
        run: |
          if [ ! -x "$HOME/.local/bin/mcptest" ]; then
            curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
          fi
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Run mcptest
        env:
          # The job runs directly on the runner, so the service is reached
          # through the mapped port on localhost, not by its service name.
          MCP_SERVER_URL: "http://localhost:8080/mcp"
        run: |
          mcptest run tests/http/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose

      - name: Render reports
        if: always()
        run: |
          mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
          mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json

      - name: Upload reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mcptest-reports
          path: target/mcptest-*

Notes:

The job-level services block starts the server as a sidecar container. Because the job runs directly on the runner, the ports mapping publishes it at http://localhost:8080; the service name (mcp-server) resolves as a hostname only when the job itself runs inside a container. GitHub waits for the healthcheck in options to pass before any step starts, so the service is at least listening by the time mcptest launches. --wait-for-ready then handles MCP-level readiness (the server responds to initialize).
The cache key for the mcptest binary is independent of Cargo.lock. The binary version is the only thing that matters, so the key is mcptest-${{ runner.os }}-1.0.0.

3.2 GitLab CI (HTTP service container)

default:
  image: alpine:3.20

stages:
  - test

variables:
  MCPTEST_VERSION: "1.0.0"
  MCP_SERVER_URL: "http://mcp-server:8080/mcp"

mcptest-http:
  stage: test
  services:
    - name: ghcr.io/example/my-mcp-server:0.7.3
      alias: mcp-server
      command: ["serve", "--port", "8080"]
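  before_script:
    - apk add --no-cache curl bash
    # GitLab only caches paths inside the project directory,
    # so keep a copy of the binary in .cache/bin and reuse it on a cache hit.
    - mkdir -p .cache/bin
    - |
      if [ ! -x .cache/bin/mcptest ]; then
        curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION="$MCPTEST_VERSION" sh
        cp "$HOME/.local/bin/mcptest" .cache/bin/mcptest
      fi
    - export PATH="$CI_PROJECT_DIR/.cache/bin:$PATH"
  script:
    - mcptest run tests/http/
        --wait-for-ready
        --reporter json --output mcptest-run.json
        --verbose
  # after_script runs even when the test run fails, so the reports are still rendered.
  # It starts a fresh shell, so PATH has to be exported again.
  after_script:
    - export PATH="$CI_PROJECT_DIR/.cache/bin:$PATH"
    - mcptest report mcptest-run.json --format junit --output mcptest-junit.xml
    - mcptest report mcptest-run.json --format gitlab --output mcptest-codequality.json
  artifacts:
    when: always
    reports:
      junit: mcptest-junit.xml
      codequality: mcptest-codequality.json
    paths:
      - mcptest-junit.xml
      - mcptest-codequality.json
    expire_in: 1 week
  cache:
    key: "mcptest-${MCPTEST_VERSION}"
    paths:
      - .cache/bin/mcptest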

Notes:

The alias is the hostname the service is reachable at from the main job container. Use that hostname (not localhost) in the test file.
GitLab's runner injects a shared Docker network for each job. The healthcheck-style --wait-for-ready flag covers the application-layer readiness because GitLab does not run container healthchecks before the job script starts.

3.3 CircleCI (HTTP service container)

version: 2.1

jobs:
  mcptest-http:
    docker:
      - image: cimg/base:stable
      - image: ghcr.io/example/my-mcp-server:0.7.3
        name: mcp-server
        command: ["serve", "--port", "8080"]
    resource_class: medium
    environment:
      MCP_SERVER_URL: "http://localhost:8080/mcp"
    steps:
      - checkout
      - restore_cache:
          keys:
            - v1-mcptest-1.0.0
      - run:
          name: Install mcptest
          command: |
            if [ ! -x "$HOME/.local/bin/mcptest" ]; then
              curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
            fi
            echo 'export PATH="$HOME/.local/bin:$PATH"' >> $BASH_ENV
      - save_cache:
          key: v1-mcptest-1.0.0
          paths:
            - ~/.local/bin/mcptest
      - run:
          name: Run mcptest
          command: |
            mcptest run tests/http/ \
              --wait-for-ready \
              --reporter json --output target/mcptest-run.json \
              --verbose
      - run:
          name: Render reports
          when: always
          command: |
            mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
            mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - store_test_results:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-codequality.json

workflows:
  test:
    jobs:
      - mcptest-http

Notes:

On CircleCI, secondary containers share the primary container's network, so the server is reachable at localhost:8080. MCP_SERVER_URL therefore points at http://localhost:8080/mcp.
CircleCI documents the name: field as the hostname for reaching a secondary container, with localhost as the default. This snippet connects through localhost, so name: mostly serves as a readable label here.

4. Pattern 3: deployed environment

When the server is already running (staging, a preview environment, a long-lived VM), the CI job does not boot anything. It just authenticates and runs tests against the live URL. The pattern is the same on every platform: read the URL and token from environment variables, pass --wait-for-ready so a deploying server has a moment to become healthy, and store the reports.

The test file looks identical to Pattern 2, except the URL points at the deployed environment and includes an auth header:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  staging:
    url: "${MCP_STAGING_URL}"
    headers:
      Authorization: "Bearer ${MCP_STAGING_TOKEN}"

tools:
  - name: "responds to list_directory in staging"
    server: staging
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1

The two environment variables come from each platform's secrets store. Never embed a token literal in YAML. See section 7.2.

4.1 GitHub Actions (deployed environment)

name: mcptest-staging
on:
  workflow_dispatch:
  schedule:
    - cron: "0 6 * * *"

jobs:
  test:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Cache mcptest install
        uses: actions/cache@v4
        with:
          path: ~/.local/bin/mcptest
          key: mcptest-${{ runner.os }}-1.0.0

      - name: Install mcptest
        run: |
          if [ ! -x "$HOME/.local/bin/mcptest" ]; then
            curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
          fi
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Run mcptest against staging
        env:
          MCP_STAGING_URL: ${{ vars.MCP_STAGING_URL }}
          MCP_STAGING_TOKEN: ${{ secrets.MCP_STAGING_TOKEN }}
        run: |
          mcptest run tests/staging/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose

      - name: Render reports
        if: always()
        run: |
          mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
          mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json

      - name: Upload reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mcptest-staging-reports
          path: target/mcptest-*

Notes:

The environment: staging line binds the job to a GitHub Environment, so secrets and variables resolve from the staging environment's scope. Production secrets do not leak into a staging job.
A nightly cron plus a manual workflow_dispatch trigger is the right default for deployed tests. Running on every PR puts load on staging and couples your PR signal to staging's health, which is rarely what you want.

4.2 GitLab CI (deployed environment)

default:
  image: alpine:3.20

stages:
  - test

variables:
  MCPTEST_VERSION: "1.0.0"

mcptest-staging:
  stage: test
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_PIPELINE_SOURCE == "web"
  environment:
    name: staging
    url: $MCP_STAGING_URL
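  before_script:
    - apk add --no-cache curl bash
    # GitLab only caches paths inside the project directory,
    # so keep a copy of the binary in .cache/bin and reuse it on a cache hit.
    - mkdir -p .cache/bin
    - |
      if [ ! -x .cache/bin/mcptest ]; then
        curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION="$MCPTEST_VERSION" sh
        cp "$HOME/.local/bin/mcptest" .cache/bin/mcptest
      fi
    - export PATH="$CI_PROJECT_DIR/.cache/bin:$PATH"
  script:
    - mcptest run tests/staging/
        --wait-for-ready
        --reporter json --output mcptest-run.json
        --verbose
  # after_script runs even when the test run fails, so the reports are still rendered.
  # It starts a fresh shell, so PATH has to be exported again.
  after_script:
    - export PATH="$CI_PROJECT_DIR/.cache/bin:$PATH"
    - mcptest report mcptest-run.json --format junit --output mcptest-junit.xml
    - mcptest report mcptest-run.json --format gitlab --output mcptest-codequality.json
  artifacts:
    when: always
    reports:
      junit: mcptest-junit.xml
      codequality: mcptest-codequality.json
    paths:
      - mcptest-junit.xml
      - mcptest-codequality.json
    expire_in: 1 week
  cache:
    key: "mcptest-${MCPTEST_VERSION}"
    paths:
      - .cache/bin/mcptest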

Notes:

environment.name: staging binds the job to GitLab's Environments feature so deploy and test runs show up on the same dashboard. The URL link in the UI uses $MCP_STAGING_URL.
MCP_STAGING_TOKEN is configured in the GitLab project settings under CI/CD variables, scoped to the staging environment, masked, and protected. The runner injects it automatically.

4.3 CircleCI (deployed environment)

version: 2.1

parameters:
  staging-only:
    type: boolean
    default: false

jobs:
  mcptest-staging:
    docker:
      - image: cimg/base:stable
    resource_class: small
    steps:
      - checkout
      - restore_cache:
          keys:
            - v1-mcptest-1.0.0
      - run:
          name: Install mcptest
          command: |
            if [ ! -x "$HOME/.local/bin/mcptest" ]; then
              curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
            fi
            echo 'export PATH="$HOME/.local/bin:$PATH"' >> $BASH_ENV
      - save_cache:
          key: v1-mcptest-1.0.0
          paths:
            - ~/.local/bin/mcptest
      - run:
          name: Run mcptest against staging
          command: |
            mcptest run tests/staging/ \
              --wait-for-ready \
              --reporter json --output target/mcptest-run.json \
              --verbose
      - run:
          name: Render reports
          when: always
          command: |
            mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
            mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
      - store_test_results:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-junit.xml
      - store_artifacts:
          path: target/mcptest-codequality.json

workflows:
  scheduled:
    when:
      or:
        # Runs for the scheduled pipeline named "nightly" ...
        - equal: [<< pipeline.schedule.name >>, "nightly"]
        # ... or for a manual pipeline triggered with staging-only set to true.
        - << pipeline.parameters.staging-only >>
    jobs:
      - mcptest-staging:
          context: mcptest-staging

Notes:

The context: mcptest-staging line pulls MCP_STAGING_URL and MCP_STAGING_TOKEN from a CircleCI Context. Contexts are the right scope for environment-specific secrets, because access to a Context can be restricted, for example to particular security groups or projects.
The parameters + when block keeps the job off the per-commit pipeline. It runs from a scheduled pipeline named nightly, or from a manual run started with the Trigger Pipeline button in the CircleCI UI with staging-only set to true.

5. Combining patterns

A common shape is: fast stdio smoke on every push, plus a deployed-env integration run on pull requests or nightly. The smoke run gives a sub-minute red/green signal. The deployed run catches issues that only show up against a real network and real auth.

The example below uses GitHub Actions. The same shape works on the other two platforms with the obvious renames (workflow becomes pipeline, etc.).

name: mcptest

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: "0 6 * * *"

jobs:
  smoke-stdio:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('Cargo.lock') }}
      - run: cargo build --release --bin my-mcp-server
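      - run: |
          curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
          # Make the installed binary visible to the later steps.
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"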
      - run: |
          mcptest run tests/smoke/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-smoke-run.json \
            --verbose
      - if: always()
        run: mcptest report target/mcptest-smoke-run.json --format junit --output target/mcptest-smoke-junit.xml
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: smoke-junit
          path: target/mcptest-smoke-junit.xml

  integration-staging:
    if: github.event_name == 'pull_request' || github.event_name == 'schedule'
    needs: smoke-stdio
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
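      - run: |
          curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
          # Make the installed binary visible to the later steps.
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"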
      - env:
          MCP_STAGING_URL: ${{ vars.MCP_STAGING_URL }}
          MCP_STAGING_TOKEN: ${{ secrets.MCP_STAGING_TOKEN }}
        run: |
          mcptest run tests/integration/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-integration-run.json \
            --verbose
      - if: always()
        run: |
          mcptest report target/mcptest-integration-run.json --format junit --output target/mcptest-integration-junit.xml
          mcptest report target/mcptest-integration-run.json --format gitlab --output target/mcptest-codequality.json
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: integration-reports
          path: target/mcptest-*

The split has three benefits:

The smoke job fails fast if the server cannot even start. You do not spend a slot on staging when the binary is broken.
The integration job is gated by needs: smoke-stdio, so staging only sees commits that already passed the local-process tests.
Staging-only flakiness no longer blocks every push, because the gating happens on PRs and on the nightly cron, not on every push to a feature branch.

The smoke and integration test sets should not overlap. Put readiness checks, schema-shape assertions, and tool surface coverage in tests/smoke/. Put authentication, network egress, real data, and slower flows in tests/integration/.

6. Caching strategy

Every platform has at least one cache layer. Picking the right one for each pattern can be the difference between a CI run that finishes in about a minute and one that takes several.

| Pattern | Best cache | Key | Effect (expected, not yet measured) |
|---|---|---|---|
| Pattern 1 (stdio) | Cargo registry + `target/` | `Cargo.lock` hash | Skips dependency rebuild on every commit. Expected to save the bulk of CI time on a clean Rust project. |
| Pattern 2 (HTTP) | Docker image layer cache + mcptest binary | image tag + mcptest version | Skips image pull, skips re-downloading the mcptest release. |
| Pattern 3 (deployed) | mcptest binary only | mcptest version | Skips the install step. The job is otherwise network-bound. |

The effects above are deliberately labeled "expected." Measure on your own project before claiming a specific speedup in a release note.

What to put in the cache key

Cargo.lock hash for the Rust build cache. Restoring across versions produces inconsistent builds.
mcptest version for the binary cache. A binary pinned to 1.0.0 should cache under mcptest-${{ runner.os }}-1.0.0, not under mcptest-${{ runner.os }}-latest.
Image digest for service-container caches when the platform supports it. Tags can be moved; digests cannot.

What not to put in the cache key

Branch name. Causes one cache per branch, which defeats the purpose.
Timestamp. Same problem.
A wildcard over **/*.yaml. See section 7.5 for the recommended split.

Cross-job restore

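GitHub Actions tries the exact key first, then each restore-keys prefix in order; when several caches match a prefix, the most recently created one wins. GitLab CI restores only the exact cache:key unless you list cache:fallback_keys. CircleCI tries the restore_cache keys in the order listed and stops at the first hit. On every platform, order keys from most specific to most general so a hit on the exact Cargo.lock hash wins over a hit on the prefix.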

7. Common pitfalls

Six failures show up over and over. Each one has a one-line symptom, a one-line cause, and a fix.

7.1 Missing `--wait-for-ready` against an HTTP target

Symptom: the first tool call returns connection refused or 404 on the first run after the server image changed, then passes on a retry.

Cause: the service container is listed as a job-level service, but the healthcheck either is not configured or only checks the TCP port, not the MCP initialize handshake. The test runs before the server is fully ready.

Fix: pass --wait-for-ready to every mcptest run that targets an HTTP server. The flag polls the configured readiness probe and gates the first tool call. For platform-level healthchecks, also configure them on the service block so the runner does not even start the step until the container reports healthy.
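
For runners that offer no container healthcheck, the general shape of readiness polling can be sketched in shell. This is an illustration of the idea only, not how mcptest implements --wait-for-ready; wait_until is a hypothetical helper, and the URL in the usage comment is an assumption taken from the snippets in section 3.

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Hypothetical helper for illustration; it is not part of mcptest.
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0          # probe succeeded, stop waiting
    fi
    i=$((i + 1))
    sleep "$delay"      # back off before the next probe
  done
  return 1              # probe never succeeded within the budget
}

# Example (assumed URL): up to 30 probes, one second apart.
# wait_until 30 1 curl -fsS http://localhost:8080/health
```

A probe like this only proves that the HTTP listener answers. It does not replace the MCP-level check, so keep passing --wait-for-ready to mcptest as well.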

7.2 Secrets in YAML instead of env vars

Symptom: a test file like

servers:
  staging:
    url: "https://staging.example.com/mcp"
    headers:
      Authorization: "Bearer sk_live_abcd1234..."

ends up committed to a public repo, GitHub flags a secret scan alert, and the on-call gets paged.

Cause: tokens were pasted into the YAML instead of interpolated from environment variables.

Fix: always use ${VAR} interpolation and store the token in the platform's secret store (GitHub Secrets, GitLab CI/CD variables, CircleCI Contexts). Rotate any token that has ever appeared in a tracked file. The schema accepts plain strings in the Authorization header for local convenience, but the linter prints a warning when it sees one that looks like a real token. Treat the warning as an error in CI.
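
As an extra guard alongside the linter warning, a CI step can scan test files for Bearer values that are not ${VAR} interpolations. The sketch below is a rough heuristic under stated assumptions: has_literal_bearer is a hypothetical helper, it is not the mcptest linter, and it only recognizes the Authorization: "Bearer ..." layout used in this guide.

```shell
#!/usr/bin/env bash
# has_literal_bearer FILE
# Succeeds when FILE contains an Authorization header whose Bearer value
# does not start with "$", i.e. it is not a ${VAR} interpolation.
# Hypothetical CI guard for illustration; it is not the mcptest linter.
has_literal_bearer() {
  grep -Eq 'Authorization:[[:space:]]*"?Bearer[[:space:]]+[^$"[:space:]]' "$1"
}

# Example: fail the step if any test file carries a literal token.
# for f in tests/*.yaml; do
#   if has_literal_bearer "$f"; then
#     echo "literal token in $f" >&2
#     exit 1
#   fi
# done
```

The check errs on the side of flagging: any Bearer value that is not interpolated counts as a finding, including placeholder strings.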

7.3 Transport mismatch (cassette recorded against stdio, replayed against URL)

Symptom: cassette replay fails with no matching interaction or method mismatch for requests that look almost identical to the recorded ones.

Cause: a cassette captures the wire-level traffic including the transport. A cassette recorded against a stdio server contains JSON-RPC frames over the stdio framing convention. A cassette recorded against an HTTP server contains HTTP requests with headers. The two are not interchangeable.

Fix: record one cassette per transport, name them as such (fixtures/list_dir.stdio.cassette.json, fixtures/list_dir.http.cassette.json), and reference the matching one in each test file. If your CI runs both patterns, replay against the cassette that matches the transport of the run. The cassette format records the transport in its header so mcptest can refuse to replay across transports.

7.4 Exit-code interpretation

Symptom: a CI step shows green even though tests failed, or red even though all tests passed.

Cause: the shell wrapper around mcptest run swallowed the exit code. Common offenders are bash -c "mcptest run ... | tee log.txt" (uses the exit code of tee, not mcptest) and set +e left over from debugging.

Fix: invoke mcptest directly as the last command of the step, with no pipe. If you must tee, use set -o pipefail first, or use the platform's log capture (no example above pipes mcptest output anywhere). The exit codes mcptest returns are documented in the troubleshooting guide. Treat anything non-zero as a failure unless you have a specific reason not to.
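
The effect of pipefail is easy to see in isolation. The sketch below uses a stand-in function (failing_run, a hypothetical name) in place of a real mcptest invocation and records the pipeline status with and without the option. It assumes bash, since pipefail is not available in every POSIX shell.

```shell
#!/usr/bin/env bash
# Stand-in for a failing test run; in CI this would be the real mcptest call.
failing_run() {
  echo "1 test failed"
  return 1
}

# Without pipefail, the pipeline's status is tee's status (0),
# so the failure is hidden from the CI step.
set +o pipefail
without_pipefail=0
failing_run | tee /dev/null >/dev/null || without_pipefail=$?

# With pipefail, the pipeline's status is the last non-zero status in it.
set -o pipefail
with_pipefail=0
failing_run | tee /dev/null >/dev/null || with_pipefail=$?

echo "without pipefail: $without_pipefail"   # prints 0
echo "with pipefail: $with_pipefail"         # prints 1
```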

7.5 Cache key not invalidating when tests change

Symptom: a test file change does not change the result, because CI is running an older test set out of the cache.

Cause: the cache key covers source dependencies (e.g., Cargo.lock) but not the test directory. The binary is rebuilt, but the test fixtures are restored from cache and overwrite the new ones.

Fix: do not put test fixtures inside the cached path. Cache target/, the Cargo registry, and the mcptest binary, but not tests/ or examples/. If you must cache derived test artifacts (snapshots, golden files), key the cache on the hash of the source that produced them, for example hashFiles('tests/**').
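On platforms without a built-in hashFiles(), a similar key component can be computed in a script step. This is a sketch under assumptions: tests_digest is a hypothetical helper, and it relies on GNU sha256sum, sort -z, and xargs -0 being available on the runner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one digest covering every file under a directory,
# usable as a cache-key component.
tests_digest() {
  local dir="$1"
  # Hash each file in a stable (sorted) order, then hash the list of hashes.
  # Running from inside the directory keeps the key independent of where
  # the checkout lives on the runner.
  (
    cd "$dir" &&
      find . -type f -print0 |
      LC_ALL=C sort -z |
      xargs -0 sha256sum |
      sha256sum |
      cut -d' ' -f1
  )
}

# Example: CACHE_KEY="snapshots-$(tests_digest tests)"
```

Any change to a file's content or path under the directory produces a different digest, so the cache entry is replaced when the tests change.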

7.6 Different runner OS surfacing different test results

Symptom: the test suite is green on Ubuntu, red on macOS. The failure is a path comparison or a line-ending mismatch.

Cause: Linux runners use a case-sensitive filesystem (ext4 by default on Ubuntu), while the macOS default (APFS) is case-insensitive. Linux and macOS end lines with \n; Windows ends them with \r\n. A matcher that asserts exact equality on a path or on a stdout buffer will disagree across runners.

Fix: avoid asserting exact equality on values that differ by platform. For a path or a multi-line string, use regex (anchor only the parts you care about, and write \r?\n where a line ending appears) or contains instead of exact. For environment-dependent values (temp dir, hostname), interpolate the actual environment with ${VAR} from the test runtime rather than hard-coding a literal.
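The difference between exact equality and a line-ending-tolerant regex can be shown outside mcptest. The sketch below uses bash's own regex operator purely as an illustration; the two buffers are made-up sample output, and how a regex matcher is spelled in a test file is not shown here.

```shell
#!/usr/bin/env bash
# The same two-line output as a Linux runner and a Windows runner would emit it.
unix_out=$'alpha\nbeta'
windows_out=$'alpha\r\nbeta'

# Exact equality: the carriage return makes the two buffers differ.
if [ "$unix_out" = "$windows_out" ]; then exact="equal"; else exact="different"; fi

# Regex with an optional carriage return before the newline (\r?\n).
pattern=$'^alpha\r?\nbeta$'
matches_pattern() { [[ "$1" =~ $pattern ]]; }

echo "exact comparison: $exact"
matches_pattern "$unix_out" && echo "regex matches the Unix output"
matches_pattern "$windows_out" && echo "regex matches the Windows output"
```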

8. Debugging failing CI runs

The first step is always the same: read the JUnit output. Every snippet above writes it to a known path and uploads it as an artifact. Open the file locally, find the failing case, copy the symptom, and search the troubleshooting guide.

If that does not resolve it, the steps below escalate in order.

8.1 Re-run with `--verbose`

Every snippet above already passes --verbose. Pull the job log and search for level=DEBUG. The verbose output includes:

the resolved server command line or URL,
every request and response header,
the readiness wait duration,
the matcher decision tree for every failed assertion.

If --verbose is not enough, add --debug to a one-off CI run. --debug prints raw wire bytes (with secrets redacted) and is far too noisy for default CI but is the right setting for a forensic run.

8.2 Pull the report artifacts

Every snippet above writes the reports it asks for to target/:

target/mcptest-run.json          # --reporter json --output ...
target/mcptest-junit.xml         # mcptest report ... --format junit
target/mcptest-codequality.json  # mcptest report ... --format gitlab

The JSON run file is the source of truth: it carries the full run envelope, so you can re-render any reporter format from it after the fact without re-running the suite. Every snippet uploads the target/mcptest-* glob, so all three are one click away.

For a forensic run, add --debug to the failing job. --debug prints the resolved config and raw wire bytes (with secrets redacted) to the job log. Capture the JSON run file from a passing run and from the failing run, then diff them. The first divergence usually points at the bug.
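A plain diff of two run files is noisy when key order or indentation differs. The sketch below normalizes both files first. diff_runs is a hypothetical helper, it assumes python3 is on the PATH (jq -S . does the same job where jq is installed), and it makes no assumption about the field names inside the run file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: diff two JSON run files after normalizing key order
# and indentation, so only real differences show up.
diff_runs() {
  local passing="$1" failing="$2"
  diff \
    <(python3 -m json.tool --sort-keys "$passing") \
    <(python3 -m json.tool --sort-keys "$failing")
}

# Example: diff_runs passing/mcptest-run.json failing/mcptest-run.json | head -n 20
```

The function exits 0 when the two files carry the same data and 1 when they differ, mirroring diff itself.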

8.3 Reproduce locally with the same env vars

The reason CI fails and your laptop passes is almost always the environment. Reproduce by copying the env block out of the workflow:

export MCP_STAGING_URL="https://staging.example.com/mcp"
export MCP_STAGING_TOKEN="$(pass show mcptest/staging)"
export CI=true
export RUST_LOG=mcptest=debug
mcptest run tests/staging/ --wait-for-ready --verbose

Three rules for this loop:

Match the runner OS. If CI runs on Ubuntu and you run on macOS, use a container: docker run --rm -it -v "$PWD:/app" -w /app rust:1.81 bash.
Match the mcptest version. If CI is pinned to 1.0.0, install 1.0.0 locally with curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh, not whatever Homebrew has.
Match the test file. Do not run tests/; run the exact subdirectory the failing job runs.

If the local run passes with the same versions, the same OS, and the same env, the next suspect is the network. Run with --debug (or RUST_LOG=mcptest=trace) to log every connection attempt and the raw wire bytes with secrets redacted.

8.4 When to file a bug

Open an issue against the mcptest project when:

the failure repeats on three consecutive runs and you can attach the JSON run file plus a --debug log,
the JUnit output and the verbose log disagree on which assertion failed,
mcptest itself crashes (non-zero exit code with no report output at all).

For everything else, the troubleshooting guide entry plus the verbose log is usually enough.

Appendix: snippet index

If you got here from another page, this is the shortest path to each worked example.

Pattern 1 (stdio)
- GitHub Actions: section 2.1
- GitLab CI: section 2.2
- CircleCI: section 2.3
Pattern 2 (HTTP service container)
- GitHub Actions: section 3.1
- GitLab CI: section 3.2
- CircleCI: section 3.3
Pattern 3 (deployed environment)
- GitHub Actions: section 4.1
- GitLab CI: section 4.2
- CircleCI: section 4.3
Combined smoke + integration: section 5
Caching strategy: section 6
Common pitfalls: section 7
Debugging: section 8

Open follow-up items, as of this writing:

A soapbucket/mcptest-action GitHub Action is staged as an example under examples/mcptest-action/ but is not yet published, so do not reference it in a real workflow. The curl ... install.sh snippets in this guide do not depend on it and work today.
--wait-for-ready is accepted on every subcommand and is referenced by all HTTP and deployed-environment snippets. The flag parses and validates its budget today; the full readiness-polling behavior is still being wired, so until it ships the platform healthcheck is what gates the service.

9. Jenkins

Jenkins remains common in enterprise shops, where a Jenkinsfile already lives next to the repo and the build server is on-prem. The patterns mirror the platforms above: stdio subprocess, HTTP service container, deployed environment. Each pattern fits into both the declarative and the scripted pipeline syntax.

The Jenkinsfile snippets assume:

A Docker-capable agent (the snippet uses agent { docker { image '...' } } in declarative form). Air-gapped shops pull from an internal registry; see section 12.
The JUnit publisher plugin (junit step) for the test results UI.
The Warnings Next Generation plugin for SARIF surfacing, where relevant. mcptest renders SARIF with mcptest report --format sarif.

9.1 Declarative Jenkinsfile (stdio)

pipeline {
  agent {
    docker {
      image 'rust:1.81'
      args '-v $HOME/.cargo:/root/.cargo'
    }
  }

  environment {
    MCPTEST_VERSION = '1.0.0'
    PATH = "$HOME/.local/bin:$PATH"
  }

  stages {
    stage('Build server') {
      steps {
        sh 'cargo build --release --bin my-mcp-server'
      }
    }

    stage('Install mcptest') {
      steps {
        sh 'curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$MCPTEST_VERSION sh'
      }
    }

    stage('Run mcptest') {
      steps {
        sh '''
          mcptest run tests/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose
          mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
          mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
        '''
      }
    }
  }

  post {
    always {
      junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
      archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
    }
  }
}

Notes:

The junit post step publishes results to the Jenkins Tests tab and feeds the build-trends graph. allowEmptyResults: false fails the build if no results were produced (catches the "mcptest crashed before writing the report" case).
archiveArtifacts keeps the JSON run file, the JUnit XML, and the GitLab Code Quality JSON for download. Downstream review bots consume the JSON.

9.2 Declarative Jenkinsfile (HTTP localhost)

pipeline {
  agent {
    docker {
      image 'docker:24'
      args '--privileged -v /var/run/docker.sock:/var/run/docker.sock'
    }
  }

  environment {
    MCPTEST_VERSION = '1.0.0'
    MCP_SERVER_URL = 'http://mcp-server:8080/mcp'
  }

  stages {
    stage('Boot server') {
      steps {
        sh '''
          docker network create mcptest-net || true
          docker run -d --rm \
            --network mcptest-net \
            --name mcp-server \
            -p 8080:8080 \
            ghcr.io/example/my-mcp-server:0.7.3
        '''
      }
    }

    stage('Run mcptest') {
      steps {
        sh '''
          docker run --rm \
            --network mcptest-net \
            -e MCP_SERVER_URL=$MCP_SERVER_URL \
            -v $WORKSPACE:/workspace \
            -w /workspace \
            --entrypoint sh \
            soapbucket/mcptest:$MCPTEST_VERSION -c '
              mcptest run tests/http/ \
                --wait-for-ready \
                --reporter json --output target/mcptest-run.json \
                --verbose
              mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
              mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
            '
        '''
      }
    }
  }

  post {
    always {
      sh 'docker stop mcp-server || true'
      sh 'docker network rm mcptest-net || true'
      junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
      archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
    }
  }
}

Notes:

The shared Docker network lets the mcptest container reach the server by the alias mcp-server. The hostname matches what the test YAML expects.
The post { always } cleanup runs whether the test passes or fails so a flaky job does not leak containers between runs.

9.3 Declarative Jenkinsfile (deployed URL)

pipeline {
  agent any

  environment {
    MCPTEST_VERSION = '1.0.0'
    MCP_STAGING_URL = credentials('mcp-staging-url')
    MCP_STAGING_TOKEN = credentials('mcp-staging-token')
  }

  triggers {
    cron('H 6 * * *')
  }

  stages {
    stage('Install mcptest') {
      steps {
        sh 'curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$MCPTEST_VERSION sh'
      }
    }

    stage('Run mcptest against staging') {
      steps {
        sh '''
          mcptest run tests/staging/ \
            --wait-for-ready \
            --reporter json --output target/mcptest-run.json \
            --verbose
          mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
          mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
        '''
      }
    }
  }

  post {
    always {
      junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
      archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
    }
  }
}

Notes:

credentials('mcp-staging-token') pulls from the Jenkins Credentials store. The credential ID matches the name; in Jenkins, secrets are bound to the job via the Credentials Binding plugin.
The cron('H 6 * * *') trigger uses Jenkins's hash spreading so multiple jobs scheduled at "6 AM" do not all fire at the same minute.
Per pitfall 7.2, the token never appears in YAML. The credentials() call masks the value in the job log.

9.4 Scripted Jenkinsfile

For legacy Jenkins installations that still use scripted pipelines, the same stdio pattern looks like this:

node('docker') {
  def mcptestVersion = '1.0.0'

  docker.image('rust:1.81').inside {
    stage('Checkout') {
      checkout scm
    }

    stage('Build server') {
      sh 'cargo build --release --bin my-mcp-server'
    }

    stage('Install mcptest') {
      sh "curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=${mcptestVersion} sh"
    }

    stage('Run mcptest') {
      try {
        sh """
          mcptest run tests/ \\
            --wait-for-ready \\
            --reporter json --output target/mcptest-run.json \\
            --verbose
          mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
          mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
        """
      } finally {
        junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
        archiveArtifacts artifacts: 'target/mcptest-*', allowEmptyArchive: true
      }
    }
  }
}

The try { ... } finally { ... } block is the scripted equivalent of post { always }. It guarantees the JUnit publisher runs even when the test step fails.

9.5 SARIF via Warnings Next Generation

Surface findings in Jenkins through the Warnings Next Generation plugin. Render SARIF from the JSON run file, then add a post step:

stage('Render SARIF') {
  steps {
    sh 'mcptest report target/mcptest-run.json --format sarif --output target/mcptest.sarif'
  }
}

// ... in the post block:
post {
  always {
    junit testResults: 'target/mcptest-junit.xml', allowEmptyResults: false
    recordIssues(
      enabledForFailure: true,
      tools: [
        sarif(pattern: 'target/mcptest.sarif')
      ]
    )
  }
}

The plugin's UI groups findings by rule ID and surfaces them on the build page. Quality-gate rules (fail the build if more than N high-severity findings appear) live in the plugin's configuration, not in the Jenkinsfile.

9.6 Shared library: `mcptestStage()`

Larger Jenkins shops with many repos converge on a shared library that exposes reusable steps. Once examples/ci-templates/ exists, we will ship a vars/mcptestStage.groovy there. The intended call site:

@Library('soapbucket-shared') _

pipeline {
  agent any
  stages {
    stage('Build') { steps { sh 'cargo build --release --bin my-mcp-server' } }
    stage('mcptest') {
      steps {
        mcptestStage(
          version: '1.0.0',
          testDir: 'tests/',
          formats: ['junit', 'gitlab']
        )
      }
    }
  }
}

The helper resolves to the three stages in section 9.1 with the inputs parameterized. It hides the curl install and the post-step plumbing so each consuming pipeline reads as one line.

The shared library source is not yet published; the snippet above is the intended call site for documentation purposes.

10. Buildkite

Buildkite pipelines are YAML files at .buildkite/pipeline.yml. The agent queue routes each step to a matching agent pool, so the same pipeline can run a Rust build on a build agent and a deployed-environment test on a network-egress agent.

10.1 Buildkite (stdio)

steps:
  - label: ":rust: Build server"
    key: build
    agents:
      queue: builders
    plugins:
      - docker#v5.10.0:
          image: rust:1.81
          mount-checkout: true
          environment:
            - CARGO_HOME=/workdir/.cargo
    commands:
      - cargo build --release --bin my-mcp-server
    artifact_paths:
      - "target/release/my-mcp-server"

  - label: ":test_tube: mcptest stdio"
    key: mcptest
    depends_on: build
    agents:
      queue: builders
    plugins:
      - artifacts#v1.9.4:
          download: "target/release/my-mcp-server"
      - docker#v5.10.0:
          image: rust:1.81
          mount-checkout: true
    commands:
      - chmod +x target/release/my-mcp-server
      - curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=1.0.0 sh
      # $$ defers expansion to run time; Buildkite interpolates a single $ at pipeline upload.
      - export PATH="$$HOME/.local/bin:$$PATH"
      - |
        mcptest run tests/ \
          --wait-for-ready \
          --reporter json --output target/mcptest-run.json \
          --verbose
      - mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
      - mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
    artifact_paths:
      - "target/mcptest-*"

Notes:

agents.queue: builders routes the step to the builder pool. Use a different queue (agents.queue: egress) for steps that need outbound network access to a staging environment.
The docker plugin runs the step inside the named image. mount-checkout: true mounts the working directory, so artifacts written under target/ end up on the host.
The Buildkite Test Analytics product consumes the JUnit XML when the Test Engine plugin is configured; see section 10.4 for the annotation path that works without the paid product.

10.2 Buildkite (HTTP localhost via docker-compose)

steps:
  - label: ":test_tube: mcptest http"
    agents:
      queue: builders
    plugins:
      - docker-compose#v5.10.0:
          run: mcptest
          config: .buildkite/docker-compose.yml
    artifact_paths:
      - "target/mcptest-*"

With .buildkite/docker-compose.yml:

services:
  mcp-server:
    image: ghcr.io/example/my-mcp-server:0.7.3
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      interval: 5s
      timeout: 2s
      retries: 10

  mcptest:
    image: soapbucket/mcptest:1.0.0
    depends_on:
      mcp-server:
        condition: service_healthy
    environment:
      MCP_SERVER_URL: http://mcp-server:8080/mcp
    volumes:
      - .:/workspace
    working_dir: /workspace
    entrypoint: ["sh", "-c"]
    command:
      - |
        mcptest run tests/http/ \
          --wait-for-ready \
          --reporter json --output target/mcptest-run.json \
          --verbose
        mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
        mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json

Notes:

depends_on.mcp-server.condition: service_healthy waits for the healthcheck to pass before mcptest starts. Combined with --wait-for-ready, this covers both HTTP-level health and MCP-level readiness.
The docker-compose plugin is the right shape when more than one sidecar is involved (database + server + mcptest, for instance). For a single sidecar, the bare docker plugin with an inline network is lighter.

10.3 Buildkite (deployed URL)

steps:
  - label: ":test_tube: mcptest staging"
    if: build.source == "schedule" || build.message =~ /\[staging\]/
    agents:
      queue: egress
    plugins:
      - docker#v5.10.0:
          image: soapbucket/mcptest:1.0.0
          entrypoint: sh
          environment:
            - MCP_STAGING_URL
            - MCP_STAGING_TOKEN
          mount-checkout: true
    commands:
      - |
        mcptest run tests/staging/ \
          --wait-for-ready \
          --reporter json --output target/mcptest-run.json \
          --verbose
      - mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
      - mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
    retry:
      automatic:
        - exit_status: -1
          limit: 2
    artifact_paths:
      - "target/mcptest-*"

Notes:

agents.queue: egress routes to an agent pool with outbound network access. The builder pool typically blocks egress for supply-chain reasons.
If a worker can legitimately draw an empty test selection (a sharded matrix, for instance), pass --pass-with-no-tests so an empty run exits 0 instead of 7. mcptest does not emit a separate "environment unavailable" exit code; the stable set is 0, 1, 2, 5, 6, 7.
retry.automatic.exit_status: -1 retries on infrastructure failures (network drop, agent loss) up to twice. It does not retry on test failures, only on operational ones.
MCP_STAGING_URL and MCP_STAGING_TOKEN come from Buildkite's secret store, attached to the pipeline as environment variables. The Docker plugin's environment list passes them through without ever exposing them in the YAML.

10.4 Annotate the build with JUnit summary

Buildkite's annotation API attaches Markdown to the build page. To surface mcptest results inline (without paying for Test Analytics):

  - label: ":memo: Annotate mcptest results"
    depends_on: mcptest
    allow_dependency_failure: true
    agents:
      queue: builders
    commands:
      - buildkite-agent artifact download "target/mcptest-junit.xml" .
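      - |
        # $$ defers expansion to run time; Buildkite interpolates a single $ at pipeline upload.
        # Any non-zero failures attribute marks the run as failed, even if another suite reports 0.
        if grep -qE 'failures="[1-9]' target/mcptest-junit.xml; then
          FAIL_COUNT=$$(grep -oE 'failures="[1-9][0-9]*"' target/mcptest-junit.xml | head -n1 | grep -oE '[0-9]+')
          buildkite-agent annotate --style error "mcptest failed ($$FAIL_COUNT test(s)). See artifacts."
        else
          buildkite-agent annotate --style success "mcptest passed."
        fi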

allow_dependency_failure: true runs the annotation step even when mcptest exited non-zero, so failed builds still get the inline summary.

10.5 Agent queue routing

Three queues are typically enough:

Queue	Used for
`builders`	Compile-heavy work (cargo build, npm install).
`egress`	Steps that need outbound network to staging or prod URLs.
`mcptest`	Steps that need the mcptest binary preinstalled.

The mcptest queue is optional; the snippets above install mcptest into the step on demand. A dedicated queue saves the install step on every run at the cost of an extra agent pool to maintain. For low-volume pipelines the install-on-demand pattern is simpler.

11. Azure DevOps

Azure DevOps pipelines live at azure-pipelines.yml. The platform's test-results UI consumes JUnit through PublishTestResults@2 and the SARIF surface through PublishCodeAnalysisResults@1.

11.1 Azure DevOps (stdio)

trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

variables:
  MCPTEST_VERSION: 1.0.0
  RUST_VERSION: 1.81

steps:
  - checkout: self

  - task: Cache@2
    inputs:
      key: 'cargo | "$(Agent.OS)" | Cargo.lock'
      path: |
        $(HOME)/.cargo
        target
      restoreKeys: |
        cargo | "$(Agent.OS)"

  - script: |
      curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain $(RUST_VERSION)
      echo "##vso[task.prependpath]$HOME/.cargo/bin"
    displayName: Install Rust toolchain

  - script: cargo build --release --bin my-mcp-server
    displayName: Build server

  - script: |
      curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$(MCPTEST_VERSION) sh
      echo "##vso[task.prependpath]$HOME/.local/bin"
    displayName: Install mcptest

  - script: |
      mcptest run tests/ \
        --wait-for-ready \
        --reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
        --verbose
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format gitlab --output $(Build.ArtifactStagingDirectory)/mcptest-codequality.json
    displayName: Run mcptest

  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testRunner: JUnit
      testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit.xml"
      testRunTitle: mcptest stdio
      failTaskOnFailedTests: true

  - task: PublishBuildArtifacts@1
    condition: succeededOrFailed()
    inputs:
      pathToPublish: $(Build.ArtifactStagingDirectory)
      artifactName: mcptest-reports

Notes:

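condition: succeededOrFailed() is the closest Azure equivalent of GitHub's if: always(): the step runs after an earlier failure but is still skipped when the run is canceled. Without it, the publish step is skipped on test failure and the UI loses the report.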
failTaskOnFailedTests: true makes the PublishTestResults task fail the pipeline when JUnit reports failures. Without this flag the test results show up in the UI but the pipeline reports success.
##vso[task.prependpath]... is Azure's logging-command form for modifying PATH for subsequent steps.

11.2 Azure DevOps (HTTP localhost via container resource)

resources:
  containers:
    - container: mcp-server
      image: ghcr.io/example/my-mcp-server:0.7.3
      ports:
        - 8080:8080

services:
  mcp-server: mcp-server

variables:
  MCPTEST_VERSION: 1.0.0
  MCP_SERVER_URL: http://mcp-server:8080/mcp

pool:
  vmImage: ubuntu-latest

steps:
  - checkout: self

  - script: |
      curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$(MCPTEST_VERSION) sh
      echo "##vso[task.prependpath]$HOME/.local/bin"
    displayName: Install mcptest

  - script: |
      mcptest run tests/http/ \
        --wait-for-ready \
        --reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
        --verbose
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format gitlab --output $(Build.ArtifactStagingDirectory)/mcptest-codequality.json
    displayName: Run mcptest

  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testRunner: JUnit
      testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit.xml"
      testRunTitle: mcptest http
      failTaskOnFailedTests: true

Notes:

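The resources.containers block declares the image; the services block attaches it to the job network. The mcp-server hostname resolves when the job itself runs in a container; when the steps run directly on the agent host, the service is reached through the mapped port at localhost:8080 instead.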
For images in a private registry, add a service connection (Pipelines > Service connections > Docker Registry) and reference it in the container resource via endpoint:.

11.3 Azure DevOps (deployed URL with service connection)

schedules:
  - cron: "0 6 * * *"
    displayName: Nightly staging tests
    branches:
      include: [main]
    always: true

pool:
  vmImage: ubuntu-latest

variables:
  - name: MCPTEST_VERSION
    value: 1.0.0
  - group: mcptest-staging   # Variable Group with MCP_STAGING_URL, MCP_STAGING_TOKEN

steps:
  - checkout: self

  - script: |
      curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=$(MCPTEST_VERSION) sh
      echo "##vso[task.prependpath]$HOME/.local/bin"
    displayName: Install mcptest

  - script: |
      mcptest run tests/staging/ \
        --wait-for-ready \
        --reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
        --verbose
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format gitlab --output $(Build.ArtifactStagingDirectory)/mcptest-codequality.json
    displayName: Run mcptest against staging
    env:
      MCP_STAGING_URL: $(MCP_STAGING_URL)
      MCP_STAGING_TOKEN: $(MCP_STAGING_TOKEN)

  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testRunner: JUnit
      testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit.xml"
      testRunTitle: mcptest staging
      failTaskOnFailedTests: true

Notes:

The Variable Group mcptest-staging is defined in Library and bound to the pipeline. Secrets in a Variable Group are encrypted at rest and injected into the job environment.
For OAuth-authenticated MCP servers, add a Generic Service Connection (Service Connections > New > Generic), reference it with serviceConnection: 'mcp-staging-oauth', and pull tokens with the OAuth1 or Bearer Token connection types.

11.4 SARIF via PublishCodeAnalysisResults

Render SARIF from the JSON run file, then publish it:

  - script: |
      mcptest run tests/ \
        --reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
        --verbose
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format sarif --output $(Build.ArtifactStagingDirectory)/mcptest.sarif
      mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format junit --output $(Build.ArtifactStagingDirectory)/mcptest-junit.xml
    displayName: Run mcptest

  - task: PublishCodeAnalysisResults@1
    condition: succeededOrFailed()
    inputs:
      codeAnalysisResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest.sarif"
      codeAnalysisResultsType: SARIF

11.5 YAML templates for reuse

For orgs with many repositories, factor the mcptest steps into a YAML template. Add a mcptest.yml template at the org's shared-templates repo:

# templates/mcptest.yml
parameters:
  - name: testDir
    type: string
    default: tests/
  - name: mcptestVersion
    type: string
    default: 1.0.0
  - name: formats
    type: object
    default:
      - junit
      - gitlab

steps:
  - script: |
      curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=${{ parameters.mcptestVersion }} sh
      echo "##vso[task.prependpath]$HOME/.local/bin"
    displayName: Install mcptest

  - script: |
      mcptest run ${{ parameters.testDir }} \
        --wait-for-ready \
        --reporter json --output $(Build.ArtifactStagingDirectory)/mcptest-run.json \
        --verbose
    displayName: Run mcptest

  # Template expressions cannot loop inside a script string, so emit one step per format.
  - ${{ each f in parameters.formats }}:
      - script: mcptest report $(Build.ArtifactStagingDirectory)/mcptest-run.json --format ${{ f }} --output $(Build.ArtifactStagingDirectory)/mcptest-${{ f }}-report
        displayName: Render ${{ f }} report
        condition: succeededOrFailed()

  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testRunner: JUnit
      testResultsFiles: "$(Build.ArtifactStagingDirectory)/mcptest-junit-report"
      failTaskOnFailedTests: true

Consumed from a downstream pipeline:

resources:
  repositories:
    - repository: templates
      type: git
      name: shared/templates
      ref: refs/tags/v1.0.0

steps:
  - template: mcptest.yml@templates
    parameters:
      testDir: tests/integration/
      mcptestVersion: 1.0.0

Pin the template repo to a tag, not to main. A template change silently rolls out to every consuming pipeline if the consumer points at a branch.

12. Self-hosted and air-gapped environments

Enterprise installations often run CI on isolated networks with no outbound HTTP. Every artifact (Docker image, mcptest binary, SARIF schema) has to live inside the perimeter. The snippets below adapt the patterns above for that environment.

The shape is the same on every CI platform; the difference is sourcing.

12.1 Offline install: docker save and tarball

For mcptest itself, save the Docker image on an internet-connected host and ship the tarball through the same channel you use for other controlled artifacts (S3 with bucket policies, an internal artifact store, sneakernet via removable media for the strictest shops):

# On an internet-connected host
docker pull soapbucket/mcptest:1.0.0
docker save soapbucket/mcptest:1.0.0 -o mcptest-1.0.0.tar

# Compute a checksum the receiving side can verify
sha256sum mcptest-1.0.0.tar > mcptest-1.0.0.tar.sha256

# On the air-gapped CI agent
sha256sum -c mcptest-1.0.0.tar.sha256
docker load -i mcptest-1.0.0.tar
docker tag soapbucket/mcptest:1.0.0 internal-registry.example.com/mcptest:1.0.0
docker push internal-registry.example.com/mcptest:1.0.0

For the standalone binary, mirror the GitHub release artifact to an internal artifact store and adjust the install command:

# Replaces the curl https://download.mcptest.sh/install.sh path
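INTERNAL_BASE="https://artifacts.example.com/mcptest/1.0.0"
ARCHIVE="mcptest-linux-x86_64.tar.gz"
# Keep the published file name so the SHA256SUMS entry matches it.
curl -fsSL "$INTERNAL_BASE/$ARCHIVE" -o "$ARCHIVE"
curl -fsSL "$INTERNAL_BASE/SHA256SUMS" -o SHA256SUMS
# --ignore-missing skips entries for archives that were not downloaded.
sha256sum --ignore-missing -c SHA256SUMS
tar -xzf "$ARCHIVE"
sudo install mcptest /usr/local/bin/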

The official install script (install.sh) accepts an MCPTEST_DOWNLOAD_BASE environment variable that points at the internal mirror. The install flow is otherwise identical.

12.2 Internal registry mirroring

For server images, the same docker save / docker load pattern applies. Most enterprise registries (Harbor, Artifactory, Nexus, ECR behind a private endpoint) accept the image once it has been loaded, retagged, and pushed:

docker pull ghcr.io/example/my-mcp-server:0.7.3
docker save ghcr.io/example/my-mcp-server:0.7.3 -o my-mcp-server-0.7.3.tar

# Transfer through the controlled channel, then on the air-gapped side:
docker load -i my-mcp-server-0.7.3.tar
docker tag ghcr.io/example/my-mcp-server:0.7.3 \
  internal-registry.example.com/mcptest/my-mcp-server:0.7.3
docker push internal-registry.example.com/mcptest/my-mcp-server:0.7.3

Update the CI snippets to reference the internal registry hostname:

# GitHub Actions syntax; adapt the image reference the same way for Jenkins, Buildkite, and Azure
services:
  mcp-server:
    image: internal-registry.example.com/mcptest/my-mcp-server:0.7.3

Pinning by image digest is more robust than pinning by tag, because tags are mutable and registries do not always enforce immutability:

services:
  mcp-server:
    image: internal-registry.example.com/mcptest/my-mcp-server@sha256:abcdef...
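
One way to keep tag-only references from slipping back into a pipeline is a small guard that runs before the job starts. The sketch below is an illustration, not part of mcptest: is_digest_pinned and IMAGE_REF are hypothetical names, and the check only validates the shape of the reference (an @sha256: suffix with 64 hex characters), not that the digest exists in the registry.

```shell
# Hypothetical helper (not an mcptest feature): succeeds only when an image
# reference ends in a sha256 digest instead of a mutable tag.
is_digest_pinned() {
  # A sha256 digest is "sha256:" followed by exactly 64 lowercase hex characters.
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# IMAGE_REF is an illustrative variable name; the default is the tag-based
# reference used earlier in this section.
IMAGE_REF="${IMAGE_REF:-internal-registry.example.com/mcptest/my-mcp-server:0.7.3}"

if is_digest_pinned "$IMAGE_REF"; then
  echo "pinned by digest: $IMAGE_REF"
else
  echo "not pinned by digest: $IMAGE_REF"
fi
```

To find the digest for an image that has already been pushed, docker inspect --format '{{index .RepoDigests 0}}' IMAGE prints the repository digest recorded by the local Docker daemon.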

12.3 No outbound HTTP

mcptest's default behavior already aligns with air-gapped environments:

No update check. mcptest never calls home to check for a newer version. There is no update-check flag to set; the absence of an update check is the contract.
No telemetry. The OSS build emits no telemetry. There is no --no-telemetry flag to set; the absence of telemetry is the contract.
No live cassette pulls. Cassettes live next to the test files in the repo. mcptest never fetches a cassette from a remote URL at test time.
Schema URLs are advisory. The JSON Schema URL (https://mcptest.sh/schema/v1.json) appears in YAML files as a yaml-language-server: hint for editor tooling. The runner does not fetch the schema at runtime; it ships embedded in the binary.

If your CI agent enforces a strict deny-by-default egress policy, the only outbound calls a normal mcptest run makes are to the deployed MCP server URL (in Pattern 3) or to nothing at all (in Patterns 1 and 2, which run against local processes or sidecars).
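
For Pattern 3, the egress allowlist therefore needs a single entry: the host and port of the deployed server. The sketch below derives that entry from the URL. url_endpoint is a hypothetical helper name, the default ports are the usual 443 and 80, and IPv6 literals are not handled.

```shell
# Hypothetical helper: print "host:port" for an http(s) URL so the value can be
# pasted into an egress allowlist. Not part of mcptest; IPv6 literals are not handled.
url_endpoint() {
  ue_url="$1"
  ue_scheme="${ue_url%%://*}"        # text before "://"
  ue_rest="${ue_url#*://}"           # text after "://"
  ue_hostport="${ue_rest%%/*}"       # drop the path
  ue_hostport="${ue_hostport##*@}"   # drop any userinfo
  case "$ue_hostport" in
    *:*) printf '%s\n' "$ue_hostport" ;;   # explicit port present
    *)
      case "$ue_scheme" in
        https) printf '%s:443\n' "$ue_hostport" ;;
        http)  printf '%s:80\n' "$ue_hostport" ;;
        *)     return 1 ;;                 # not an http(s) URL
      esac
      ;;
  esac
}

# MCP_STAGING_URL is the variable used in the deployed-environment snippets;
# the default here is only a placeholder.
url_endpoint "${MCP_STAGING_URL:-https://staging.example.com/mcp}"
```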

12.4 HTTPS_PROXY and HTTP_PROXY support

For environments where outbound HTTP is allowed only through a corporate proxy, mcptest's HTTP transport honors the standard environment variables:

Variable	Effect
`HTTPS_PROXY`	Routes HTTPS requests through the named proxy.
`HTTP_PROXY`	Routes HTTP requests through the named proxy.
`NO_PROXY`	Comma-separated list of hostnames that bypass the proxy.

Example for a deployed-URL pattern behind a corporate proxy:

# GitHub Actions syntax; the same variables apply on Jenkins, Buildkite, and Azure, with each platform's own secret reference
env:
  HTTPS_PROXY: http://proxy.example.com:3128
  NO_PROXY: localhost,127.0.0.1,internal-registry.example.com
  MCP_STAGING_URL: https://staging.example.com/mcp
  MCP_STAGING_TOKEN: ${{ secrets.MCP_STAGING_TOKEN }}

The reqwest-based HTTP transport reads these variables at startup. mcptest also exposes explicit proxy flags (--proxy, --http-proxy, --https-proxy, --no-proxy, and --noproxy HOSTLIST) that override the environment when a single run needs different routing.
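
A misrouted host is easier to catch before the run than after a timeout. The sketch below approximates common NO_PROXY matching (exact hostname or subdomain match) so a CI step can print which hosts go direct and which go through the proxy. bypasses_proxy is a hypothetical helper, and the matching rules are a simplification: the rules used by mcptest's HTTP client may differ, for example around CIDR ranges and ports.

```shell
# Hypothetical helper: succeeds when HOST would bypass the proxy under a
# simplified reading of NO_PROXY (exact match or subdomain match only).
# Usage: bypasses_proxy HOST [LIST]   (LIST defaults to $NO_PROXY)
bypasses_proxy() {
  bp_host="$1"
  bp_rest="${2-${NO_PROXY-}},"
  while [ -n "$bp_rest" ]; do
    bp_entry="${bp_rest%%,*}"        # next comma-separated entry
    bp_rest="${bp_rest#*,}"          # remainder of the list
    bp_entry="$(printf '%s' "$bp_entry" | tr -d ' ')"
    bp_entry="${bp_entry#.}"         # treat ".example.com" like "example.com"
    [ -z "$bp_entry" ] && continue
    case "$bp_host" in
      "$bp_entry"|*".$bp_entry") return 0 ;;
    esac
  done
  return 1
}

# Same list as the env block above.
NO_PROXY_LIST="localhost,127.0.0.1,internal-registry.example.com"
for h in internal-registry.example.com staging.example.com; do
  if bypasses_proxy "$h" "$NO_PROXY_LIST"; then
    echo "$h: direct"
  else
    echo "$h: via proxy"
  fi
done
```

With the list from the example, the registry host is reported as direct and the staging host as going through the proxy, which is the routing the env block intends.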

12.5 Internal certificate authorities

When the deployed environment uses a private certificate authority, add the CA bundle to the agent's trust store. On Debian and Ubuntu agents the system bundle is /etc/ssl/certs/ca-certificates.crt; other distributions use different paths. mcptest's HTTP client respects SSL_CERT_FILE and SSL_CERT_DIR, so the simplest approach is to point SSL_CERT_FILE at the bundle:

env:
  SSL_CERT_FILE: /etc/internal-ca/bundle.pem
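
A wrong path in SSL_CERT_FILE usually shows up later as an opaque TLS error, so a preflight check can save a debugging round. The sketch below assumes a PEM bundle; check_ca_bundle is a hypothetical helper, not an mcptest command, and it only confirms that the file is readable and contains at least one certificate block.

```shell
# Hypothetical preflight: confirm the CA bundle is readable and contains at
# least one PEM certificate. It does not validate the certificates themselves.
check_ca_bundle() {
  cb_bundle="$1"
  if [ ! -r "$cb_bundle" ]; then
    echo "CA bundle not readable: $cb_bundle" >&2
    return 1
  fi
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  cb_count="$(grep -c -e '-----BEGIN CERTIFICATE-----' "$cb_bundle" || true)"
  if [ "$cb_count" -eq 0 ]; then
    echo "no PEM certificates found in $cb_bundle" >&2
    return 1
  fi
  echo "$cb_bundle: $cb_count certificate(s)"
}
```

In a CI step, running check_ca_bundle "$SSL_CERT_FILE" || exit 1 before mcptest run fails the job early with a readable message.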

For the Docker image, bake the CA bundle into a derived image:

FROM soapbucket/mcptest:1.0.0
COPY internal-ca.pem /usr/local/share/ca-certificates/internal-ca.crt
RUN update-ca-certificates

Republish the resulting image to the internal registry and use it in place of soapbucket/mcptest:1.0.0.

13. TeamCity (stub)

Full TeamCity integration is not yet documented. For now, TeamCity is supported via the generic Docker image and the standard JUnit publisher.

A minimal TeamCity build step (Command Line runner):

docker run --rm \
  -v "%teamcity.build.checkoutDir%":/workspace \
  -w /workspace \
  --entrypoint sh \
  soapbucket/mcptest:1.0.0 -c '
    mcptest run tests/ \
      --wait-for-ready \
      --reporter json --output target/mcptest-run.json \
      --verbose
    status=$?
    # Convert the reports even when tests fail, then propagate the run exit code
    mcptest report target/mcptest-run.json --format junit --output target/mcptest-junit.xml
    mcptest report target/mcptest-run.json --format gitlab --output target/mcptest-codequality.json
    exit "$status"
  '

Then add an XML Report Processing build feature with:

Report type: Ant JUnit
Monitoring rules: target/mcptest-junit.xml

The TeamCity tests tab consumes the JUnit report and surfaces failures on the build page. The GitLab Code Quality JSON is archived as an artifact through the Artifact Paths setting on the build configuration.

What is missing from this stub:

A dedicated TeamCity meta-runner (a reusable build step type with parameter prompts) for mcptest.
First-class support for the SARIF surface (TeamCity has its own inspection model that does not map cleanly to SARIF).
A Kotlin DSL representation in .teamcity/settings.kts for shops that version-control their build configurations.

These items land in v1.2 if there is demand. The current Docker + JUnit path is enough to run mcptest in a TeamCity pipeline today.

14. Reusable templates

Reusable templates under examples/ci-templates/ are planned but not yet published. Once that directory exists, it will ship:

examples/ci-templates/jenkins/Jenkinsfile.{stdio,http,deployed}
examples/ci-templates/jenkins/vars/mcptestStage.groovy
examples/ci-templates/buildkite/pipeline.{stdio,http,deployed}.yml
examples/ci-templates/buildkite/docker-compose.yml
examples/ci-templates/azure-devops/azure-pipelines.{stdio,http,deployed}.yml
examples/ci-templates/azure-devops/templates/mcptest.yml
examples/ci-templates/teamcity/build.cmd (the stub above)

Until the directory exists, treat the snippets in this guide as the canonical source.
