Docker, docker-compose, and package-runner patterns

How to drive mcptest against MCP servers that live behind package managers (uvx, npx, pipx, pip, cargo) or Docker images. Five patterns are covered, each with a runnable YAML snippet and notes on the rough edges. A short Kubernetes section at the end marks that surface as deferred.

If you are looking for the broader CI story (matrix builds, secret handling, artifact upload, the mcptest-results action), the canonical page will be docs/guides/ci-integration.md, the CI integration patterns guide, which is still in flight. This document focuses on the mechanics of getting the server under test up and reachable. The CI guide will reference back to this page once it lands.

Pattern at a glance

| # | Server lives in | mcptest reaches it via | Use when |
|---|-----------------|------------------------|----------|
| 1 | A registry: PyPI, npm, crates.io | stdio, spawned with uvx/npx/pipx/pip/cargo | The server is published as a package. |
| 2 | A pre-built Docker image | stdio, spawned with docker run -i | The vendor ships a container, not a binary. |
| 3 | A docker-compose service | HTTP/SSE URL | The server has runtime dependencies (Postgres, Redis, a sidecar). |
| 4 | The host machine, mcptest runs in Docker | HTTP/SSE URL on the host | Your CI image bundles mcptest and you point it at services on the host network. |
| 5 | A Docker image, mcptest also in Docker | stdio via a nested docker run | You have no choice but to nest. Prefer Pattern 3. |

The rest of the page expands each row.

Pattern 1: stdio subprocess via package runners

The most common shape for a Python or Node MCP server in 2026 is a package that exposes a console_scripts entry point. The runner downloads the package on first invocation, caches it, then forwards stdin and stdout. mcptest does not care which runner you use as long as the command speaks MCP over stdio.

uvx is the uv tool runner. It resolves the package against a local cache under ~/.cache/uv and runs the script in an ephemeral virtual environment. First invocation downloads the wheel; later runs reuse the cache.

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  fetch:
    command: ["uvx", "mcp-server-fetch"]

tools:
  - name: "fetch returns 200 for example.com"
    server: fetch
    tool: "fetch"
    args:
      url: "https://example.com"
    expect:
      - target: "result.content[0].text"
        matcher:
          contains: "Example Domain"

The first time a CI runner sees this YAML it must download the wheel, build any native dependencies, and bring up the process, which routinely takes 20 to 60 seconds. mcptest waits for the initialize handshake to finish; there is no per-server handshake-timeout knob, so the way to keep runs fast and predictable is to warm the cache in an earlier step that mcptest does not measure.

Warm the cache in CI before invoking mcptest: a Setup uv step followed by uvx --no-progress mcp-server-fetch --help populates the cache, so the metered run starts warm.

npx (Node, the broadest)

npx ships with every Node install. It downloads the package on demand from npm and runs the entry point.

servers:
  filesystem:
    command:
      - npx
      - -y
      - "@modelcontextprotocol/server-filesystem"
      - /tmp
    env:
      LOG_LEVEL: "info"

Two notes:

pipx (Python, isolated install)

pipx installs each package in its own venv under ~/.local/share/pipx. Unlike uvx, the install is persistent: install once, run many times.

servers:
  fetch:
    command: ["pipx", "run", "mcp-server-fetch"]

pipx run is the ephemeral form (closer to uvx). pipx install followed by invoking the entry point directly is the persistent form and is faster in CI if you cache ~/.local/share/pipx. Pick one and stick to it.

pip (Python, classic)

pip install --user mcp-server-fetch followed by calling the entry point directly is the lowest-common-denominator approach. Works on any machine with Python and pip, but you own the venv lifecycle.

servers:
  fetch:
    command: ["python", "-m", "mcp_server_fetch"]
    env:
      PYTHONUNBUFFERED: "1"

PYTHONUNBUFFERED=1 is worth setting. Without it, Python block-buffers stdout when it is attached to a pipe rather than a terminal, so a response can sit in the buffer and the MCP handshake appears to hang.

cargo (Rust, source-built)

For Rust MCP servers you build yourself, cargo run works for local iteration. In CI, prefer building the binary once and pointing at the artifact.

servers:
  myserver:
    command: ["cargo", "run", "--release", "--quiet", "--", "--mode", "stdio"]

--quiet keeps cargo from leaking progress chatter into the stdio channel. -- separates cargo flags from server flags. The first build can take a while, so in CI build the binary once and point at the artifact rather than building inside the metered run.

Slow first runs

The first invocation pays for the package download and the server's own boot, which can run from 20 seconds to a couple of minutes depending on the runner. There is no handshake budget to raise; instead warm the cache in a step before the metered run.

| Runner | Warmup step that mcptest does not measure |
|--------|-------------------------------------------|
| uvx, pipx run | uvx --no-progress <pkg> --help |
| npx -y | npx -y <pkg> --help |
| cargo | cargo build --release, then point at the built binary |
| Docker | docker pull <image>:<tag> |

A warm cache makes the metered run fast and deterministic, which is what a CI gate should measure. A cold cache hidden behind a generous timeout just makes the run slow and the timing meaningless.
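The warmup table can be folded into a single CI helper. The following is a minimal sketch, not something mcptest ships: warm_cache and the DRY_RUN switch are hypothetical names introduced here, and it assumes the server package accepts --help and exits.

```shell
# Sketch: run (or, with DRY_RUN=1, only print) the warmup command for one runner.
# warm_cache and DRY_RUN are hypothetical names, not part of mcptest.
warm_cache() {
  runner=$1
  target=${2:-}   # package name or image reference; unused for cargo
  case "$runner" in
    uvx)    set -- uvx --no-progress "$target" --help ;;
    npx)    set -- npx -y "$target" --help ;;
    cargo)  set -- cargo build --release ;;
    docker) set -- docker pull "$target" ;;
    *)      echo "warm_cache: unknown runner: $runner" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"     # show what would run without touching the network
  else
    "$@"
  fi
}
```

Calling warm_cache uvx mcp-server-fetch in a step ahead of mcptest run keeps the download out of the measured run; DRY_RUN=1 is handy for checking the command in a pull request without paying for it.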

Pattern 2: Docker subprocess (stdio)

Many vendors ship their MCP server as a Docker image. mcptest runs them the same way it runs any other subprocess: docker run -i with stdin attached, talking MCP over stdio.

The minimal shape

servers:
  vendor:
    command:
      - docker
      - run
      - -i
      - --rm
      - "ghcr.io/example/mcp-vendor:1.4.2"

Five details matter.

-i keeps stdin open. Without it docker run closes the input stream and the server exits before mcptest can send initialize.

--rm deletes the container when the process exits. Without it, each test run leaves a dead container behind and docker system df grows.

No -t. Adding -t allocates a TTY, which interleaves stdout and stderr and corrupts the framed MCP stream. The MCP messages get mixed with ANSI escape codes and the parser fails.

Pin the tag. :1.4.2 is reproducible. :latest is not. CI that pulls :latest will pass on Tuesday and fail on Wednesday because the upstream image changed under you. Track upstream releases in a separate Renovate or Dependabot config and update the tag deliberately.

Pull policy. docker run pulls on first use, then caches. In CI with ephemeral runners, that pull happens every job. Either accept the per-job pull latency or add a docker pull step in the warmup that mcptest does not measure.
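Three of these details (keep -i, never pass -t, pin the tag) are mechanical enough to check before the suite runs. The sketch below is one possible pre-flight check, under the assumption that the image reference is the last word of the command; check_docker_stdio is a hypothetical helper, not an mcptest feature.

```shell
# Sketch: reject docker run argument lists that would break an MCP stdio session.
# check_docker_stdio is a hypothetical helper; pass it the same words as `command:`.
check_docker_stdio() {
  has_i=0
  image=
  for arg in "$@"; do
    case "$arg" in
      -t|--tty|-it|-ti) echo "error: $arg allocates a TTY" >&2; return 1 ;;
      -i|--interactive) has_i=1 ;;
    esac
    image=$arg    # simplification: treat the last word as the image reference
  done
  [ "$has_i" -eq 1 ] || { echo "error: missing -i, stdin would be closed" >&2; return 1; }
  tail=${image##*/}   # drop registry and path so a registry port is not mistaken for a tag
  case "$tail" in
    *:latest) echo "error: :latest is not reproducible" >&2; return 1 ;;
    *:*|*@*)  ;;      # tagged or digest-pinned
    *)        echo "error: image has no tag" >&2; return 1 ;;
  esac
}
```

The last-word assumption fails for servers that take arguments after the image name, so treat this as a starting point rather than a complete parser.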

Env vars and secrets

Pass environment through with -e KEY (read from the runner's env) or -e KEY=value (literal). Read MCP-side secrets the same way you do for other servers: via mcptest's env interpolation, then forward to the container.

servers:
  vendor:
    command:
      - docker
      - run
      - -i
      - --rm
      - -e
      - VENDOR_API_KEY
      - -e
      - VENDOR_REGION=us-east-1
      - "ghcr.io/example/mcp-vendor:1.4.2"

VENDOR_API_KEY without a value tells Docker to read the variable from the parent environment, which is where mcptest's .env loader will have placed it. Never bake secrets into the command line literal: they end up in ps, in CI logs, and in docker inspect.
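One rough edge of the bare -e KEY form: if the variable is not exported in the parent environment, Docker leaves it unset in the container without complaint, and the server fails later with a less obvious error. A small guard in the CI step can catch that early. This is a sketch that assumes the secret is exported by the CI step itself (it cannot see values that mcptest loads from .env); require_env is a hypothetical helper.

```shell
# Sketch: fail fast when a variable that docker run -e NAME would forward is missing.
# require_env is a hypothetical helper; it checks exported variables only.
require_env() {
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "error: $name is not exported; docker run -e $name would pass nothing" >&2
      return 1
    fi
  done
}
```

A step such as require_env VENDOR_API_KEY before mcptest run turns a silent omission into an immediate, named failure.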

Volume mounts

For servers that read or write files, mount a host path read-only or read-write as appropriate.

servers:
  vendor:
    command:
      - docker
      - run
      - -i
      - --rm
      - -v
      - "${PWD}/fixtures:/work/fixtures:ro"
      - -v
      - "/tmp/mcptest-output:/work/out:rw"
      - "ghcr.io/example/mcp-vendor:1.4.2"

Two gotchas:

Network restrictions

If the server should not have network access, pass --network none. If it only needs to reach a sidecar on the host, use --network host on Linux, or address the host as host.docker.internal: Docker Desktop resolves that name out of the box, and Docker Engine on Linux (20.10 or later) resolves it once you pass --add-host host.docker.internal:host-gateway. The portable shape is to put both services in docker-compose (Pattern 3).

Pattern 3: docker-compose service (HTTP)

When the server has runtime dependencies (a database, a queue, an auth sidecar), the cleanest shape is a docker-compose.yml that brings the whole graph up, exposes the MCP server on a host port, and lets mcptest talk to it over HTTP or SSE.

The compose file

# docker-compose.test.yml
services:
  mcp:
    image: ghcr.io/example/mcp-vendor:1.4.2
    environment:
      VENDOR_DB_URL: "postgres://mcp:mcp@db:5432/mcp"
      VENDOR_REGION: "us-east-1"
    ports:
      - "8080:8080"
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: mcp
      POSTGRES_PASSWORD: mcp
      POSTGRES_DB: mcp
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "mcp"]
      interval: 2s
      timeout: 2s
      retries: 30

depends_on with service_healthy keeps the MCP container from booting before Postgres accepts connections. Without it, the MCP server crash-loops on connect failures and the compose stack is racy.

The mcptest YAML

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  vendor:
    url: "http://127.0.0.1:8080"
    auth:
      bearer_token_env: "VENDOR_API_KEY"
    wait_for_ready: "http://127.0.0.1:8080/healthz"

tools:
  - name: "echoes the input"
    server: vendor
    tool: "echo"
    args:
      message: "hello"
    expect:
      - target: "result.content[0].text"
        matcher:
          exact: "hello"

wait_for_ready polls the health endpoint until it returns 2xx before mcptest sends initialize. Without it, mcptest sends initialize immediately and the first request races the boot. This is the single most common cause of flaky compose suites.
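For debugging a flaky stack by hand, the same readiness poll can be approximated in shell. The sketch below mirrors the behavior described above (poll until 2xx) but is not mcptest's implementation; http_status and wait_until_ready are hypothetical names, and the attempt count and delay are arbitrary defaults.

```shell
# Sketch of a readiness poll; approximates what the text says wait_for_ready does.
# http_status and wait_until_ready are hypothetical helper names.
http_status() {
  # Prints the HTTP status code, or 000 when the connection fails.
  curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

wait_until_ready() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  probe=${4:-http_status}   # injectable so the loop can be exercised without a server
  n=0
  while [ "$n" -lt "$attempts" ]; do
    case "$("$probe" "$url")" in
      2??) return 0 ;;      # any 2xx counts as ready
    esac
    n=$((n + 1))
    sleep "$delay"
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}
```

Running wait_until_ready http://127.0.0.1:8080/healthz against a freshly started stack shows how long the server really takes to come up, which helps separate boot races from genuine test failures.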

The flow

The full local cycle is three commands:

docker compose -f docker-compose.test.yml up -d --wait
mcptest run -c mcptest.yml
docker compose -f docker-compose.test.yml down -v

--wait blocks until every service's healthcheck reports healthy. -v on down removes the named volumes so the next run starts clean.
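Run by hand, a failing mcptest run makes it easy to forget the teardown. One way to keep the cycle atomic is a small wrapper that always calls down -v and still reports the test result. This is a sketch; run_suite is a hypothetical function name and the defaults are the file names used on this page.

```shell
# Sketch: run the three-command cycle and always tear the stack down.
# run_suite is a hypothetical wrapper around the commands shown above.
run_suite() {
  compose_file=${1:-docker-compose.test.yml}
  config=${2:-mcptest.yml}
  status=0
  if docker compose -f "$compose_file" up -d --wait; then
    mcptest run -c "$config" || status=$?
  else
    status=$?               # startup failed; skip the tests but still clean up
  fi
  # Teardown runs whether startup or the tests failed.
  docker compose -f "$compose_file" down -v || true
  return "$status"
}
```

The function returns mcptest's exit code (or the compose startup failure), so it can sit directly in a Makefile target or a pre-push hook.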

GitHub Actions worked example

The same flow as a workflow job. Note the explicit --wait and the always() cleanup so a failing test still tears the stack down.

name: mcptest

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install mcptest
        run: |
          curl -fsSL https://download.mcptest.sh/install.sh | sh
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"

      - name: Start services
        run: docker compose -f docker-compose.test.yml up -d --wait

      - name: Run mcptest
        env:
          VENDOR_API_KEY: ${{ secrets.VENDOR_API_KEY }}
        run: mcptest run -c mcptest.yml --reporter junit --output mcptest.xml

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: mcptest-report
          path: mcptest.xml

      - name: Stop services
        if: always()
        run: docker compose -f docker-compose.test.yml down -v

Three deliberate choices in the workflow:

The full CI integration story (matrix builds, caching uv/npm/cargo, re-running flaky tests, the mcptest-results reusable workflow) lives in docs/guides/ci-integration.md (in flight). When that page lands it will link back here for the compose-specific bits.

Pattern 4: mcptest runs in Docker

Some teams package mcptest itself in a Docker image: a single container that holds the binary, the YAML, and the language toolchains needed to run package-runner-based servers. This is convenient for self-hosted runners that lock down what can be installed on the host.

Sample Dockerfile

The repo ships an example at examples/dockerfile/. The short version: multi-stage build, stage one pulls the mcptest binary, stage two provides the runtime and copies the binary to /usr/local/bin/mcptest. See examples/dockerfile/README.md for build instructions.

A real soapbucket/mcptest:latest image is not yet published. Follow-up work to publish a multi-arch image is tracked under "publish soapbucket/mcptest:latest multi-arch image" (file the ticket if it does not already exist).

Talking to services on the host

When mcptest runs in a container and the MCP server runs on the host, the container needs to reach the host's loopback. Three options:

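| Host OS | Flag | Use case |
|---------|------|----------|
| Linux | --network host | Container shares the host's network namespace. mcptest can reach 127.0.0.1:8080 directly. |
| Docker Desktop (macOS, Windows) | --add-host host.docker.internal:host-gateway | Resolve host.docker.internal to the host's gateway IP. Docker Desktop already resolves this name without the flag; Docker Engine on Linux (20.10 or later) needs it. |
| Any | --network <compose-network> | Join the same user-defined bridge as the compose stack. |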

For the compose case, the cleanest shape is to put mcptest itself in the same compose file and reach services by name.

# docker-compose.test.yml (snippet)
services:
  mcp:
    image: ghcr.io/example/mcp-vendor:1.4.2
    # ...as before

  mcptest:
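    # Placeholder reference: no official image is published yet, so build one from examples/dockerfile/.
    image: ghcr.io/soapbucket/mcptest:1.0.0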
    depends_on:
      mcp:
        condition: service_started
    volumes:
      - ./:/work:ro
    working_dir: /work
    command: ["run", "-c", "mcptest.yml"]

The mcptest container then refers to the MCP server by its compose service name:

# mcptest.yml
servers:
  vendor:
    url: "http://mcp:8080"
    wait_for_ready: "http://mcp:8080/healthz"

mcp resolves on the compose-internal DNS. No host port mapping needed, which is also more secure (the server is not exposed to anything outside the compose network).

Caveats

Pattern 5: Russian-doll Docker

The case where mcptest lives in a container and the server under test also lives in a container, reached via docker run from inside mcptest.

This works. It is also the most painful pattern of the five. Use it only when Pattern 3 is unavailable, for example when the server image refuses to expose an HTTP listener and only speaks stdio.

The shape

The mcptest container needs access to a Docker socket so it can spawn sibling containers on the host's daemon.

docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "$PWD:/work:ro" \
  -w /work \
  ghcr.io/soapbucket/mcptest:1.0.0 \
  run -c mcptest.yml

Inside the YAML, the command: invokes docker run against that mounted socket.

servers:
  vendor:
    command:
      - docker
      - run
      - -i
      - --rm
      - "ghcr.io/example/mcp-vendor:1.4.2"

Why we discourage it

Five reasons in order of severity:

  1. Privilege. A container with the Docker socket mounted has root on the host. Any compromise of mcptest, or of the YAML you feed it, is a full host compromise. Pattern 3 keeps mcptest unprivileged.
  2. Layered latency. The initialize handshake, docker pull time, and the inner server's boot all stack. Failures are hard to attribute.
  3. Cache duplication. The inner Docker pulls happen on the host daemon, not in the mcptest container. The mcptest container still needs language toolchains for non-Docker patterns; you end up with two caches.
  4. Filesystem confusion. Volume mounts in the inner docker run are evaluated by the host daemon, not by the mcptest container, so they reference host paths, not container paths. Beginners get this wrong every time.
  5. Permission noise. The Docker socket is typically owned by root or the docker group. CI runners that map host UIDs into containers have to thread that group through, which is platform-specific.
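The filesystem point is the one that most often needs a concrete fix: a path the mcptest container sees under /work has to be rewritten to the matching host path before it goes into an inner -v flag. A minimal sketch of that translation follows, assuming the outer container was started with a single bind mount such as "$PWD:/work"; to_host_path and the example host directory are hypothetical.

```shell
# Sketch: map a path inside the mcptest container to the host path the daemon sees.
# to_host_path is a hypothetical helper; it assumes one bind mount host_root:container_root.
to_host_path() {
  container_path=$1
  container_root=$2
  host_root=$3
  case "$container_path" in
    "$container_root")
      printf '%s\n' "$host_root" ;;
    "$container_root"/*)
      # Swap the container prefix for the host prefix, keeping the rest of the path.
      printf '%s%s\n' "$host_root" "${container_path#"$container_root"}" ;;
    *)
      echo "error: $container_path is outside $container_root" >&2
      return 1 ;;
  esac
}
```

For example, with the repository checked out at /home/ci/repo on the host, to_host_path /work/fixtures /work /home/ci/repo prints /home/ci/repo/fixtures, which is the value the inner docker run needs on the left side of its -v flag.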

If you still need it (say, the vendor image is the only artifact and exposes only stdio), keep the test surface small, accept the slower runs, and revisit when the vendor ships an HTTP shape.

A safer middle ground

If the only barrier to Pattern 3 is that the vendor image speaks stdio, wrap it. A small docker-compose.test.yml can run the vendor image with a socat or websocat sidecar that bridges stdio to HTTP. The result looks like Pattern 3 from mcptest's perspective. This is more setup than nested Docker, but the result is observable, debuggable, and unprivileged.

Kubernetes (deferred to a future release)

mcptest does not ship Kubernetes patterns in v1.0. The two shapes we expect to support are:

Both are workable today by hand, but we have not committed to a stable recipe. Follow the placeholder ticket "Kubernetes test patterns" (file under the mcptest project if it does not already exist) for the design work. Until then, treat Pattern 3 as the closest match: docker-compose locally and in CI gives you most of what you want from a Pod without the cluster.

Picking a pattern

Default to Pattern 1 (package runner over stdio) for any server that ships as a package. Move to Pattern 3 (docker-compose with HTTP) as soon as the server needs a database, a queue, or any other runtime dependency. Reach for Pattern 2 only when the vendor ships a Docker image without an HTTP option. Pattern 4 (mcptest in Docker) is useful for locked-down CI; Pattern 5 (Russian-doll Docker) is a last resort.

If you find yourself fighting one of these patterns, file an issue at https://github.com/soapbucket/mcptest/issues rather than working around it in YAML. The fix usually belongs in mcptest's process supervision (Pattern 1 or 2) or in this doc (Pattern 3 onward).