Portable run evidence

A run report already knows almost everything a registry needs to trust a run: which server was tested, under which MCP spec version, against which corpus, at which commit, when, and how it scored. mcptest evidence aggregates those fields into one small, schema-stable artifact a registry can ingest, and pairs it with a verifier that rejects evidence which is stale, forged, or unsigned.

No registry (Glama, IndexMCP, MCP Scoreboard, Smithery, PulseMCP) publishes an ingest schema today, so mcptest defines a minimal one for registries to adopt. This is deliberately not a public directory: it is an artifact plus a trust policy.

Emitting an artifact

evidence reads the JSON report written by mcptest run --format json and writes the artifact:

mcptest run tests/ --format json --output run.json
mcptest evidence run.json --out evidence.json

To fold in a security scan's severity counts, pass --security; to mark the run reproducible (the sbom --verify / SOURCE_DATE_EPOCH parity signal), pass --reproducible:

mcptest security tools-list.json --format json > security.json
mcptest evidence run.json --security security.json --reproducible --out evidence.json

The artifact maps one-to-one onto existing run metadata:

{
  "schema_version": "mcptest.dev/evidence/v1",
  "server_identity": [
    { "name": "weather", "transport": "stdio", "auth": "none" }
  ],
  "spec_version": "2025-03-26",
  "corpus_version": "sha256:beef...",
  "source": {
    "repo": "git@github.com:acme/weather.git",
    "branch": "main",
    "commit_sha": "1f2e..."
  },
  "generated_at": "2026-06-02T10:01:00Z",
  "grades": {
    "tests_total": 42,
    "tests_passed": 41,
    "tests_failed": 1,
    "security_severity_counts": { "high": 1 }
  },
  "reproducible": true,
  "unverifiable_origin": false
}

A run with no git commit (a private deployment, a one-off) sets unverifiable_origin: true. Such an artifact is accepted but flagged as unattested rather than rejected, so a private server can still publish a badge.
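
As an illustration of how a registry might consume the artifact, here is a minimal Python sketch. It is not part of mcptest: the summarize function, the EXPECTED_SCHEMA constant, the pass_rate figure, and the "attested" label are assumptions made for this example. Only the field names, the schema string, and the "unattested" flag come from the text above.

```python
import json

# Schema identifier taken from the sample artifact above.
EXPECTED_SCHEMA = "mcptest.dev/evidence/v1"


def summarize(artifact: dict) -> dict:
    """Reduce an evidence artifact to what a registry listing might show.

    Hypothetical helper: mcptest does not ship this function.
    """
    if artifact.get("schema_version") != EXPECTED_SCHEMA:
        raise ValueError(
            f"unsupported schema_version: {artifact.get('schema_version')!r}"
        )
    grades = artifact["grades"]
    total = grades["tests_total"]
    return {
        "servers": [s["name"] for s in artifact["server_identity"]],
        "spec_version": artifact["spec_version"],
        # pass_rate is a derived value for display, not an artifact field.
        "pass_rate": grades["tests_passed"] / total if total else 0.0,
        # A run with no git commit is flagged, not rejected.
        "attestation": (
            "unattested" if artifact.get("unverifiable_origin") else "attested"
        ),
    }


example = json.loads("""
{
  "schema_version": "mcptest.dev/evidence/v1",
  "server_identity": [
    { "name": "weather", "transport": "stdio", "auth": "none" }
  ],
  "spec_version": "2025-03-26",
  "grades": { "tests_total": 42, "tests_passed": 41, "tests_failed": 1 },
  "reproducible": true,
  "unverifiable_origin": false
}
""")

print(summarize(example))
```

Rejecting an unknown schema_version outright, rather than guessing at its fields, keeps an ingest path honest if the schema is ever revised.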

Signing

--sign attaches a detached Sigstore signature, reusing the same keyless cosign sign-blob path the release workflow uses (GitHub Actions OIDC). It writes evidence.json.sig and evidence.json.cert beside the artifact:

mcptest evidence run.json --out evidence.json --sign

Signing needs cosign on PATH; without it, --sign exits 2 rather than emitting an unsigned artifact that looks signed. The cryptographic signature is cosign's job (the same root of trust as the SLSA provenance the release publishes); the binary owns the artifact and the anti-gaming bindings below.
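
A CI job that wraps the command may want to fail fast with the same exit code before invoking --sign. The following Python sketch shows one way to do that; signing_precheck and EXIT_MISSING_TOOL are hypothetical names for this example and do not describe how mcptest itself performs the check.

```python
import shutil

# Exit code the text above gives for --sign when cosign is unavailable.
EXIT_MISSING_TOOL = 2


def signing_precheck(tool: str = "cosign") -> int:
    """Return 0 if `tool` is found on PATH, otherwise 2.

    Hypothetical wrapper-side check; it only looks the tool up,
    it does not run it or validate its version.
    """
    return 0 if shutil.which(tool) else EXIT_MISSING_TOOL


if __name__ == "__main__":
    raise SystemExit(signing_precheck())
```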

Verifying

evidence verify weighs three things (freshness, ancestry, and signature presence) and rejects on any of them:

# Reject evidence older than 30 days, require it to be signed.
mcptest evidence verify evidence.json --max-age 720h --require-signed

Exit code is 0 when accepted, 1 when rejected (with the reasons printed), 2 when a file cannot be read or parsed. Full cryptographic Sigstore verification (transparency-log inclusion, certificate identity) is cosign verify-blob's job against the .sig / .cert; evidence verify owns the freshness, ancestry, and signature-presence policy a registry enforces per ingest.
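
To make the policy concrete, here is a rough Python sketch of the freshness and signature-presence checks as a registry might reimplement them. It is an approximation under stated assumptions, not the mcptest implementation: check_policy and its parameters are hypothetical, the ancestry check is omitted because this page does not describe how it is computed, and signature presence is modeled simply as the .sig and .cert files existing beside the artifact.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def check_policy(artifact, artifact_path, max_age, require_signed, now=None):
    """Return a list of rejection reasons; an empty list means accepted.

    Hypothetical helper. Covers freshness and signature presence only.
    """
    reasons = []
    now = now or datetime.now(timezone.utc)

    # generated_at is an RFC 3339 timestamp such as 2026-06-02T10:01:00Z.
    # Replace the trailing Z so older Python versions can parse it.
    generated = datetime.fromisoformat(
        artifact["generated_at"].replace("Z", "+00:00")
    )
    if now - generated > max_age:
        reasons.append(f"stale: generated {now - generated} ago")

    if require_signed:
        # Presence only; cryptographic verification is left to cosign.
        for suffix in (".sig", ".cert"):
            if not Path(str(artifact_path) + suffix).exists():
                reasons.append(f"unsigned: missing {artifact_path}{suffix}")

    return reasons


def exit_code(reasons):
    """Map the outcome onto the accepted (0) / rejected (1) codes above."""
    return 1 if reasons else 0
```

Here max_age=timedelta(hours=720) corresponds to the --max-age 720h example. Returning every reason, rather than stopping at the first, matches the documented behavior of printing the reasons on rejection.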

What is and is not here

The artifact carries cassette_hash and cost_profile as optional fields, populated when a run supplies them; the OSS command leaves them absent. The compliance tier and TDQS tool scores are likewise optional and are filled in when a run folds those checks in.