Transport, auth, and local probes
The static engine reads a server's catalog. Some risks only show up when you poke the running server: does it accept plaintext, validate the Origin header, block private-IP URLs, leak in its error envelopes, require auth, rate-limit a burst, bound its responses, or launch a dangerous stdio command? These are the probe checks.
Run this example. The file examples/security-multi-server.json captures more than one server, so a single scan exercises the probe checks across each transport.
mcptest security examples/security-multi-server.json
How they work
A probe needs active behavior, but the analysis stays deterministic. The split is the same one the red-team and advisory layers use: a caller captures what the server did into a ProbeEvidence value, and the analyzers read that evidence. The live probing (making the requests, launching the connection) sits behind the evidence struct, so the analysis is reproducible and a probe run replays from a cassette without touching the network.
Every evidence field is optional. A probe whose evidence was not captured produces no finding rather than guessing, so a "did not measure" stays honest and is never scored as a pass.
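As a rough illustration of the evidence/analyzer split, the sketch below models the evidence as a dataclass whose fields are all optional, plus one analyzer that stays silent when its fields were not captured. Only the `ProbeEvidence` name and the SEC-017 rule come from the text above. The field names (`plaintext_accepted`, `address`), the `Finding` type, and the function `check_tls_required` are hypothetical stand-ins, not mcptest's actual API.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional


@dataclass(frozen=True)
class Finding:
    rule: str
    message: str


@dataclass(frozen=True)
class ProbeEvidence:
    # Hypothetical fields. None means "the caller did not measure this".
    plaintext_accepted: Optional[bool] = None
    address: Optional[str] = None


def _is_loopback(address: str) -> bool:
    if address == "localhost":
        return True
    try:
        return ip_address(address).is_loopback
    except ValueError:
        # Not an IP literal. This sketch assumes such hosts are non-loopback.
        return False


def check_tls_required(ev: ProbeEvidence) -> Optional[Finding]:
    # Evidence missing: produce no finding rather than guess.
    if ev.plaintext_accepted is None or ev.address is None:
        return None
    if ev.plaintext_accepted and not _is_loopback(ev.address):
        return Finding("SEC-017", f"plaintext HTTP accepted on {ev.address}")
    return None
```

Because the analyzer is a pure function of the captured evidence, the same `ProbeEvidence` value yields the same result whether it came from a live probe or from a recording.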
Implemented probes
| Rule | What it flags |
|---|---|
| SEC-017 tls-required | Plaintext HTTP accepted on a non-loopback address. |
| SEC-018 origin-validation | A request with a foreign Origin was accepted (DNS rebinding). |
| SEC-019 private-ip-guard | An advertised URL is an IP literal in a private or cloud-metadata range (SSRF). |
| SEC-020 error-envelope-leak | An error body carries a stack trace or an obvious secret (CWE-209). |
| SEC-021 posture-tier | Reports the auth tier (none, header bearer, OAuth). Informational. |
| SEC-022 token-audience | The server accepted a token not issued for it (token passthrough; RFC 8707). |
| SEC-023 dangerous-startup-command | A stdio launch command runs a dangerous shell pattern (piped curl, rm -rf, sudo, eval). |
| SEC-024 rate-limit-present | A controlled burst saw no rate limiting (unbounded consumption). |
| SEC-025 capability-attestation | Capabilities were advertised without an attestation. |
| SEC-026 sampling-origin-auth | Sampling was used without authenticating the origin. |
| SEC-027 unbounded-response | A response had no declared size bound, or exceeded the declared one. |
| SEC-028 resource-indicators | The server does not advertise or honor RFC 8707 resource indicators. |
| SEC-029 scope-minimization | An advertised scope is a wildcard or an omnibus scope (*, admin, all). |
| SEC-030 session-id-hygiene | A session ID is low-entropy or sequential, or sessions are used for auth. |
| SEC-031 confused-deputy-posture | A proxy lacks per-client consent or exact redirect_uri matching. |
| SEC-032 install-source-provenance | An install source has no version pin or provenance. |
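To make one row of the table concrete, here is a minimal sketch of an SEC-019-style check that looks only at IP literals in an advertised URL. The function name `private_ip_literal` is hypothetical, and the exact ranges the real rule covers are not stated here. This sketch leans on Python's `is_private` and `is_link_local`, which together cover the RFC 1918 ranges and the 169.254.169.254 metadata address, and which may be broader than the tool's own list.

```python
from ipaddress import ip_address
from typing import Optional
from urllib.parse import urlsplit


def private_ip_literal(url: str) -> Optional[str]:
    """Return the host if it is an IP literal in a private or link-local range."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    try:
        ip = ip_address(host)
    except ValueError:
        # A hostname, not an IP literal: no DNS resolution is attempted.
        return None
    if ip.is_private or ip.is_link_local:
        return host
    return None
```

For example, `private_ip_literal("http://10.0.0.5:8080/mcp")` returns the host, while a URL with a DNS name returns `None` because nothing is resolved.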
What stays partial
Several of these probes are marked partial in the catalog because a black-box client cannot always see the full picture, so they fire only on clear evidence and stay silent on ambiguous cases. private-ip-guard flags only IP literals; resolving a hostname is something the client cannot do deterministically, so hostnames are not flagged. The fuzzy analyzers each carry a documented heuristic. scope-minimization flags a * wildcard or a known omnibus scope name. session-id-hygiene flags a session ID shorter than 16 characters, an all-digit counter, a single repeated character, or a consecutive numeric sequence. install-source-provenance flags a source with no version pin (@<version>, ==, a #-ref, or a sha256: digest) and no @latest override. Every catalog probe now has an implementation; nothing remains as a follow-up.
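The scope and session-ID heuristics described above can be sketched roughly as follows. The function names and the `OMNIBUS_SCOPES` set are hypothetical; the set holds only the two names the table gives as examples, and treating any scope that contains `*` as a wildcard is an assumption. The consecutive-numeric-sequence case is left out because the text does not say how that sequence is detected.

```python
# Assumed list, taken from the examples in the rule table.
OMNIBUS_SCOPES = {"admin", "all"}


def scope_is_too_broad(scope: str) -> bool:
    # Flag a wildcard or a known omnibus scope name.
    return "*" in scope or scope.lower() in OMNIBUS_SCOPES


def session_id_is_weak(session_id: str) -> bool:
    # Shorter than 16 characters.
    if len(session_id) < 16:
        return True
    # All ASCII digits, which looks like a counter.
    if session_id.isascii() and session_id.isdigit():
        return True
    # A single character repeated.
    if len(set(session_id)) == 1:
        return True
    return False
```

Both checks fire only on the clear cases listed in the text. A session ID that passes them has not been shown to be high-entropy, only not obviously weak.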
These probes are deterministic security findings: unlike the red-team and advisory layers, they do count toward the grade, because serving plaintext or passing tokens through is a defect in the server, not a property of a model.