Security test catalog
This page lists the bundled security checks that mcptest runs against an MCP server. The framework that runs them (declarative checks, deterministic verdicts, separate from the LLM eval path) is settled, and the checks cover the MCP security taxonomy.
This catalog is Layer A, the deterministic baseline. The advisory LLM-judge detection (Layer B), the dynamic red-team (Layer C), and external scanner integrations (Layer D) are part of the multi-layer security testing design. Only Layer A decides the security grade; the other layers are advisory, per-model, or normalized third-party signals.
Every check is deterministic. No model decides a pass or fail. Each row lists the probe method (static inspection of definitions, an active protocol probe, or a diff against a pinned manifest), a severity, whether a black-box client can detect it (yes, partial, or no), and how it maps to the threat model and to the external catalogs (OWASP LLM Top 10, OWASP Agentic Top 10, the MCP specification security guidance, or a CWE).
Out of scope, and marked so on the scorecard rather than scored as a pass: anything that needs host or server-runtime visibility (sandbox escape, server or client internals, server-side WAF behavior). A separate, model-dependent signal ("does this model fall for an injection") belongs to the agent eval path, not here, because an LLM never decides a security verdict.
Check IDs follow the rule-ID standard, and fired checks are emitted through the SARIF reporter.
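The sketch below is a rough illustration of how a fired check might reach SARIF. It maps the catalog's severities onto SARIF's `level` values and renders one `result` object. Several details are assumptions for illustration, not the mcptest reporter's actual code: the severity-to-level mapping, the use of the bare SEC ID as `ruleId`, and every function name.

```rust
use std::fmt::Write;

/// Severity column values from the catalog tables.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical mapping onto SARIF's `level` property ("error", "warning", "note").
fn sarif_level(sev: Severity) -> &'static str {
    match sev {
        Severity::Critical | Severity::High => "error",
        Severity::Medium => "warning",
        Severity::Low | Severity::Info => "note",
    }
}

/// Minimal JSON string escaping, enough for rule IDs and messages.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            // Remaining control characters become \u00XX escapes.
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out
}

/// Render one fired check as a SARIF `result` object (hypothetical helper).
fn sarif_result(rule_id: &str, sev: Severity, message: &str) -> String {
    format!(
        r#"{{"ruleId":"{}","level":"{}","message":{{"text":"{}"}}}}"#,
        json_escape(rule_id),
        sarif_level(sev),
        json_escape(message)
    )
}
```

A real reporter would wrap these results in a `runs[].results` array and would likely use a JSON library rather than hand-rolled escaping.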
Tool-surface static analysis
Static inspection of tool, prompt, and resource definitions: names, descriptions, schemas, and annotations. The tool description is executable context, so this is the highest-yield family.
| ID | Check | Method | Severity | Black-box | Maps to |
|---|---|---|---|---|---|
| SEC-001 description-injection | Imperative-to-model instructions in a description ("ignore previous", "before doing X also") | static | high | yes | tool poisoning; LLM01 |
| SEC-002 cross-tool-directive | A description that instructs the model to call or alter another tool | static | high | yes | shadowing, parasitic toolchain |
| SEC-003 exfiltration-directive | A description that tells the model to read files or environment variables, or to send data outward | static | high | yes | data exfiltration; LLM02 |
| SEC-004 encoded-payload | Base64, hex, or other encoded blobs embedded in a description or schema | static | medium | yes | tool poisoning; CWE-506 |
| SEC-005 hidden-unicode | Invisible, zero-width, or bidirectional-control Unicode characters in names or descriptions | static | high | yes | tool poisoning; CWE-176 |
| SEC-006 preference-manipulation | Persuasive "always use this tool" language that biases selection | static | medium | yes | preference manipulation |
| SEC-007 docstring-schema-mismatch | A parameter named in the description that the input schema does not declare | static | medium | yes | behavioral mismatch |
| SEC-008 secret-in-definition | API keys, tokens, or PII in a description or example | static | high | yes | sensitive disclosure; LLM02 |
| SEC-009 unannotated-destructive-tool | A write or delete tool with no destructive-action annotation | static | medium | partial | excessive agency; LLM06 |
| SEC-036 unbounded-list-tool | A list/search/query/fetch tool whose input schema declares no bound parameter (limit, max, count, page_size, top_k, page, offset, cursor, per_page) | static | low | partial | unbounded consumption; LLM10; CWE-770 |
| SEC-037 system-prompt-leakage | A prompt or resource that embeds a system-instruction block carrying a secret | static | high | partial | system prompt leakage; LLM07 |
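SEC-005 depends only on the definition text, which is what makes it deterministic. The sketch below assumes the check flags three kinds of code point: zero-width characters, bidirectional controls, and Unicode tag characters. The exact code-point set mcptest uses is not specified here, and `hidden_unicode` is a hypothetical name.

```rust
/// Code points commonly used to hide text from a human reviewer.
/// This set is an assumption, not mcptest's exact list.
fn is_hidden_char(c: char) -> bool {
    matches!(
        c as u32,
        0x200B..=0x200F       // zero-width space and joiners, LRM, RLM
        | 0x202A..=0x202E     // bidi embeddings and overrides
        | 0x2060..=0x2064     // word joiner, invisible operators
        | 0x2066..=0x2069     // bidi isolates
        | 0xFEFF              // zero-width no-break space (BOM)
        | 0xE0000..=0xE007F   // tag characters
    )
}

/// SEC-005 sketch: (char index, code point) for every hidden character
/// found in a tool name or description.
fn hidden_unicode(text: &str) -> Vec<(usize, u32)> {
    text.chars()
        .enumerate()
        .filter(|(_, c)| is_hidden_char(*c))
        .map(|(i, c)| (i, c as u32))
        .collect()
}
```

Reporting positions as well as code points lets the scorecard show a reviewer where the invisible character sits.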
Definition integrity and drift
These checks diff the current definitions against a pinned manifest, reusing the cassette manifest. This is the rug-pull and drift defense: the same hash-and-version approach that mcp-scan calls tool pinning.
| ID | Check | Method | Severity | Black-box | Maps to |
|---|---|---|---|---|---|
| SEC-014 tool-pinning-diff | A previously approved tool's description changed between the baseline and the current catalog | diff | high | yes | rug pull |
| SEC-015 schema-drift | A previously approved tool's input schema, output schema, or annotations changed | diff | medium | yes | configuration drift |
| SEC-016 version-stamp-posture | Whether the server version-stamps its definitions (ETDI-style) | static | info | partial | integrity posture signal |
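The following is a minimal sketch of SEC-014 and SEC-015. It assumes the pinned manifest stores one hash of each approved tool's description and one hash of its serialized schema and annotations. FNV-1a stands in for whatever digest the cassette manifest actually records; a cryptographic hash such as SHA-256 is the more likely choice. `Pin`, `pin`, and `pinning_diff` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// FNV-1a 64-bit: a stable, dependency-free hash used only for this sketch.
fn fnv1a(s: &str) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in s.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical pinned entry for one approved tool.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pin {
    description: u64,
    schema: u64, // hash of input/output schema plus annotations
}

fn pin(description: &str, schema_json: &str) -> Pin {
    Pin { description: fnv1a(description), schema: fnv1a(schema_json) }
}

/// Compare current definitions with the baseline. Only previously approved
/// tools are checked; added or removed tools are out of scope here.
fn pinning_diff(
    baseline: &BTreeMap<String, Pin>,
    current: &BTreeMap<String, Pin>,
) -> Vec<(String, &'static str)> {
    let mut findings = Vec::new();
    for (name, old) in baseline {
        if let Some(new) = current.get(name) {
            if new.description != old.description {
                findings.push((name.clone(), "SEC-014")); // tool-pinning-diff
            }
            if new.schema != old.schema {
                findings.push((name.clone(), "SEC-015")); // schema-drift
            }
        }
    }
    findings
}
```

Hashing a schema only works if the schema is serialized canonically, with stable key order and no insignificant whitespace. Otherwise a reformatted but identical schema would fire SEC-015.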
Namespace and supply chain
Static analysis across a multi-server config.
| ID | Check | Method | Severity | Black-box | Maps to |
|---|---|---|---|---|---|
| SEC-010 duplicate-tool-name | The same tool name served by more than one server | static | high | yes | tool shadowing |
| SEC-011 tool-name-squat | A tool name that is a near-duplicate or typosquat of another server's tool | static | medium | partial | tool-name squatting |
| SEC-012 server-name-squat | A server name that is a near-duplicate of another server name | static | medium | partial | server-name squatting |
| SEC-013 ambiguous-resolution | The same tool name with different input schemas across servers | static | medium | yes | multi-server trust; LLM03 |
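Here is a sketch of SEC-010 and SEC-011 over a flattened list of (server, tool) pairs. It treats two different names on different servers as a squat when their Levenshtein distance is 1. Both the input shape and that threshold are assumptions. A real squat heuristic would likely also weigh transpositions and homoglyphs, which a distance of 1 misses.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Levenshtein edit distance over chars (two-row dynamic programming).
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Hypothetical namespace pass: returns (check id, detail) findings.
fn namespace_findings(tools: &[(&str, &str)]) -> Vec<(&'static str, String)> {
    // SEC-010: the same tool name served by more than one distinct server.
    let mut servers_by_tool: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(server, tool) in tools {
        servers_by_tool.entry(tool).or_default().insert(server);
    }
    let mut findings = Vec::new();
    for (tool, servers) in &servers_by_tool {
        if servers.len() > 1 {
            findings.push(("SEC-010", tool.to_string()));
        }
    }
    // SEC-011: distinct names on different servers within edit distance 1.
    for (i, &(s1, t1)) in tools.iter().enumerate() {
        for &(s2, t2) in &tools[i + 1..] {
            if s1 != s2 && t1 != t2 && edit_distance(t1, t2) <= 1 {
                findings.push(("SEC-011", format!("{t1} ~ {t2}")));
            }
        }
    }
    findings
}
```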
Toxic flow and capability
Static analysis that scores a catalog's latent capability before any payload fires. Each tool is classified into zero or more capability tiers from keyword heuristics over its name, description, and input-schema property names (a conservative match, so detectability is partial). The risk is the pairing: an untrusted-content source plus an exfil-or-destructive sink is a complete exfiltration chain a prompt injection can wire together. SEC-003 and SEC-009 catch the directive half; these score the structural half.
The five tiers are sensitive-data-exposure (email, credential, vault, secret), workspace-data-exposure (file, path, repo, code), destructive (delete, drop, exec, transfer, pay), local-destructive (rm, unlink, format, local path), and untrusted-content-source (fetch, http, url, web, third-party). A sink is any tool in one of the three exfil-or-destructive tiers: sensitive-data-exposure, destructive, or local-destructive.
| ID | Check | Method | Severity | Black-box | Maps to |
|---|---|---|---|---|---|
| SEC-033 capability-tier | Classify each tool into latent capability tiers (informational posture) | static | info | partial | toxic flow; excessive agency; LLM06 |
| SEC-034 untrusted-content-source | A tool that pulls untrusted external content, the injection entry point | static | low | partial | toxic flow; LLM01 |
| SEC-035 toxic-flow-pairing | An untrusted-content source coexists with an exfil-or-destructive sink | static | high | partial | toxic flow; data exfiltration; LLM02 |
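The classification and pairing can be sketched as follows. The keywords come from the tier list above. Two points are assumptions about how the conservative heuristic tokenizes: matching on whole tokens, and allowing an optional trailing "s". For brevity, the sketch omits the multi-word cues ("local path", "third-party"). It also models SEC-035 as firing whenever any source and any sink coexist in the catalog, possibly in the same tool.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tier {
    SensitiveDataExposure,
    WorkspaceDataExposure,
    Destructive,
    LocalDestructive,
    UntrustedContentSource,
}

/// Single-word keywords per tier, taken from the catalog text.
const TIER_KEYWORDS: &[(Tier, &[&str])] = &[
    (Tier::SensitiveDataExposure, &["email", "credential", "vault", "secret"]),
    (Tier::WorkspaceDataExposure, &["file", "path", "repo", "code"]),
    (Tier::Destructive, &["delete", "drop", "exec", "transfer", "pay"]),
    (Tier::LocalDestructive, &["rm", "unlink", "format"]),
    (Tier::UntrustedContentSource, &["fetch", "http", "url", "web"]),
];

/// Lowercase alphanumeric tokens; "read_file" becomes ["read", "file"].
fn tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_ascii_lowercase())
        .collect()
}

/// Whole-token match, also accepting a plural "s" ("https", "files").
fn matches_keyword(word: &str, kw: &str) -> bool {
    word == kw || word.strip_suffix('s') == Some(kw)
}

/// SEC-033 sketch: tiers for one tool from name, description, and schema property names.
fn classify(name: &str, description: &str, props: &[&str]) -> Vec<Tier> {
    let mut words = tokens(name);
    words.extend(tokens(description));
    for p in props {
        words.extend(tokens(p));
    }
    TIER_KEYWORDS
        .iter()
        .filter(|(_, kws)| words.iter().any(|w| kws.iter().any(|kw| matches_keyword(w, kw))))
        .map(|(tier, _)| *tier)
        .collect()
}

fn is_sink(t: Tier) -> bool {
    matches!(t, Tier::SensitiveDataExposure | Tier::Destructive | Tier::LocalDestructive)
}

/// SEC-035 sketch: an untrusted-content source and a sink coexist.
fn toxic_flow(catalog: &[Vec<Tier>]) -> bool {
    let has_source = catalog.iter().any(|ts| ts.contains(&Tier::UntrustedContentSource));
    let has_sink = catalog.iter().any(|ts| ts.iter().any(|t| is_sink(*t)));
    has_source && has_sink
}
```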
Transport and protocol
Active probes against the running server.
Implemented probes carry a SEC ID and analyze captured [ProbeEvidence]. The live probing sits behind that evidence struct, so a run is deterministic and cassette-replayable. The first batch (SEC-017..023) and the follow-up batch (SEC-024..032) have both landed, so every analyzer in the catalog is now implemented; wiring the active probes into the CLI is still deferred (see Coverage and follow-ups).
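To illustrate the evidence split, here is SEC-017 written as a pure function of captured evidence. The live run records the URL scheme and the addresses the host resolved to. The analyzer only reads that record, so a cassette replay reaches the same verdict as the live run. `EndpointEvidence` and the function names are hypothetical, simplified stand-ins, not the real ProbeEvidence layout.

```rust
use std::net::IpAddr;

/// Hypothetical, simplified evidence captured by a live probe.
#[derive(Debug, Clone)]
struct EndpointEvidence {
    scheme: String,        // e.g. "http" or "https"
    host: String,          // host as configured
    resolved: Vec<IpAddr>, // addresses the host resolved to during capture
}

/// Loopback if the host is "localhost" or every resolved address is loopback.
fn is_loopback_target(ev: &EndpointEvidence) -> bool {
    ev.host.eq_ignore_ascii_case("localhost")
        || (!ev.resolved.is_empty() && ev.resolved.iter().all(|ip| ip.is_loopback()))
}

/// SEC-017 sketch: plaintext HTTP accepted on a non-loopback address.
fn tls_required_fires(ev: &EndpointEvidence) -> bool {
    ev.scheme.eq_ignore_ascii_case("http") && !is_loopback_target(ev)
}
```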
| ID | Check | Method | Severity | Black-box | Maps to |
|---|---|---|---|---|---|
| SEC-017 transport/tls-required | Server accepts plaintext HTTP on a non-loopback address | active | high | yes | MCP spec (HTTPS) |
| SEC-018 transport/origin-validation | Server does not validate the Origin header | active | high | partial | DNS rebinding; MCP spec |
| SEC-019 transport/private-ip-guard | OAuth or discovery URLs resolve to private or cloud-metadata ranges | active | high | partial | SSRF (server-side request forgery); MCP spec (block private IPs) |
| SEC-020 transport/error-envelope-leak | Error responses leak stack traces, internal paths, or secrets | active | medium | yes | sensitive disclosure; CWE-209 |
| SEC-024 transport/rate-limit-present | No rate limiting under a controlled burst | active | low | partial | unbounded consumption; LLM10 |
| SEC-025 transport/capability-attestation | Server advertises capabilities without attestation | active | low | partial | Breaking the Protocol |
| SEC-026 transport/sampling-origin-auth | Server uses sampling without authenticating the origin | active | medium | partial | Breaking the Protocol |
| SEC-027 transport/unbounded-response | A single response can grow without a server-side bound | active | low | yes | unbounded consumption; LLM10 |
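This is a sketch of the range test behind SEC-019. It relies on the standard library's address predicates, plus manual checks for IPv6 unique-local (fc00::/7) and link-local (fe80::/10) space. Two things are assumptions: the input shape (each OAuth or discovery URL paired with its resolved addresses) and the function names. The range list is also illustrative rather than exhaustive; for example, it does not cover carrier-grade NAT space.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn v4_is_internal(ip: Ipv4Addr) -> bool {
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16, including 169.254.169.254 metadata
        || ip.is_unspecified() // 0.0.0.0
}

fn v6_is_internal(ip: Ipv6Addr) -> bool {
    // Judge IPv4-mapped addresses (::ffff:a.b.c.d) by their IPv4 part.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return v4_is_internal(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local fe80::/10
}

fn is_internal(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4_is_internal(v4),
        IpAddr::V6(v6) => v6_is_internal(v6),
    }
}

/// SEC-019 sketch: URLs whose captured resolution points at internal space.
fn private_ip_findings<'a>(resolved: &[(&'a str, Vec<IpAddr>)]) -> Vec<&'a str> {
    resolved
        .iter()
        .filter(|(_, ips)| ips.iter().any(|ip| is_internal(*ip)))
        .map(|(url, _)| *url)
        .collect()
}
```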
Auth and identity posture
Active probes of the authentication and authorization surface, grounded in the MCP specification's MUST and SHOULD requirements.
| ID | Check | Method | Severity | Black-box | Maps to |
|---|---|---|---|---|---|
| SEC-021 auth/posture-tier | Report the auth tier (none, header-bearer, OAuth) | active | info | yes | posture signal |
| SEC-022 auth/token-audience | Server accepts a token not issued for it (token passthrough) | active | high | partial | MCP spec (no passthrough); RFC 8707 |
| SEC-028 auth/resource-indicators | Server advertises and honors RFC 8707 resource indicators | active | medium | partial | RFC 8707 |
| SEC-029 auth/scope-minimization | Server publishes wildcard or omnibus scopes | active | medium | partial | MCP spec (scope minimization) |
| SEC-030 auth/session-id-hygiene | Predictable session IDs, or sessions used for authentication | active | high | partial | MCP spec (session hygiene) |
| SEC-031 auth/confused-deputy-posture | Proxy server lacks per-client consent or exact redirect_uri matching | active | medium | partial | MCP spec (confused deputy) |
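Here is one way SEC-029 could evaluate published scopes, such as the `scopes_supported` list in OAuth metadata. It flags any scope that contains a wildcard, or whose last namespaced segment is an omnibus name. The omnibus list, the segment separators, and the function name are hypothetical.

```rust
/// SEC-029 sketch: return published scopes that look wildcard or omnibus.
fn broad_scopes<'a>(scopes: &[&'a str]) -> Vec<&'a str> {
    // Hypothetical omnibus names; a real rule set would be configurable.
    const OMNIBUS: &[&str] = &["all", "admin", "full_access", "root"];
    scopes
        .iter()
        .copied()
        .filter(|s| {
            // Last segment of a namespaced scope such as "mcp:tools:all".
            let last = s
                .rsplit(|c: char| matches!(c, ':' | '.' | '/'))
                .next()
                .unwrap_or("");
            s.contains('*') || OMNIBUS.contains(&last.to_ascii_lowercase().as_str())
        })
        .collect()
}
```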
Local server and supply chain
Static analysis of stdio and local-server configuration.
| ID | Check | Method | Severity | Black-box | Maps to |
|---|---|---|---|---|---|
| SEC-023 local/dangerous-startup-command | A startup command with a dangerous pattern (sudo, rm -rf, piped curl, SSH-key access) | static | critical | yes | MCP spec (local compromise); LLM03 |
| SEC-032 local/install-source-provenance | Install source has no provenance or pinned version | static | low | partial | supply chain; LLM03 |
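Here is SEC-023 sketched as plain pattern matching over the configured command and its arguments, covering the four examples in the table. The matching rules and the function name are assumptions. Whole-word tokens catch sudo and rm; substring checks catch piped downloads and SSH key paths. A production rule would also need to handle shell quoting and split flags such as `rm -r -f`.

```rust
/// SEC-023 sketch: dangerous patterns found in a local server's startup
/// command (command plus args, as written in the config).
fn dangerous_patterns(command: &str, args: &[&str]) -> Vec<&'static str> {
    let mut argv: Vec<String> = vec![command.to_string()];
    argv.extend(args.iter().map(|a| a.to_string()));
    let line = argv.join(" ");
    let words: Vec<&str> = line.split_whitespace().collect();
    let mut hits = Vec::new();

    if words.contains(&"sudo") {
        hits.push("sudo");
    }
    // rm with a combined recursive-force flag.
    if words.contains(&"rm") && words.iter().any(|w| matches!(*w, "-rf" | "-fr")) {
        hits.push("rm -rf");
    }
    // A download tool whose output is piped into a shell.
    if (line.contains("curl") || line.contains("wget"))
        && ["| sh", "| bash", "|sh", "|bash"].iter().any(|p| line.contains(*p))
    {
        hits.push("piped download to shell");
    }
    // Touching SSH key material.
    if line.contains(".ssh/") || line.contains("id_rsa") || line.contains("id_ed25519") {
        hits.push("ssh key access");
    }
    hits
}
```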
OWASP MCP Top 10 cross-walk
In addition to the per-row mappings above (OWASP LLM and Agentic Top 10, MCP spec requirements, CWEs), the catalog cross-walks to the official OWASP MCP Top 10, the most on-point external anchor. That list is in beta, so the mapping covers its published items.
| OWASP MCP Top 10 | Covered by |
|---|---|
| MCP01 Token mismanagement and secret exposure | surface/secret-in-definition, auth/token-audience |
| MCP02 Privilege escalation | surface/unannotated-destructive-tool, auth/scope-minimization |
| MCP03 Tool poisoning | the surface/ family, SEC-014 tool-pinning-diff |
| MCP04 Supply chain and dependency tampering | the namespace family (SEC-010 through SEC-013), local/install-source-provenance |
| MCP05 Command injection | local/dangerous-startup-command, surface/exfiltration-directive |
| MCP07 Insufficient authentication | the auth/ family |
| MCP09 Shadow MCP servers | SEC-010 duplicate-tool-name, SEC-012 server-name-squat |
| MCP10 Context over-sharing | surface/exfiltration-directive, transport/error-envelope-leak, SEC-035 toxic-flow-pairing |
Coverage and follow-ups
This catalog is the bundled pack for the deterministic security framework. The static lanes are wired into the mcptest security CLI: the surface family (SEC-001..009), the namespace family (SEC-010..013), the integrity family (SEC-014..016, behind --baseline), and the toxic-flow family (SEC-033..035). The advisory LLM-judge lane runs behind --model and is reported separately, so it never moves the verdict.
The active probe rows (transport, auth, and local, SEC-017..032) and the per-model dynamic red-team layers (C1/C2) remain deferred until a live red-team command is wired: they need a running server and a captured trace, which the static-snapshot CLI does not supply.
Coverage against the MCP-DPT inventory is reported as "N of the testable rows," and the scorecard security section renders each fired check as a line item with its severity and defense-layer tag.
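The rule that only Layer A moves the verdict can be sketched as a filter. The grade is derived from Layer A findings alone, and everything else is carried separately for the scorecard. Three pieces here are hypothetical, not mcptest's scoring code: grading by worst severity, the `Layer` and `Finding` types, and both function names.

```rust
/// Severity column values, ordered from least to most severe.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// A: deterministic catalog, B: LLM judge, C: dynamic red-team, D: external scanners.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layer {
    A,
    B,
    C,
    D,
}

struct Finding {
    check_id: &'static str,
    layer: Layer,
    severity: Severity,
}

/// Hypothetical grading: only Layer A findings count; the worst one decides.
fn security_verdict(findings: &[Finding]) -> Option<Severity> {
    findings
        .iter()
        .filter(|f| f.layer == Layer::A)
        .map(|f| f.severity)
        .max()
}

/// Advisory findings (Layers B to D) are reported but never graded.
fn advisory(findings: &[Finding]) -> Vec<&'static str> {
    findings
        .iter()
        .filter(|f| f.layer != Layer::A)
        .map(|f| f.check_id)
        .collect()
}
```

Keeping the filter at the grading boundary means an advisory lane can add findings freely without any risk of changing the security grade.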