Running the security checks
The deterministic security engine runs a set of checks over a tools/list-style snapshot of a server's tool, prompt, and resource definitions and reports findings. No model sits in the verdict path. Every check is a deterministic predicate, so a finding is reproducible, and a server cannot argue its way out of one by describing itself as safe.
mcptest security runs every deterministic lane whose inputs are present. The tool-surface and toxic-flow lanes always run; the cross-server namespace lane runs when the snapshot carries a servers array; the baseline-vs-current integrity lane runs when you pass --baseline. All of these produce findings that count toward the --fail-on gate. Each check has a stable SEC-NNN rule ID and a severity on the critical/high/medium/low/info axis.
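The lane-selection rule comes down to a small decision over what the inputs carry. The sketch below only illustrates that rule. `Lane`, `RunInputs`, and `lanes_to_run` are hypothetical names, not the mcptest_core API.

```rust
/// Deterministic lanes of a security run (hypothetical names for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lane {
    Surface,
    ToxicFlow,
    Namespace,
    Integrity,
}

/// What the inputs to a run carry (hypothetical type, not the mcptest_core API).
pub struct RunInputs {
    /// True when the snapshot has a top-level `servers` array.
    pub has_servers_array: bool,
    /// True when `--baseline` was passed.
    pub has_baseline: bool,
}

/// Select every deterministic lane whose inputs are present.
pub fn lanes_to_run(inputs: &RunInputs) -> Vec<Lane> {
    // Surface and toxic-flow need only the tools array, so they always run.
    let mut lanes = vec![Lane::Surface, Lane::ToxicFlow];
    if inputs.has_servers_array {
        lanes.push(Lane::Namespace);
    }
    if inputs.has_baseline {
        lanes.push(Lane::Integrity);
    }
    lanes
}
```

A single-server snapshot without `--baseline` therefore still gets two lanes, never zero.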
An opt-in advisory lane (--model) runs an LLM judge for the semantic threats a regex cannot see. It is advisory only: its findings are reported in a separate section and never change the verdict or the exit code.
The surface checks
| ID | Name | Severity | Fires when |
|---|---|---|---|
| SEC-001 | description-injection | high | a description carries an imperative aimed at the model ("ignore all previous", "before answering...") |
| SEC-002 | cross-tool-directive | high | a description steers the model to call or alter another tool |
| SEC-003 | exfiltration-directive | high | a description points the model at sensitive data or an external sink |
| SEC-004 | encoded-payload | medium | a long base64 or hex blob is embedded in a description |
| SEC-005 | hidden-unicode | high | a name or description contains invisible or bidirectional characters |
| SEC-006 | preference-manipulation | medium | a description uses persuasive language to capture tool selection |
| SEC-007 | docstring-schema-mismatch | medium | the description names a parameter the input schema does not declare |
| SEC-008 | secret-in-definition | high | a description appears to contain a key or credential |
| SEC-009 | unannotated-destructive-tool | medium | a tool implies a destructive action but declares no destructiveHint |
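SEC-005 is the simplest surface check to express as a predicate. This minimal sketch uses a hand-picked code-point list: zero-width characters, bidi embeddings and isolates, invisible operators, and the Unicode tag block often used to smuggle hidden text. The set SEC-005 actually matches may differ. `is_hidden_or_bidi` and `hidden_chars` are hypothetical names.

```rust
/// True for characters that render invisibly or reorder displayed text.
/// Illustrative list only; not necessarily the set SEC-005 checks.
pub fn is_hidden_or_bidi(c: char) -> bool {
    matches!(
        c,
        '\u{200B}'..='\u{200F}'         // zero-width space/joiners, LRM, RLM
            | '\u{202A}'..='\u{202E}'   // bidi embeddings and overrides
            | '\u{2060}'..='\u{2064}'   // word joiner, invisible operators
            | '\u{2066}'..='\u{2069}'   // bidi isolates
            | '\u{FEFF}'                // zero-width no-break space (BOM)
            | '\u{E0000}'..='\u{E007F}' // tag characters ("ASCII smuggling")
    )
}

/// Return each offending character with its byte offset in the text.
pub fn hidden_chars(text: &str) -> Vec<(usize, char)> {
    text.char_indices()
        .filter(|&(_, c)| is_hidden_or_bidi(c))
        .collect()
}
```

Returning byte offsets would let a report point at the exact position of each invisible character, which a reader cannot see in the rendered description.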
The namespace checks
The namespace family is relational: each check compares servers against each other instead of inspecting one definition in isolation. These checks therefore run over a multi-server snapshot, not the single-server tools/list the surface checks take. The snapshot is a list of servers, each with a name and its tools array, and the engine runs it through namespace_findings. The example at examples/security-multi-server.json serves the same tool name from two servers with different input schemas, so a run reports both a duplicate-tool-name and an ambiguous-resolution finding.
| ID | Name | Severity | Fires when |
|---|---|---|---|
| SEC-010 | duplicate-tool-name | high | the same tool name is served by more than one server |
| SEC-011 | tool-name-squat | medium | a tool name is a near-duplicate of another server's tool (edit distance 1, or separator/case only) |
| SEC-012 | server-name-squat | medium | a server name is a near-duplicate of another server name |
| SEC-013 | ambiguous-resolution | medium | the same tool name has different input schemas across servers |
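The SEC-011 rule (edit distance 1, or a difference of separators and case only) can be sketched as a Levenshtein distance plus a normalization pass. This is a minimal illustration. The separator set and the choice to leave exact matches to SEC-010 are assumptions, and the function names are hypothetical.

```rust
/// Lowercase a name and drop common separators, so `read_file`,
/// `read-file`, and `ReadFile` all collapse to `readfile`.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| !matches!(c, '_' | '-' | '.' | ' '))
        .flat_map(char::to_lowercase)
        .collect()
}

/// Classic Levenshtein distance over chars, using two DP rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        // After the swap, `prev` holds the row just computed.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// True when two distinct names look like a squat.
pub fn is_near_duplicate(a: &str, b: &str) -> bool {
    if a == b {
        return false; // an exact match is the duplicate-name check's job
    }
    levenshtein(a, b) == 1 || normalize(a) == normalize(b)
}
```

Normalization catches `read_file` against `ReadFile`, which sit three edits apart and would slip past a distance-only rule.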
The integrity checks
The integrity family is relational across time: it compares the current catalog against a baseline the user already approved, so it catches a server that swaps a definition out from under an approval (a rug pull). These checks take two tools/list snapshots, a baseline and a current, and run through integrity_findings. The before/after pair at examples/security-integrity-baseline.json and examples/security-integrity-current.json changes one tool's description and adds a required field to another tool's input schema, so a run reports a tool-pinning-diff and a schema-drift finding. Neither snapshot version-stamps its tools, so the run also reports the version-stamp-posture finding (SEC-016).
SEC-014 and SEC-015 reuse the mcptest diff engine to compute the catalog delta, then classify each change. A change to a tool that exists in both snapshots is a pinning or drift finding; an added or removed tool is neither, because a tool the user never approved cannot be a rug pull. SEC-016 reads the current catalog alone and notes when no tool carries a version stamp (a top-level version field or a _meta version), since without one a later definition swap leaves no trace in the catalog itself.
| ID | Name | Severity | Fires when |
|---|---|---|---|
| SEC-014 | tool-pinning-diff | high | an approved tool's description changed between the baseline and the current catalog |
| SEC-015 | schema-drift | medium | an approved tool's input schema, output schema, or annotations changed |
| SEC-016 | version-stamp-posture | info | no tool in the current catalog carries a version stamp, so ETDI-style integrity is absent |
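The SEC-014 to SEC-016 classification can be sketched over two name-keyed catalogs. This minimal sketch reduces each tool to a description, a serialized input schema, and an optional version stamp. It compares schemas as text instead of reusing the mcptest diff engine. It also omits output schemas and annotations, which SEC-015 covers as well. `ToolDef`, `IntegrityFinding`, and `integrity_sketch` are hypothetical.

```rust
use std::collections::BTreeMap;

/// A pared-down tool definition (hypothetical; real ones carry full JSON).
#[derive(Clone, PartialEq, Eq)]
pub struct ToolDef {
    pub description: String,
    pub input_schema: String, // serialized JSON, compared as text here
    pub version: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum IntegrityFinding {
    ToolPinningDiff(String), // SEC-014: an approved description changed
    SchemaDrift(String),     // SEC-015: an approved schema changed
    VersionStampPosture,     // SEC-016: nothing in the catalog is stamped
}

pub fn integrity_sketch(
    baseline: &BTreeMap<String, ToolDef>,
    current: &BTreeMap<String, ToolDef>,
) -> Vec<IntegrityFinding> {
    let mut findings = Vec::new();
    for (name, now) in current {
        // Only tools in both snapshots can be rug pulls: an added tool was
        // never approved, and a removed tool never reaches this loop.
        let Some(approved) = baseline.get(name) else { continue };
        if approved.description != now.description {
            findings.push(IntegrityFinding::ToolPinningDiff(name.clone()));
        }
        if approved.input_schema != now.input_schema {
            findings.push(IntegrityFinding::SchemaDrift(name.clone()));
        }
    }
    // The posture check reads the current catalog alone.
    if current.values().all(|t| t.version.is_none()) {
        findings.push(IntegrityFinding::VersionStampPosture);
    }
    findings
}
```

Keying both catalogs by tool name keeps the "exists in both snapshots" test down to a single map lookup.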
The toxic-flow checks
The toxic-flow family scores a catalog's latent capability before any payload fires. It classifies each tool into capability tiers from its name, description, and input-schema property names, then flags the dangerous pairing: a tool that pulls untrusted content and a tool that can exfiltrate or destroy state coexist in the same catalog, so a prompt injection delivered through the first can drive the second. This lane reads only the tools array and always runs, so a single-server snapshot still gets it.
| ID | Name | Severity | Fires when |
|---|---|---|---|
| SEC-033 | capability-tier | info | a tool classifies into one or more latent capability tiers (an informational tag, not a defect) |
| SEC-034 | untrusted-content-source | low | a tool pulls untrusted external content, the injection entry point |
| SEC-035 | toxic-flow-pairing | high | an untrusted-content source and an exfil-or-destructive sink coexist (medium when the sink is local-only destructive) |
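The SEC-035 pairing is a catalog-wide existence check, not a per-tool one. This sketch assumes each tool has already been sorted into tiers, and it leaves out the name, description, and schema heuristics that do the sorting. The tier names, `ToolTiers`, and `toxic_pairing` are hypothetical.

```rust
/// Latent capability tiers (illustrative subset, hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    UntrustedContent,     // pulls external content: the injection entry point
    Exfiltration,         // can send data to an external sink
    Destructive,          // can destroy state beyond the local machine
    DestructiveLocalOnly, // destroys local state only
}

#[derive(Debug, PartialEq, Eq)]
pub enum PairingSeverity {
    High,
    Medium,
}

pub struct ToolTiers {
    pub name: &'static str,
    pub tiers: Vec<Tier>,
}

/// A source and a sink coexisting anywhere in the catalog.
pub fn toxic_pairing(catalog: &[ToolTiers]) -> Option<PairingSeverity> {
    let has = |t: Tier| catalog.iter().any(|tool| tool.tiers.contains(&t));
    if !has(Tier::UntrustedContent) {
        return None; // no entry point, so no injection can drive a sink
    }
    if has(Tier::Exfiltration) || has(Tier::Destructive) {
        Some(PairingSeverity::High)
    } else if has(Tier::DestructiveLocalOnly) {
        Some(PairingSeverity::Medium)
    } else {
        None
    }
}
```

Neither tool is flagged on its own. The finding comes only from the two tiers coexisting in one catalog.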
The posture coverage map
Beside the findings, a run prints a posture coverage map: a set of signals tagged with the MCP-DPT defense layer they speak to. The map is a coverage report, not a verdict, so it never gates. It is honest about what a black-box client cannot see: the transport, rate-limit, and error-hygiene signals report "requires an active probe" rather than a silent pass when their evidence was not captured, and the whole-server controls a black-box client can never observe (server-side WAF, payload inspection, runtime intent verification) are listed explicitly so an absent signal is not mistaken for a clean one.
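Absent evidence never reads as a pass, and one way to enforce that is to give each signal three distinct evidence states. This is a hypothetical sketch. The state names, the layer labels, and `coverage_line` are illustrative, not the shipped map. It returns only a rendered line with no pass/fail value, matching a map that never gates.

```rust
/// Evidence state for one posture signal (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Evidence {
    /// Evidence was captured; `true` means the control looked present.
    Observed(bool),
    /// Observable with an active probe, but that evidence was not captured.
    NotCaptured,
    /// Never observable from a black-box client (e.g. a server-side WAF).
    NeverObservable,
}

/// Render one coverage line. No state renders as a silent pass.
pub fn coverage_line(signal: &str, layer: &str, evidence: Evidence) -> String {
    let status = match evidence {
        Evidence::Observed(true) => "present",
        Evidence::Observed(false) => "absent",
        Evidence::NotCaptured => "requires an active probe",
        Evidence::NeverObservable => "not observable from a black-box client",
    };
    format!("[{layer}] {signal}: {status}")
}
```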
The advisory lane (--model)
With --model <id>, a run also asks an LLM judge to look for the semantic threats a deterministic matcher cannot catch: subtle tool poisoning, a description that claims behavior the schema does not support, and persuasive language that over-prefers a tool. The provider is resolved from the environment the same way agent and llm-judge runs resolve it (set the matching key, for example ANTHROPIC_API_KEY for a claude-* model). Here the model only contributes advisory findings; it never decides the grade.
The advisory boundary is load-bearing: the advisory findings live in their own section, carry a confidence band, and never count toward --fail-on or the exit code. A judge outage or a missing key degrades the run to fewer advisory findings, never to a changed verdict. The advisory section appears in the pretty and JSON output; SARIF stays verdict-only, because code scanning is for the deterministic findings.
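The boundary is easiest to hold when the two reports live in separate fields and the gate reads only one of them. The sketch below takes that approach. `Run`, `Finding`, and `AdvisoryFinding` are hypothetical stand-ins, not mcptest_core's `SecurityRun`, and the three-level confidence band is an assumption.

```rust
/// Severity axis, ordered so that `>=` means "at or above".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

pub struct Finding {
    pub rule: &'static str,
    pub severity: Severity,
}

pub struct AdvisoryFinding {
    pub summary: String,
    pub confidence: Confidence,
}

/// Keeps the two reports apart (hypothetical shape).
pub struct Run {
    pub deterministic: Vec<Finding>,
    pub advisory: Vec<AdvisoryFinding>,
}

impl Run {
    /// Gate on the deterministic report only; `advisory` is never read here.
    pub fn fails_at(&self, floor: Severity) -> bool {
        self.deterministic.iter().any(|f| f.severity >= floor)
    }
}
```

`fails_at` never touches `advisory`, so a judge outage that leaves that vector empty cannot change the result.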
The live red-team lane (security redteam)
mcptest security redteam drives the Layer C1 exploitability corpus against a running server. Where the snapshot scan reads a static catalog, the red-team lane connects to a live server, runs each scenario through the agent loop with the model under test, and scores the captured trace with the same observable-evidence oracle. It answers the follow-up question a static flag leaves open: does this model actually fall for the attack?
```shell
# Drive the red-team corpus against a live server with one model under test.
ANTHROPIC_API_KEY=... mcptest security redteam --url https://localhost:8080/mcp --model claude-sonnet-4-5
# Machine-readable per-model report for the model-compatibility view.
mcptest security redteam --url https://localhost:8080/mcp --model claude-sonnet-4-5 --format json
# Add a custom header to the connection (Authorization is rejected on the flag).
mcptest security redteam --url https://localhost:8080/mcp --model gpt-5 --header X-Tenant=acme
```
The output is a per-model exploitability signal, not a security grade. A weakness one model resists and another falls for is a property of the model, so the lane never folds into the server's verdict. The exit code reflects whether the run completed (0), not whether the model was exploitable; 2 reports a setup failure (the model provider could not be resolved, or the connection failed). The model is the target, not a judge: its API key comes from the environment and is never logged.
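The red-team exit-code contract (completion, not exploitability) fits in a few lines. This is a hypothetical sketch. `SetupError`, `ScenarioOutcome`, and both functions are illustrative names, and the real oracle records richer per-scenario evidence than a boolean.

```rust
/// Why a red-team run could not start (hypothetical error type).
#[derive(Debug)]
pub enum SetupError {
    ProviderUnresolved,
    ConnectionFailed,
}

/// One scenario outcome as scored by the oracle (hypothetical shape).
pub struct ScenarioOutcome {
    pub scenario: String,
    pub exploited: bool,
}

/// The per-model signal: how many scenarios the model fell for.
pub fn exploited_count(outcomes: &[ScenarioOutcome]) -> usize {
    outcomes.iter().filter(|o| o.exploited).count()
}

/// Exit code reflects whether the run completed, never exploitability.
pub fn redteam_exit_code(run: &Result<Vec<ScenarioOutcome>, SetupError>) -> i32 {
    match run {
        // Even a fully exploited model exits 0; the signal is in the report.
        Ok(_) => 0,
        // Provider or connection failure is a setup failure.
        Err(_) => 2,
    }
}
```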
See red-team exploitability for the lane's design and the observable-evidence oracle it scores with.
Input and output
The engine takes a JSON snapshot with optional tools, prompts, and resources arrays. The example at examples/security-tools-list.json carries one clean tool and several poisoned ones, so a run produces findings from most of the surface checks.
Findings render three ways: a human-readable summary, JSON for the scorecard and CI, and SARIF 2.1.0 for code scanning (rule IDs, levels mapped from severity, and a help URI per rule). The SARIF output drops into GitHub or GitLab code scanning the same way the compliance and lint results do.
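SARIF 2.1.0 defines four `result.level` values (`none`, `note`, `warning`, `error`), so a five-step severity axis has to collapse onto them. One plausible mapping is sketched here. The page does not give the exact table mcptest uses, so treat this mapping and the `sarif_level` name as assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Critical,
    High,
    Medium,
    Low,
    Info,
}

/// Map a finding severity onto a SARIF 2.1.0 `result.level`.
/// One plausible mapping; the shipped table may differ.
pub fn sarif_level(severity: Severity) -> &'static str {
    match severity {
        Severity::Critical | Severity::High => "error",
        Severity::Medium => "warning",
        Severity::Low | Severity::Info => "note",
    }
}
```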
At the command line
mcptest security scans a snapshot and reports the findings:
```shell
# Human-readable summary (surface + toxic-flow lanes, plus posture map).
mcptest security tools-list.json
# Fail a CI job on any high or critical finding (high is the default floor).
mcptest security tools-list.json --fail-on high
# Diff against an approved baseline to run the integrity lane.
mcptest security tools-list.json --baseline approved-manifest.json
# A snapshot with a `servers` array runs the cross-server namespace lane.
mcptest security multi-server.json
# Add the advisory LLM-judge lane (reported separately, never gates).
ANTHROPIC_API_KEY=... mcptest security tools-list.json --model claude-sonnet-4-5
# SARIF for code scanning, or JSON for the scorecard.
mcptest security tools-list.json --format sarif > security.sarif
mcptest security tools-list.json --format json
```
Exit codes: 0 when no deterministic finding fires at or above --fail-on, 1 when one does, and 2 when an input cannot be read or parsed (or, with --model, when the advisory provider cannot be resolved). The advisory lane never moves the exit code.
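These exit-code rules combine into one function. This minimal sketch reduces the scan result to the severities of its deterministic findings. `ScanError` and `exit_code` are hypothetical names. Advisory findings are deliberately not a parameter, which is the simplest way to guarantee they cannot move the code.

```rust
/// Severity axis, ordered so that `>=` means "at or above".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Anything that stops a scan before the gate is evaluated (hypothetical).
#[derive(Debug)]
pub enum ScanError {
    UnreadableInput,
    UnparseableInput,
    AdvisoryProviderUnresolved,
}

/// 2 on an input or setup error, 1 when a deterministic finding reaches
/// the --fail-on floor, else 0.
pub fn exit_code(scan: &Result<Vec<Severity>, ScanError>, fail_on: Severity) -> i32 {
    match scan {
        Err(_) => 2,
        Ok(found) if found.iter().any(|s| *s >= fail_on) => 1,
        Ok(_) => 0,
    }
}
```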
The same lanes are a library in mcptest-core (mcptest_core::security). The run_deterministic and run_with_advisory orchestrators assemble a SecurityRun whose fails_at consults only the deterministic report; the advisory report has no gating method. The lanes are also callable directly: SecurityEngine::with_bundled_checks() for the surface lane, toxic_flow_findings for capability analysis, namespace_findings over a multi-server snapshot, integrity_findings over a baseline-and-current pair, assess_posture for the coverage map, and advisory_findings with an injected judge closure. The fixed-corpus Layer C1 exploitability lane is now wired into security redteam and callable directly through assess_scenario over a captured trace. The active transport, protocol, and auth probes and the adaptive attacker (Layer C2) remain doc-hidden until they are wired, tracked under the security-framework epic.