Running the security checks

The deterministic security engine runs a set of checks over a tools/list-style snapshot of a server's tool, prompt, and resource definitions and reports findings. No model is in the verdict path: every check is a deterministic predicate, so a finding is reproducible, and a server cannot talk its way out of one by describing itself as safe.

mcptest security runs every deterministic lane whose inputs are present. The tool-surface and toxic-flow lanes always run; the cross-server namespace lane runs when the snapshot carries a servers array; the baseline-vs-current integrity lane runs when you pass --baseline. All of these produce findings that count toward the --fail-on gate. Each check has a stable SEC-NNN rule ID and a severity on the critical/high/medium/low/info axis.

An opt-in advisory lane (--model) runs an LLM judge for the semantic threats a regex cannot see. It is advisory only: its findings are reported in a separate section and never change the verdict or the exit code.
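The lane-selection rule above can be sketched as a small function. This is a minimal illustration only. The names (Lane, Inputs, lanes_to_run, counts_toward_gate) are hypothetical and are not the mcptest_core API. The sketch encodes only what the text states:

- Two lanes always run.
- Two lanes run when their inputs are present.
- The opt-in advisory lane never feeds the gate.

```rust
// Hypothetical lane names for illustration; not mcptest_core types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lane {
    Surface,
    ToxicFlow,
    Namespace,
    Integrity,
    Advisory,
}

// What the invocation supplied.
struct Inputs {
    has_servers_array: bool, // the snapshot carries a `servers` array
    baseline_given: bool,    // --baseline was passed
    model_given: bool,       // --model was passed
}

// Every lane whose inputs are present, in a fixed order.
fn lanes_to_run(inputs: &Inputs) -> Vec<Lane> {
    // Surface and toxic-flow read only the tools array, so they always run.
    let mut lanes = vec![Lane::Surface, Lane::ToxicFlow];
    if inputs.has_servers_array {
        lanes.push(Lane::Namespace);
    }
    if inputs.baseline_given {
        lanes.push(Lane::Integrity);
    }
    if inputs.model_given {
        lanes.push(Lane::Advisory);
    }
    lanes
}

// Only the deterministic lanes feed the --fail-on gate.
fn counts_toward_gate(lane: Lane) -> bool {
    lane != Lane::Advisory
}
```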

The surface checks

ID	Name	Severity	Fires when
SEC-001	description-injection	high	a description carries an imperative aimed at the model ("ignore all previous", "before answering...")
SEC-002	cross-tool-directive	high	a description steers the model to call or alter another tool
SEC-003	exfiltration-directive	high	a description points the model at sensitive data or an external sink
SEC-004	encoded-payload	medium	a long base64 or hex blob is embedded in a description
SEC-005	hidden-unicode	high	a name or description contains invisible or bidirectional characters
SEC-006	preference-manipulation	medium	a description uses persuasive language to capture tool selection
SEC-007	docstring-schema-mismatch	medium	the description names a parameter the input schema does not declare
SEC-008	secret-in-definition	high	a description appears to contain a key or credential
SEC-009	unannotated-destructive-tool	medium	a tool implies a destructive action but declares no destructiveHint
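Two of the surface predicates are simple enough to sketch directly. This is an illustration under stated assumptions, not mcptest's bundled implementation:

- The set of invisible and bidirectional code points for SEC-005 is one reasonable choice.
- The 40-character threshold for SEC-004 is a hypothetical value.

Hex digits are a subset of the base64 alphabet, so a single run-length scan covers both kinds of blob.

```rust
// SEC-005 sketch: invisible or bidirectional code points (assumed set).
fn is_hidden(c: char) -> bool {
    matches!(
        c,
        '\u{061C}'                      // Arabic letter mark
            | '\u{200B}'..='\u{200F}'   // zero-width space, ZWNJ, ZWJ, LRM, RLM
            | '\u{202A}'..='\u{202E}'   // bidi embeddings and overrides
            | '\u{2060}'..='\u{2064}'   // word joiner, invisible operators
            | '\u{2066}'..='\u{2069}'   // bidi isolates
            | '\u{FEFF}'                // zero-width no-break space (BOM)
            | '\u{E0000}'..='\u{E007F}' // tag characters
    )
}

fn has_hidden_unicode(text: &str) -> bool {
    text.chars().any(is_hidden)
}

// SEC-004 sketch: length of the longest run of base64-alphabet characters.
fn longest_encoded_run(text: &str) -> usize {
    let (mut best, mut current) = (0, 0);
    for c in text.chars() {
        if c.is_ascii_alphanumeric() || matches!(c, '+' | '/' | '=') {
            current += 1;
            best = best.max(current);
        } else {
            current = 0; // any other character breaks the run
        }
    }
    best
}

const MIN_PAYLOAD_LEN: usize = 40; // hypothetical threshold

fn has_encoded_payload(text: &str) -> bool {
    longest_encoded_run(text) >= MIN_PAYLOAD_LEN
}
```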

The namespace checks

The namespace family is relational: each check compares servers against each other rather than inspecting one definition, so these checks run over a multi-server snapshot rather than the per-server tools/list the surface checks take. The snapshot is a list of servers, each with a name and its tools array, and the engine runs them through namespace_findings. The example at examples/security-multi-server.json serves the same tool name from two servers with different input schemas, so a run reports both a duplicate-name and an ambiguous-resolution finding.
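Here is a minimal sketch of the two findings the example produces, SEC-010 and SEC-013. It assumes hypothetical Tool and Server shapes, with each input schema held as canonical JSON text so that string equality stands in for schema equality. The sketch does three things:

1. Indexes every tool by name across servers.
2. Counts the distinct servers behind each name.
3. Compares the schemas within each shared name.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical snapshot shapes; input_schema is assumed to be canonical JSON text.
struct Tool {
    name: String,
    input_schema: String,
}

struct Server {
    name: String,
    tools: Vec<Tool>,
}

#[derive(Debug, PartialEq)]
enum NamespaceFinding {
    DuplicateToolName { tool: String, servers: Vec<String> },   // SEC-010
    AmbiguousResolution { tool: String, servers: Vec<String> }, // SEC-013
}

fn duplicate_and_ambiguous(servers: &[Server]) -> Vec<NamespaceFinding> {
    // Index every tool by name: name -> [(server, schema)].
    let mut by_name: BTreeMap<&str, Vec<(&str, &str)>> = BTreeMap::new();
    for server in servers {
        for tool in &server.tools {
            by_name
                .entry(tool.name.as_str())
                .or_default()
                .push((server.name.as_str(), tool.input_schema.as_str()));
        }
    }

    let mut findings = Vec::new();
    for (tool, owners) in by_name {
        // Count distinct servers, so one server repeating a name is not a clash.
        let distinct: BTreeSet<&str> = owners.iter().map(|(server, _)| *server).collect();
        if distinct.len() < 2 {
            continue;
        }
        let names: Vec<String> = distinct.iter().map(|s| s.to_string()).collect();
        findings.push(NamespaceFinding::DuplicateToolName {
            tool: tool.to_string(),
            servers: names.clone(),
        });
        // Same name, different schemas: the client cannot tell which one it gets.
        if owners.iter().any(|(_, schema)| *schema != owners[0].1) {
            findings.push(NamespaceFinding::AmbiguousResolution {
                tool: tool.to_string(),
                servers: names,
            });
        }
    }
    findings
}
```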

ID	Name	Severity	Fires when
SEC-010	duplicate-tool-name	high	the same tool name is served by more than one server
SEC-011	tool-name-squat	medium	a tool name is a near-duplicate of another server's tool (edit distance 1, or separator/case only)
SEC-012	server-name-squat	medium	a server name is a near-duplicate of another server name
SEC-013	ambiguous-resolution	medium	the same tool name has different input schemas across servers
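The squat rule in the table is "edit distance 1, or a difference in separators or case only". It can be sketched with a two-row Levenshtein distance plus a normalizer. The function names are hypothetical, and the separator set (underscore, hyphen, dot, space) is an assumption. Identical names are excluded because they belong to SEC-010. The same predicate would apply to tool names (SEC-011) and server names (SEC-012).

```rust
// Edit distance over chars, keeping two rolling rows of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let substitution = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + substitution);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

// Drop separators and fold case: read_file, read-file, and ReadFile collapse.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| !matches!(*c, '_' | '-' | '.' | ' '))
        .flat_map(|c| c.to_lowercase())
        .collect()
}

// Near-duplicate but not identical (identical names are SEC-010's job).
fn is_squat(a: &str, b: &str) -> bool {
    a != b && (levenshtein(a, b) == 1 || normalize(a) == normalize(b))
}
```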

The integrity checks

The integrity family is relational across time: it compares the current catalog against a baseline the user already approved, so it catches a server that swaps a definition out from under an approval (a rug pull). These checks take two tools/list snapshots, a baseline and a current, and run through integrity_findings. The before/after pair at examples/security-integrity-baseline.json and examples/security-integrity-current.json changes one tool's description and adds a required field to another tool's input schema, so a run reports a tool-pinning-diff and a schema-drift finding. Neither snapshot version-stamps its tools, so the run also reports the SEC-016 version-stamp-posture finding.

SEC-014 and SEC-015 reuse the mcptest diff engine to compute the catalog delta, then classify each change. A change to a tool that exists in both snapshots is a pinning or drift finding; an added or removed tool is neither, because a tool the user never approved cannot be a rug pull. SEC-016 reads the current catalog alone and notes when no tool carries a version stamp (a top-level version field or a _meta version), since without one a later definition swap leaves no trace in the catalog itself.

ID	Name	Severity	Fires when
SEC-014	tool-pinning-diff	high	an approved tool's description changed between the baseline and the current catalog
SEC-015	schema-drift	medium	an approved tool's input schema, output schema, or annotations changed
SEC-016	version-stamp-posture	info	no tool in the current catalog carries a version stamp, so ETDI-style integrity is absent
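The classification rule can be sketched over two name-keyed catalogs. The sketch rests on three assumptions:

- The ToolDef fields are hypothetical.
- Schemas and annotations are held as canonical JSON text.
- The version stamp has already been extracted from a top-level version field or a _meta version.

mcptest itself computes the delta with its diff engine rather than with this loop. Added tools are skipped because the baseline never approved them, and removed tools never reach the loop.

```rust
use std::collections::BTreeMap;

// Hypothetical per-tool record; schemas and annotations as canonical JSON text.
struct ToolDef {
    description: String,
    input_schema: String,
    output_schema: Option<String>,
    annotations: Option<String>,
    version: Option<String>, // from a top-level version or a _meta version
}

#[derive(Debug, PartialEq)]
enum IntegrityFinding {
    ToolPinningDiff(String), // SEC-014
    SchemaDrift(String),     // SEC-015
    VersionStampPosture,     // SEC-016
}

fn integrity(
    baseline: &BTreeMap<String, ToolDef>,
    current: &BTreeMap<String, ToolDef>,
) -> Vec<IntegrityFinding> {
    let mut findings = Vec::new();
    for (name, now) in current {
        // A tool absent from the baseline was never approved: not a rug pull.
        let Some(then) = baseline.get(name) else { continue };
        if now.description != then.description {
            findings.push(IntegrityFinding::ToolPinningDiff(name.clone()));
        }
        if now.input_schema != then.input_schema
            || now.output_schema != then.output_schema
            || now.annotations != then.annotations
        {
            findings.push(IntegrityFinding::SchemaDrift(name.clone()));
        }
    }
    // Removed tools never enter the loop above, so they produce nothing.
    // SEC-016 reads the current catalog alone.
    if current.values().all(|tool| tool.version.is_none()) {
        findings.push(IntegrityFinding::VersionStampPosture);
    }
    findings
}
```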

The toxic-flow checks

The toxic-flow family scores a catalog's latent capability before any payload fires. It classifies each tool into capability tiers from its name, description, and input-schema property names, then flags the dangerous pairing: a tool that pulls untrusted content and a tool that can exfiltrate or destroy state coexist in the same catalog, so a prompt injection delivered through the first can drive the second. This lane reads only the tools array and always runs, so a single-server snapshot still gets it.

ID	Name	Severity	Fires when
SEC-033	capability-tier	info	a tool classifies into one or more latent capability tiers (an informational tag, not a defect)
SEC-034	untrusted-content-source	low	a tool pulls untrusted external content, the injection entry point
SEC-035	toxic-flow-pairing	high	an untrusted-content source and an exfil-or-destructive sink coexist (medium when the sink is local-only destructive)
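Here is a minimal sketch of the SEC-035 pairing rule. The keyword lists are hypothetical stand-ins for the real tier classifier, which the text says reads names, descriptions, and input-schema property names. Matching whole lower-cased words rather than substrings keeps, for example, "postgres" from counting as "post". Severity follows the table: high when an exfiltration sink is present, medium when the only sink is local-only destructive.

```rust
// Hypothetical vocabularies; a real classifier would be far richer.
const SOURCE_WORDS: &[&str] = &["fetch", "url", "browse", "web", "email", "inbox", "scrape", "html"];
const EXFIL_WORDS: &[&str] = &["send", "post", "upload", "webhook", "publish"];
const DESTRUCTIVE_WORDS: &[&str] = &["delete", "remove", "drop", "overwrite", "truncate"];

#[derive(Clone, Copy)]
struct ToolText<'a> {
    name: &'a str,
    description: &'a str,
    properties: &'a [&'a str], // input-schema property names
}

#[derive(Debug, PartialEq)]
enum PairingSeverity {
    Medium, // untrusted source + local-only destructive sink
    High,   // untrusted source + exfiltration sink
}

// Lower-cased whole words from the name, description, and property names.
fn words(tool: &ToolText) -> Vec<String> {
    let text = format!("{} {} {}", tool.name, tool.description, tool.properties.join(" "));
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

fn mentions(tokens: &[String], vocabulary: &[&str]) -> bool {
    tokens.iter().any(|t| vocabulary.contains(&t.as_str()))
}

// A source and a sink coexisting anywhere in the catalog is the toxic pairing.
fn toxic_flow_pairing(tools: &[ToolText]) -> Option<PairingSeverity> {
    let tokenized: Vec<Vec<String>> = tools.iter().map(words).collect();
    let present = |vocab: &[&str]| tokenized.iter().any(|t| mentions(t, vocab));
    if !present(SOURCE_WORDS) {
        return None;
    }
    if present(EXFIL_WORDS) {
        Some(PairingSeverity::High)
    } else if present(DESTRUCTIVE_WORDS) {
        Some(PairingSeverity::Medium)
    } else {
        None
    }
}
```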

The posture coverage map

Beside the findings, a run prints a posture coverage map: a set of signals tagged with the MCP-DPT defense layer they speak to. The map is a coverage report, not a verdict, so it never gates. It is honest about what a black-box client cannot see: the transport, rate-limit, and error-hygiene signals report "requires an active probe" rather than a silent pass when their evidence was not captured, and the whole-server controls a black-box client can never observe (server-side WAF, payload inspection, runtime intent verification) are listed explicitly so an absent signal is not mistaken for a clean one.
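The honesty rule above amounts to a three-way status per signal instead of a boolean. In this sketch the names are hypothetical and the MCP-DPT layer tag is omitted. A probe-backed signal with no captured evidence becomes "requires an active probe", and the never-observable controls are listed explicitly. Nothing here returns a verdict, which matches the map's non-gating role.

```rust
// Hypothetical coverage status: a missing observation is a gap, not a pass.
#[derive(Debug, PartialEq)]
enum Coverage {
    Observed { passed: bool },
    RequiresActiveProbe,
    NeverObservable,
}

struct Signal {
    name: &'static str,
    coverage: Coverage,
}

// Transport, rate-limit, and error-hygiene signals: evidence may be absent.
fn probe_signal(name: &'static str, evidence: Option<bool>) -> Signal {
    let coverage = match evidence {
        Some(passed) => Coverage::Observed { passed },
        None => Coverage::RequiresActiveProbe,
    };
    Signal { name, coverage }
}

// Controls a black-box client can never see are listed, not silently omitted.
fn never_observable() -> Vec<Signal> {
    ["server-side WAF", "payload inspection", "runtime intent verification"]
        .iter()
        .map(|&name| Signal { name, coverage: Coverage::NeverObservable })
        .collect()
}

fn status_label(coverage: &Coverage) -> &'static str {
    match coverage {
        Coverage::Observed { passed: true } => "pass",
        Coverage::Observed { passed: false } => "fail",
        Coverage::RequiresActiveProbe => "requires an active probe",
        Coverage::NeverObservable => "not observable from a black-box client",
    }
}
```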

The advisory lane (`--model`)

With --model <id>, a run also asks an LLM judge to look for the semantic threats a deterministic matcher cannot: subtle tool poisoning, a description that claims behavior the schema does not support, and persuasive language that over-prefers a tool. The provider is resolved from the environment the same way agent and llm-judge runs resolve it (set the matching key, for example ANTHROPIC_API_KEY for a claude-* model). Here the model is a detection mechanism only; it does not grade the server.
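Provider resolution can be sketched as a prefix lookup followed by an environment read. Only the claude-* to ANTHROPIC_API_KEY pairing comes from the text. The gpt-* row mapping to OPENAI_API_KEY is an assumption, and so is every name below. The environment is passed in as a closure, so the logic can be exercised without touching process state. Errors name the variable, never the key.

```rust
// Hypothetical prefix table. Only claude-* -> ANTHROPIC_API_KEY comes from
// the text; the gpt-* row is an assumption.
const PROVIDERS: &[(&str, &str, &str)] = &[
    ("claude-", "anthropic", "ANTHROPIC_API_KEY"),
    ("gpt-", "openai", "OPENAI_API_KEY"),
];

struct Provider {
    name: &'static str,
    api_key: String, // held in memory only; never printed
}

// `env` looks up one variable by name.
fn resolve_provider(model: &str, env: impl Fn(&str) -> Option<String>) -> Result<Provider, String> {
    let &(_, name, var) = PROVIDERS
        .iter()
        .find(|(prefix, _, _)| model.starts_with(*prefix))
        .ok_or_else(|| format!("no provider known for model {model}"))?;
    // An empty variable counts as missing; the error names the variable only.
    let api_key = env(var)
        .filter(|key| !key.is_empty())
        .ok_or_else(|| format!("{var} is not set"))?;
    Ok(Provider { name, api_key })
}
```

In a real binary the lookup would be |name| std::env::var(name).ok(). A resolution error here corresponds to the exit-code-2 case the command-line section describes for --model.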

The advisory boundary is load-bearing: the advisory findings live in their own section, carry a confidence band, and never count toward --fail-on or the exit code. A judge outage or a missing key degrades the run to fewer advisory findings, never to a changed verdict. The advisory section appears in the pretty and JSON output; SARIF stays verdict-only, because code scanning is for the deterministic findings.
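The boundary can be sketched as a run that holds two reports. Only the deterministic report has a gating method, and the judge is an injected closure. The types are hypothetical. They only echo the shape described for SecurityRun, fails_at, and run_with_advisory; they are not the mcptest_core definitions.

```rust
// Hypothetical types; the severity order follows the critical..info axis.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity { Info, Low, Medium, High, Critical }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ConfidenceBand { Low, Medium, High }

struct Finding { rule_id: &'static str, severity: Severity }

struct AdvisoryFinding { note: String, band: ConfidenceBand }

struct Run {
    deterministic: Vec<Finding>,
    advisory: Vec<AdvisoryFinding>, // reported, never consulted by the gate
}

impl Run {
    // The gate reads the deterministic report only.
    fn fails_at(&self, floor: Severity) -> bool {
        self.deterministic.iter().any(|f| f.severity >= floor)
    }
}

// The judge is injected as a closure, so an outage is just an Err.
fn run_with_judge<J>(deterministic: Vec<Finding>, judge: J) -> Run
where
    J: FnOnce() -> Result<Vec<AdvisoryFinding>, String>,
{
    // A failed judge degrades to zero advisory findings, never a changed verdict.
    let advisory = judge().unwrap_or_default();
    Run { deterministic, advisory }
}
```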

The live red-team lane (`security redteam`)

mcptest security redteam drives the Layer C1 exploitability corpus against a running server. Where the snapshot scan reads a static catalog, the red-team lane connects to a live server, runs each scenario through the agent loop with the model under test, and scores the captured trace with the same observable-evidence oracle. It answers the follow-up question a static flag leaves open: does this model actually fall for the attack?

# Drive the red-team corpus against a live server with one model under test.
ANTHROPIC_API_KEY=... mcptest security redteam --url https://localhost:8080/mcp --model claude-sonnet-4-5

# Machine-readable per-model report for the model-compatibility view.
mcptest security redteam --url https://localhost:8080/mcp --model claude-sonnet-4-5 --format json

# Add a custom header to the connection (Authorization is rejected on the flag).
mcptest security redteam --url https://localhost:8080/mcp --model gpt-5 --header X-Tenant=acme
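The --header values above use a NAME=VALUE form. Here is a sketch of parsing one, assuming a split on the first = and a case-insensitive refusal of Authorization. HTTP header names are case-insensitive, so spellings such as authorization and AUTHORIZATION must be refused too. parse_header is a hypothetical name, not mcptest's parser.

```rust
// Parse one --header value of the form NAME=VALUE (hypothetical helper).
fn parse_header(arg: &str) -> Result<(String, String), String> {
    let (name, value) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected NAME=VALUE, got {arg:?}"))?;
    let name = name.trim();
    if name.is_empty() {
        return Err("empty header name".to_string());
    }
    // Header names are case-insensitive, so refuse every spelling.
    if name.eq_ignore_ascii_case("authorization") {
        return Err("Authorization cannot be passed with --header".to_string());
    }
    Ok((name.to_string(), value.to_string()))
}
```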

The output is a per-model exploitability signal, not a security grade. A weakness one model resists and another falls for is a property of the model, so the lane never folds into the server's verdict. The exit code reports whether the run completed, not whether the model was exploitable: 0 means the run completed, and 2 means a setup failure (the model provider could not be resolved, or the connection failed). The model is the target, not a judge: its API key comes from the environment and is never logged.
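The decoupling of exit code from exploitability can be sketched directly. The per-model signal is a count over scenario outcomes, and the exit code looks only at whether setup succeeded. SetupError, ScenarioResult, and both functions are hypothetical.

```rust
// Hypothetical setup failures: the run stops before any scenario executes.
#[derive(Debug)]
enum SetupError {
    ProviderUnresolved,
    ConnectionFailed,
}

struct ScenarioResult {
    scenario: &'static str,
    exploited: bool, // the oracle found observable evidence of the attack
}

// Per-model exploitability signal: how many scenarios the model fell for.
fn exploited_count(results: &[ScenarioResult]) -> usize {
    results.iter().filter(|r| r.exploited).count()
}

// The exit code depends only on whether the run completed.
fn redteam_exit_code(outcome: &Result<Vec<ScenarioResult>, SetupError>) -> i32 {
    match outcome {
        Ok(_) => 0,
        Err(_) => 2,
    }
}
```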

See red-team exploitability for the lane's design and the observable-evidence oracle it scores with.

Input and output

The engine takes a JSON snapshot with optional tools, prompts, and resources arrays. The example at examples/security-tools-list.json carries one clean tool and several poisoned ones, so a run produces findings from most of the surface checks.

Findings render three ways: a human-readable summary, JSON for the scorecard and CI, and SARIF 2.1.0 for code scanning (rule IDs, levels mapped from severity, and a help URI per rule). The SARIF output drops into GitHub or GitLab code scanning the same way the compliance and lint results do.
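Here is a sketch of the severity-to-level step for one SARIF result. SARIF 2.1.0 defines the result levels error, warning, note, and none. The mapping below is an assumption, not mcptest's documented table:

- critical and high map to error
- medium maps to warning
- low and info map to note

The help URI and the full run envelope are omitted.

```rust
#[derive(Debug, Clone, Copy)]
enum Severity { Critical, High, Medium, Low, Info }

// Assumed mapping onto SARIF 2.1.0 result levels.
fn sarif_level(s: Severity) -> &'static str {
    match s {
        Severity::Critical | Severity::High => "error",
        Severity::Medium => "warning",
        Severity::Low | Severity::Info => "note",
    }
}

// Minimal JSON string escaping for the fields we emit.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// One SARIF result object: ruleId, level, and message.text.
fn sarif_result(rule_id: &str, severity: Severity, message: &str) -> String {
    format!(
        r#"{{"ruleId":"{}","level":"{}","message":{{"text":"{}"}}}}"#,
        json_escape(rule_id),
        sarif_level(severity),
        json_escape(message)
    )
}
```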

At the command line

mcptest security scans a snapshot and reports the findings:

# Human-readable summary (surface + toxic-flow lanes, plus posture map).
mcptest security tools-list.json

# Fail a CI job on any high or critical finding (high is the default floor).
mcptest security tools-list.json --fail-on high

# Diff against an approved baseline to run the integrity lane.
mcptest security tools-list.json --baseline approved-manifest.json

# A snapshot with a `servers` array runs the cross-server namespace lane.
mcptest security multi-server.json

# Add the advisory LLM-judge lane (reported separately, never gates).
ANTHROPIC_API_KEY=... mcptest security tools-list.json --model claude-sonnet-4-5

# SARIF for code scanning, or JSON for the scorecard.
mcptest security tools-list.json --format sarif > security.sarif
mcptest security tools-list.json --format json

Exit codes: 0 when no deterministic finding fires at or above --fail-on, 1 when one does, and 2 when an input cannot be read or parsed (or, with --model, when the advisory provider cannot be resolved). The advisory lane never moves the exit code.
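The exit-code rule can be sketched as a function that never sees the advisory report at all. The names are hypothetical. The severity order follows the critical/high/medium/low/info axis, and an absent --fail-on falls back to high, the default floor.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity { Info, Low, Medium, High, Critical }

// Hypothetical failure modes that map to exit code 2.
#[derive(Debug)]
enum ScanError {
    UnreadableInput,
    AdvisoryProviderUnresolved,
}

// Parse the --fail-on value; None means the flag was absent.
fn parse_fail_on(arg: Option<&str>) -> Option<Severity> {
    match arg.unwrap_or("high") {
        "critical" => Some(Severity::Critical),
        "high" => Some(Severity::High),
        "medium" => Some(Severity::Medium),
        "low" => Some(Severity::Low),
        "info" => Some(Severity::Info),
        _ => None,
    }
}

// Advisory findings are deliberately not a parameter: they cannot move the code.
fn exit_code(deterministic: Result<&[Severity], ScanError>, floor: Severity) -> i32 {
    match deterministic {
        Err(_) => 2,
        Ok(found) if found.iter().any(|&s| s >= floor) => 1,
        Ok(_) => 0,
    }
}
```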

The same lanes are a library in mcptest-core (mcptest_core::security). The run_deterministic and run_with_advisory orchestrators assemble a SecurityRun whose fails_at consults only the deterministic report; the advisory report has no gating method. The lanes are also callable directly: SecurityEngine::with_bundled_checks() for the surface lane, toxic_flow_findings for capability analysis, namespace_findings over a multi-server snapshot, integrity_findings over a baseline-and-current pair, assess_posture for the coverage map, and advisory_findings with an injected judge closure. The fixed-corpus Layer C1 exploitability lane is now wired into security redteam and is callable directly through assess_scenario over a captured trace. The active transport, protocol, and auth probes and the adaptive attacker (Layer C2) remain doc-hidden until they are wired, tracked under the security-framework epic.