Security vulnerability report

mcptest security runs the deterministic red-team catalog over a server's tool/prompt/resource surface and reports findings. Beyond terminal output (--format pretty), machine-readable JSON (--format json), and SARIF for code scanning (--format sarif), two more formats, html and md, produce a vulnerability report you can hand to a reviewer:

mcptest security --snapshot tools.json --format html > security.html
mcptest security --snapshot tools.json --format md   >> "$GITHUB_STEP_SUMMARY"

OWASP LLM Top 10 coverage

Each check in the catalog declares external references, including OWASP LLM Top 10 identifiers (for example OWASP LLM01). The report builds a coverage table from those references: one row per OWASP category, showing whether the catalog addresses it, which SEC-NNN rules cover it, and how many findings fired in that category on this run.
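
The row-building step described above can be sketched in a few lines of Python. This is an illustration only, not mcptest's implementation: the dictionary shapes for checks and findings (`id`, `references`, `rule_id`) and the function name `build_coverage` are assumptions made for the example.

```python
from collections import defaultdict

# The ten OWASP LLM Top 10 identifiers, LLM01 through LLM10.
OWASP_IDS = [f"LLM{n:02d}" for n in range(1, 11)]
PREFIX = "OWASP "


def build_coverage(checks, findings):
    """Return one row per OWASP category.

    checks:   [{"id": "SEC-001", "references": ["OWASP LLM01"]}, ...]  (hypothetical shape)
    findings: [{"rule_id": "SEC-001"}, ...]                            (hypothetical shape)
    """
    rules_by_category = defaultdict(set)
    for check in checks:
        for ref in check["references"]:
            if ref.startswith(PREFIX):
                rules_by_category[ref[len(PREFIX):]].add(check["id"])

    rows = []
    for category in OWASP_IDS:
        rules = sorted(rules_by_category.get(category, ()))
        fired = sum(1 for f in findings if f["rule_id"] in rules)
        rows.append({
            "category": category,
            "covered": bool(rules),
            "rules": rules,
            "findings": fired,
        })
    return rows


checks = [
    {"id": "SEC-001", "references": ["OWASP LLM01"]},
    {"id": "SEC-003", "references": ["OWASP LLM02"]},
    {"id": "SEC-008", "references": ["OWASP LLM02"]},
]
findings = [{"rule_id": "SEC-001"}, {"rule_id": "SEC-008"}, {"rule_id": "SEC-008"}]
rows = build_coverage(checks, findings)
```

In this sketch a rule that references two categories would count its findings in both rows; whether the real report does the same is not stated here.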

The OWASP LLM Top 10 (2025) categories are: LLM01 Prompt Injection, LLM02 Sensitive Information Disclosure, LLM03 Supply Chain, LLM04 Data and Model Poisoning, LLM05 Improper Output Handling, LLM06 Excessive Agency, LLM07 System Prompt Leakage, LLM08 Vector and Embedding Weaknesses, LLM09 Misinformation, and LLM10 Unbounded Consumption.

The coverage table is built from the bundled per-definition catalog (the surface lane), which is the same catalog SARIF rule definitions come from. The relational lanes (namespace, integrity, toxic-flow, trust-propagation) also contribute findings to the report; their findings render without an OWASP tag when the firing rule is not in the per-definition catalog.
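
As a rough illustration of why a relational-lane finding can render without an OWASP tag, the sketch below looks up a finding's rule in a per-definition catalog and appends a tag only when the rule is found there. The catalog shape, the `message` field, and the rule id `REL-EXAMPLE` are hypothetical and stand in for whatever the real lanes emit.

```python
def owasp_tags(finding, catalog):
    """Return the OWASP references for a finding's rule, or [] if the rule
    is not in the per-definition catalog (hypothetical shape: id -> refs)."""
    refs = catalog.get(finding["rule_id"], [])
    return [ref for ref in refs if ref.startswith("OWASP LLM")]


def render_finding(finding, catalog):
    """Render one report line; the tag suffix is omitted when no tag exists."""
    tags = owasp_tags(finding, catalog)
    suffix = f" [{', '.join(tags)}]" if tags else ""
    return f"{finding['rule_id']}: {finding['message']}{suffix}"


catalog = {"SEC-001": ["OWASP LLM01"]}

surface = {"rule_id": "SEC-001", "message": "imperative text in description"}
relational = {"rule_id": "REL-EXAMPLE", "message": "source-to-sink pairing"}  # hypothetical id

print(render_finding(surface, catalog))
print(render_finding(relational, catalog))
```

The second finding still appears in the output; it just carries no category tag and so adds nothing to any row of the coverage table.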

A category with no covering rule is a coverage gap worth a new probe. Read the table to see where the catalog is thin before trusting a clean run.
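
Finding the gaps amounts to a set difference. The sketch below uses the surface-rule-to-category pairs that the coverage scope table on this page lists; the function name `coverage_gaps` and the dictionary layout are assumptions for illustration.

```python
# Surface rules and the OWASP categories they cover, taken from the
# coverage scope table on this page.
SURFACE_RULES = {
    "SEC-001": ["LLM01"],
    "SEC-003": ["LLM02"],
    "SEC-008": ["LLM02"],
    "SEC-009": ["LLM06"],
    "SEC-036": ["LLM10"],
    "SEC-037": ["LLM07"],
}
ALL_CATEGORIES = [f"LLM{n:02d}" for n in range(1, 11)]


def coverage_gaps(rules):
    """Return the categories that no rule in `rules` references."""
    covered = {category for cats in rules.values() for category in cats}
    return [category for category in ALL_CATEGORIES if category not in covered]


print(coverage_gaps(SURFACE_RULES))
# ['LLM03', 'LLM04', 'LLM05', 'LLM08', 'LLM09']
```

These five are the categories the scope table assigns to another lane or marks as out of scope for a static definition check.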

OWASP coverage scope

Not every OWASP category is addressable as a deterministic predicate over a static tool, prompt, or resource definition. The table below records which lane covers each category and, where a category is out of scope for a black-box definition check, why.

| OWASP category | Covered by | Notes |
| --- | --- | --- |
| LLM01 Prompt Injection | surface (SEC-001) | Imperative model-directed text in a description. |
| LLM02 Sensitive Information Disclosure | surface (SEC-003, SEC-008) | Exfiltration directives and embedded secrets. |
| LLM03 Supply Chain | integrity lane | Rug-pull and schema-drift checks compare a current catalog against an approved baseline. Not a single-definition check, so it lives outside the per-definition surface catalog. |
| LLM04 Data and Model Poisoning | out of scope (runtime) | Poisoning is a runtime data-flow concern: it depends on what untrusted content actually reaches the model or a state-changing tool at call time. A static definition cannot show it. |
| LLM05 Improper Output Handling | out of scope (runtime) | Whether tool output is encoded or validated before it is acted on is observable only when the call runs, not from the definition. |
| LLM06 Excessive Agency | surface (SEC-009), capability lane | Unannotated destructive tools, plus the toxic-flow source-to-sink pairing check. |
| LLM07 System Prompt Leakage | surface (SEC-037) | A prompt or resource that embeds a system-instruction block carrying a secret. Tool-description secrets are SEC-008. |
| LLM08 Vector and Embedding Weaknesses | out of scope | Embedding and retrieval behavior is not visible in a tool/prompt/resource definition; there is no static signal to match. |
| LLM09 Misinformation | out of scope (runtime) | Factuality and context-faithfulness are graded on actual answers, handled by the eval lane, not by a static surface check. |
| LLM10 Unbounded Consumption | surface (SEC-036), probes lane | A list-like tool that declares no bound parameter is flagged statically; unbounded-response behavior at run time is handled by the active probes. |
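
To make "a deterministic predicate over a static definition" concrete, here is a minimal sketch in the spirit of the LLM10 row: flag a list-like tool whose input schema declares no bound parameter. It is not the SEC-036 implementation. The name prefixes and the set of bound parameter names are assumptions chosen for the example; only the `name` and `inputSchema` fields follow the MCP tool definition shape.

```python
# Hypothetical heuristics, chosen for illustration only.
LIST_PREFIXES = ("list_", "search_", "find_")
BOUND_PARAMS = {"limit", "max_results", "page_size", "top_k"}


def lacks_bound(tool):
    """Return True when a list-like tool declares no bounding parameter."""
    name = tool.get("name", "")
    if not name.startswith(LIST_PREFIXES):
        return False  # not list-like under this sketch's naming heuristic
    properties = tool.get("inputSchema", {}).get("properties", {})
    return not (BOUND_PARAMS & set(properties))


unbounded = {
    "name": "list_files",
    "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
}
bounded = {
    "name": "list_files",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}},
    },
}
other = {"name": "delete_file", "inputSchema": {"type": "object", "properties": {}}}

print(lacks_bound(unbounded), lacks_bound(bounded), lacks_bound(other))
```

The predicate reads only the definition and returns the same answer on every run, which is the property the static surface checks rely on. Whether a tool's responses are actually unbounded at call time is a separate question that a definition cannot answer.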

The runtime categories (LLM04, LLM05, LLM09) and the embedding category (LLM08) are deliberately left without a surface check rather than approximated with a heuristic that would mostly produce false positives. Adding a noisy check that fires on ordinary servers would erode trust in the deterministic verdict, which the security engine keeps free of guesswork.