mcptest docs GitHub

Code-mode testing

Code mode is the access pattern where the model does not call tools directly. It writes code that calls a typed tool API, and the code runs in a sandbox whose only egress is the bound tools (Anthropic programmatic tool calling, Cloudflare Code Mode). A server can behave differently under code mode, and a harmful action taken inside generated code is easy to miss if you only watch the direct tool-calling path.

mcptest tests code mode without embedding a JavaScript engine. The run is captured as a conversation trace whose tool calls carry a code-execution marker, and the harness asserts on the calls and results the code actually made. An action performed in generated code is still a tool call on the wire, so it stays observable. The live code execution sits behind the trace, so a code-mode run replays from a cassette without a model or a sandbox.

Run this example. examples/codemode.yml records a code-mode agent run and replays it from a cassette, asserting the call the generated code made and the shape of the result it received.

ANTHROPIC_API_KEY=... mcptest run --record --config examples/codemode.yml
mcptest run --config examples/codemode.yml

What the harness checks

Given a captured run and the tool catalog, the harness reports:

Cassette-replayable example

The fixtures under crates/mcptest-agent/tests/fixtures/codemode/ are a recorded code-mode run and its catalog. The run makes two code-execution calls; the second returns a string where the schema requires an integer. Replaying it shows the harness reporting a pure code-mode run with one schema-drift finding, with no model in the loop. That is the pattern for adding code-mode coverage: capture a run once, then assert on it deterministically.