
Cache eligibility

mcptest can skip work that is provably deterministic by reading a prior result from a cache instead of re-running the test. Before the storage layer comes anywhere near a test, the engine asks one question: is this test even eligible to be cached? Answering that question is the job of the eligibility rules engine in mcptest_core::cache::eligibility, and it is the only piece of the cache shipped today. The storage, cache key derivation, and HTTP integration land in follow-up work.

Run this example. examples/cache-eligibility.yml shows how cache:, effects:, and a server version pin decide what gets cached and what is always re-run.

mcptest run --config examples/cache-eligibility.yml

Why caching is per-test-type-defaulted

Each kind of test has a typical relationship with determinism:

The defaults exist so authors do not have to annotate every test. When the default is wrong for a particular test, the author overrides it with cache: always or cache: never.

Precedence

The engine evaluates inputs in a fixed order. Higher in the list wins:

  1. cache: never directive on the test.
  2. Server cacheScope: no-store on the response envelope (SEP-2549). A server can opt out of caching regardless of the author's cache: always directive.
  3. Hard exclusions:

    • hooks: block declared on the test.
    • HTTP transport without an explicit server_version: pin.
    • effects: list contains external.
  4. cache: always directive on the test.
  5. The test-kind default.

Two things to call out:

Compliance has one carve-out: the HTTP-without-pin exclusion does not apply to it. Compliance probes ask the protocol whether it advertises a capability. The answer does not depend on which build of the server is at the other end, so the version pin is not load-bearing for compliance.
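
The precedence order and the compliance carve-out can be sketched as a single function. This is an illustration only, not the code in mcptest_core: Inputs, Directive, Decision, and evaluate are hypothetical names, and the test-kind default is passed in as a plain boolean rather than looked up per kind.

```rust
// Hypothetical model of the eligibility inputs; not the real mcptest_core types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Directive {
    Always,
    Never,
}

#[derive(Clone, Debug, Default)]
pub struct Inputs {
    pub directive: Option<Directive>, // cache: always / cache: never, if present
    pub server_no_store: bool,        // server sent cacheScope: no-store
    pub has_hooks: bool,              // hooks: block declared on the test
    pub http_transport: bool,
    pub server_version_pinned: bool,  // explicit server_version: pin
    pub effects_external: bool,       // effects: list contains external
    pub is_compliance: bool,
    pub kind_default_cacheable: bool, // assumed: the test-kind default, as a bool
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Cacheable,
    NotCacheable(&'static str),
}

/// Checks each rule in precedence order; the first rule that matches wins.
pub fn evaluate(i: &Inputs) -> Decision {
    // 1. cache: never on the test.
    if i.directive == Some(Directive::Never) {
        return Decision::NotCacheable("cache: never directive");
    }
    // 2. Server opt-out beats the author's cache: always.
    if i.server_no_store {
        return Decision::NotCacheable("server cacheScope: no-store");
    }
    // 3. Hard exclusions.
    if i.has_hooks {
        return Decision::NotCacheable("hooks declared");
    }
    // Compliance carve-out: the HTTP-without-pin rule does not apply to it.
    if i.http_transport && !i.server_version_pinned && !i.is_compliance {
        return Decision::NotCacheable("HTTP transport without server_version pin");
    }
    if i.effects_external {
        return Decision::NotCacheable("effect: external declared");
    }
    // 4. cache: always on the test.
    if i.directive == Some(Directive::Always) {
        return Decision::Cacheable;
    }
    // 5. The test-kind default.
    if i.kind_default_cacheable {
        Decision::Cacheable
    } else {
        Decision::NotCacheable("test-kind default")
    }
}
```

Because every rule returns early, cache: always is only consulted once the server opt-out and all hard exclusions have passed, which matches the ordering in the list above.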

Server cache directives

The 2026-07-28 spec lets a server attach two cache hints to tools/list, resources/list, resources/read, and prompts/list responses:

The current release wires both fields into the eligibility engine:

Stores write a TTL via CacheStore::put_with_ttl(key, value, Some(deadline)). The trait's default put_with_ttl forwards to the bare put so backends that have not yet learned the expiry-on-read rule stay correct; the filesystem store overrides it to thread expires_at_unix onto the on-disk metadata sidecar. The pure rule lives in cache::store::is_expired(expires_at_unix, now_unix) so a future Redis or S3 backend shares one TTL contract.
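
A minimal sketch of that TTL contract follows. The names CacheStore, put, put_with_ttl, and is_expired come from the paragraph above, but the parameter types, the inclusive deadline comparison, and the in-memory MemoryStore are assumptions made for illustration; the real trait and the filesystem store may differ.

```rust
use std::collections::HashMap;

/// Pure expiry rule. Assumed semantics: no deadline means the entry never
/// expires, and an entry is expired once `now_unix` reaches the deadline.
pub fn is_expired(expires_at_unix: Option<u64>, now_unix: u64) -> bool {
    match expires_at_unix {
        Some(deadline) => now_unix >= deadline,
        None => false,
    }
}

/// Assumed shape of the store trait; key and value types are illustrative.
pub trait CacheStore {
    fn put(&mut self, key: &str, value: Vec<u8>);

    /// Default: forward to `put`, ignoring the deadline, so a backend that
    /// does not track expiry still stores the entry.
    fn put_with_ttl(&mut self, key: &str, value: Vec<u8>, _expires_at_unix: Option<u64>) {
        self.put(key, value);
    }
}

/// Hypothetical in-memory backend that overrides `put_with_ttl` and applies
/// the shared rule on read.
#[derive(Default)]
pub struct MemoryStore {
    entries: HashMap<String, (Vec<u8>, Option<u64>)>,
}

impl MemoryStore {
    pub fn get(&self, key: &str, now_unix: u64) -> Option<&[u8]> {
        match self.entries.get(key) {
            Some((value, expires)) if !is_expired(*expires, now_unix) => Some(value.as_slice()),
            _ => None,
        }
    }
}

impl CacheStore for MemoryStore {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), (value, None));
    }

    fn put_with_ttl(&mut self, key: &str, value: Vec<u8>, expires_at_unix: Option<u64>) {
        self.entries.insert(key.to_string(), (value, expires_at_unix));
    }
}
```

Keeping is_expired free of I/O and clock access means each backend only has to decide where it persists the deadline, not how the deadline is interpreted.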

The directive is read off the response value by mcptest_core::cache::parse_server_directive, which looks for the two fields at the top level first, then falls back to the result wrapper. Unknown scopes fall back to "unset" rather than failing the parse, so a typo on the server side does not brick a run.
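
The lookup order and the lenient handling of unknown scopes might look roughly like the sketch below. It is not the real parse_server_directive: the Value enum stands in for a JSON value so the example needs no external crates, the wrapper key is assumed to be "result", only the cacheScope field is handled, and no-store is the only scope modelled because it is the only one named on this page.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a JSON value (hypothetical; a real parser would
/// work on the response's actual JSON type).
pub enum Value {
    Str(String),
    Obj(HashMap<String, Value>),
}

/// Only the scope named on this page is modelled.
#[derive(Debug, PartialEq, Eq)]
pub enum CacheScope {
    NoStore,
}

/// Reads the raw `cacheScope` string from one object, if present.
fn scope_field(obj: &HashMap<String, Value>) -> Option<&str> {
    match obj.get("cacheScope") {
        Some(Value::Str(s)) => Some(s.as_str()),
        _ => None,
    }
}

/// Looks at the top level first, then inside the assumed "result" wrapper.
/// An unrecognised scope yields `None` ("unset") instead of an error.
pub fn parse_scope(response: &Value) -> Option<CacheScope> {
    let Value::Obj(top) = response else {
        return None;
    };
    let raw = scope_field(top).or_else(|| match top.get("result") {
        Some(Value::Obj(inner)) => scope_field(inner),
        _ => None,
    })?;
    match raw {
        "no-store" => Some(CacheScope::NoStore),
        _ => None, // unknown or misspelled scope: treat as unset
    }
}
```

Returning an Option rather than a Result encodes the design choice described above: a misspelled scope degrades to "no directive" and the eligibility decision proceeds as if the server had said nothing.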

How effects: and hooks affect eligibility

effects: is the structured way authors tell mcptest what a test touches. The accepted values are external, local, and filesystem. Only external excludes a test from caching. The other variants exist so authors can tag tests honestly without giving up the determinism win. The rationale: filesystem and local state are inside the test runner's blast radius, so the runner can guarantee a clean state between runs. An external API call cannot be replayed safely from the runner's perspective.

hooks: is the YAML escape hatch for arbitrary author code (shell commands, custom Rust extensions, custom matchers). Hooks are non-deterministic by definition, so any test that declares hooks is excluded.

When to use cache: always

Rarely. The default for tools tests already enables caching, so always is only useful when you want to force-cache a test the engine would otherwise default away from. The realistic case is debugging a flaky performance test: pin it to cache: always, run the rest of the suite quickly, and re-enable real measurement when you are ready.

cache: always does not silence the hard exclusions, so it is safe to leave on while you triage. If the test starts touching external state or grows hooks, the engine will refuse to cache it and the reporter will say why.

Reporter output

Every cache miss the engine reports surfaces the structured reason from EligibilityReason. The pretty reporter renders it as cache miss: <reason> (for example, cache miss: not cacheable (effect: external declared)); the JSON reporter emits the variant name plus the rendered string so downstream tooling can group misses.
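
To make the two reporter shapes concrete, here is a small sketch. EligibilityReason is the real type name, but the variants shown, the variant_name helper, and the JSON field names are hypothetical; only the rendered string for the external-effect case is taken from the example above.

```rust
use std::fmt;

/// Illustrative subset; the real enum's variants are not listed on this page.
#[derive(Debug)]
pub enum EligibilityReason {
    EffectExternal,
    HooksDeclared,
}

impl EligibilityReason {
    /// Stable name that downstream tooling can group on (hypothetical helper).
    pub fn variant_name(&self) -> &'static str {
        match self {
            EligibilityReason::EffectExternal => "EffectExternal",
            EligibilityReason::HooksDeclared => "HooksDeclared",
        }
    }
}

impl fmt::Display for EligibilityReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = match self {
            // Wording taken from the example on this page.
            EligibilityReason::EffectExternal => "not cacheable (effect: external declared)",
            // Assumed wording, for illustration only.
            EligibilityReason::HooksDeclared => "not cacheable (hooks declared)",
        };
        f.write_str(text)
    }
}

/// Pretty reporter line: "cache miss: <reason>".
pub fn pretty_line(reason: &EligibilityReason) -> String {
    format!("cache miss: {}", reason)
}

/// JSON reporter line with assumed field names. The strings above contain
/// no characters that need JSON escaping, so plain formatting is enough here.
pub fn json_line(reason: &EligibilityReason) -> String {
    format!(
        "{{\"reason\":\"{}\",\"message\":\"{}\"}}",
        reason.variant_name(),
        reason
    )
}
```

Grouping on the variant name rather than the rendered string keeps downstream tooling stable if the human-readable wording changes.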

This page documents the eligibility engine. The follow-up pieces, landing separately, are the cache storage layer, cache key derivation, and cache HTTP transport integration. Each builds on the eligibility contract described here.