Cache eligibility

mcptest can skip work that is provably deterministic by reading a prior result from a cache instead of re-running the test. Before the storage layer comes anywhere near a test, the engine asks one question: is this test even eligible to be cached? Answering that question is the job of the eligibility rules engine in mcptest_core::cache::eligibility, and it is the only piece of the cache shipped today. The storage, cache key derivation, and HTTP integration land in follow-up work.

Run this example. examples/cache-eligibility.yml shows how cache:, effects:, and a server version pin decide what gets cached and what is always re-run.

mcptest run --config examples/cache-eligibility.yml

Why caching is per-test-type-defaulted

Each kind of test has a typical relationship with determinism:

Tools tests are usually a fixed input mapped to a fixed output. Cacheable by default.
Compliance tests are read-only protocol probes (initialize, tools/list, error shape). Cacheable by default.
Eval tests route through an LLM with sampling. The same input produces different scores across runs. Never cacheable by default.
Performance tests measure wall-clock latency. Caching a timing measurement would be a lie. Never cacheable by default.
Model-compatibility tests mirror tools-style behavior across models, so they share the tools default.

The defaults exist so authors do not have to annotate every test. When the default is wrong for a particular test, the author overrides it with cache: always or cache: never.
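The defaults and the author override described above can be sketched in a few lines of Rust. This is a minimal illustration, not the engine's actual code: the type names TestKind and CacheDirective, their variants, and the function directive_or_default are all assumptions made for the example. The sketch deliberately ignores server directives and hard exclusions, which the Precedence section covers.

```rust
/// Test kinds named on this page. The enum and variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestKind {
    Tools,
    Compliance,
    Eval,
    Performance,
    ModelCompatibility,
}

/// The author-facing `cache:` directive (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheDirective {
    Always,
    Never,
}

impl TestKind {
    /// Default eligibility when the test carries no `cache:` directive.
    pub fn cacheable_by_default(self) -> bool {
        match self {
            // Fixed input to fixed output, read-only probes, and the
            // tools-style model-compatibility tests default to cacheable.
            TestKind::Tools | TestKind::Compliance | TestKind::ModelCompatibility => true,
            // Sampled LLM scores and wall-clock timings do not.
            TestKind::Eval | TestKind::Performance => false,
        }
    }
}

/// Directive-over-default only. Server directives and hard exclusions are
/// intentionally left out of this sketch.
pub fn directive_or_default(kind: TestKind, directive: Option<CacheDirective>) -> bool {
    match directive {
        Some(CacheDirective::Always) => true,
        Some(CacheDirective::Never) => false,
        None => kind.cacheable_by_default(),
    }
}
```

With these definitions, directive_or_default(TestKind::Eval, None) is false, while an eval test annotated with cache: always would come back true as long as nothing higher in the precedence list intervenes.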

Precedence

The engine evaluates inputs in a fixed order. Higher in the list wins:

1. cache: never directive on the test.
2. Server cacheScope: no-store on the response envelope (SEP-2549). A server can opt out of caching regardless of the author's cache: always directive.
3. Hard exclusions:
   - hooks: block declared on the test.
   - HTTP transport without an explicit server_version: pin.
   - effects: list contains external.
4. cache: always directive on the test.
5. The test-kind default.

Two things to call out:

cache: never is sovereign. It is the only directive that beats every hard exclusion. Authors use it when they want a single test pinned-uncacheable even after a future refactor changes the engine's view of the test.
cache: always cannot override hard exclusions. Forcing the cache on a hook-driven test or a test with effects: [external] would let the cache silently return a stale answer, which is exactly the failure mode the exclusions exist to prevent.

Compliance has one carve-out: the HTTP-without-pin exclusion does not apply to it. Compliance probes ask the protocol whether it advertises a capability. The answer does not depend on which build of the server is at the other end, so the version pin is not load-bearing for compliance.
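The precedence order, including the compliance carve-out, can be read as a chain of early returns. The sketch below is one way to express it under stated assumptions: every type and field name (TestFacts, Decision, Transport::Stdio, and so on) is hypothetical, and the relative order of the three hard exclusions among themselves is a guess, since this page only fixes their position relative to the directives.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind { Tools, Compliance, Eval, Performance, ModelCompatibility }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Directive { Always, Never }

/// `Stdio` stands in for "any non-HTTP transport" (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport { Stdio, Http }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effect { External, Local, Filesystem }

/// Outcome of the check; variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Cacheable,
    NeverDirective,
    ServerNoStore,
    HooksDeclared,
    HttpWithoutVersionPin,
    ExternalEffect,
    KindDefaultOff,
}

/// Everything the sketch needs to know about one test (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct TestFacts<'a> {
    pub kind: Kind,
    pub directive: Option<Directive>,
    pub server_no_store: bool,
    pub has_hooks: bool,
    pub transport: Transport,
    pub server_version_pinned: bool,
    pub effects: &'a [Effect],
}

pub fn decide(t: &TestFacts<'_>) -> Decision {
    // 1. cache: never beats everything else.
    if t.directive == Some(Directive::Never) {
        return Decision::NeverDirective;
    }
    // 2. The server's no-store beats the author's cache: always.
    if t.server_no_store {
        return Decision::ServerNoStore;
    }
    // 3. Hard exclusions. Compliance is exempt from the HTTP pin rule.
    if t.has_hooks {
        return Decision::HooksDeclared;
    }
    if t.transport == Transport::Http && !t.server_version_pinned && t.kind != Kind::Compliance {
        return Decision::HttpWithoutVersionPin;
    }
    if t.effects.contains(&Effect::External) {
        return Decision::ExternalEffect;
    }
    // 4. cache: always, only reachable once no exclusion fired.
    if t.directive == Some(Directive::Always) {
        return Decision::Cacheable;
    }
    // 5. Fall through to the test-kind default.
    match t.kind {
        Kind::Tools | Kind::Compliance | Kind::ModelCompatibility => Decision::Cacheable,
        Kind::Eval | Kind::Performance => Decision::KindDefaultOff,
    }
}
```

Placing the cache: always check after the exclusions is what makes it impossible for an author to force-cache a hook-driven or external-effect test: the function has already returned by the time the directive is consulted.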

Server cache directives

The 2026-07-28 spec lets a server attach two cache hints to tools/list, resources/list, resources/read, and prompts/list responses:

ttlMs (number): how long the client may cache the result, in milliseconds. Modeled on HTTP Cache-Control: max-age.
cacheScope (string): one of public, private, or no-store. Modeled on HTTP Cache-Control: public | private | no-store.

The current release wires both fields into the eligibility engine:

cacheScope: no-store is never cacheable, full stop. public and private are advisory today (they would affect shared-cache semantics if mcptest ran a shared cache, which it does not).
ttlMs lands on the cache entry as a unix-epoch deadline, expires_at_unix = now + ttlMs / 1000. A get() past the deadline returns a miss, and the store evicts both the metadata sidecar and the payload in the same pass. Entries without a server-declared TTL stay in the cache until the LRU policy evicts them, matching the v0 behavior.

Stores write a TTL via CacheStore::put_with_ttl(key, value, Some(deadline)). The trait's default put_with_ttl forwards to the bare put so backends that have not yet learned the expiry-on-read rule stay correct; the filesystem store overrides it to thread expires_at_unix onto the on-disk metadata sidecar. The pure rule lives in cache::store::is_expired(expires_at_unix, now_unix) so a future Redis or S3 backend shares one TTL contract.
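The TTL contract described above can be illustrated with a small in-memory stand-in. The names CacheStore, put_with_ttl, is_expired, and expires_at_unix come from this page; everything else is an assumption for the sake of the sketch: the argument types, the get signature taking now_unix explicitly, the deadline_from_ttl helper, the MemoryStore backend, and the choice to treat an entry as expired exactly at its deadline.

```rust
use std::collections::HashMap;

/// expires_at_unix = now + ttlMs / 1000 (integer seconds, as on this page).
/// Hypothetical helper; saturating_add only guards against overflow.
pub fn deadline_from_ttl(now_unix: u64, ttl_ms: u64) -> u64 {
    now_unix.saturating_add(ttl_ms / 1000)
}

/// Pure expiry rule shared by all backends. Treating `now == deadline`
/// as expired is an assumption of this sketch.
pub fn is_expired(expires_at_unix: Option<u64>, now_unix: u64) -> bool {
    match expires_at_unix {
        Some(deadline) => now_unix >= deadline,
        None => false, // no server TTL: only LRU eviction applies
    }
}

pub trait CacheStore {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&mut self, key: &str, now_unix: u64) -> Option<Vec<u8>>;

    /// Default: forward to the bare `put` and drop the deadline, so a
    /// backend that knows nothing about TTLs keeps working.
    fn put_with_ttl(&mut self, key: &str, value: Vec<u8>, _expires_at_unix: Option<u64>) {
        self.put(key, value);
    }
}

/// Toy backend standing in for the filesystem store (hypothetical).
#[derive(Default)]
pub struct MemoryStore {
    entries: HashMap<String, (Vec<u8>, Option<u64>)>,
}

impl CacheStore for MemoryStore {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), (value, None));
    }

    /// Override: keep the deadline next to the payload.
    fn put_with_ttl(&mut self, key: &str, value: Vec<u8>, expires_at_unix: Option<u64>) {
        self.entries.insert(key.to_string(), (value, expires_at_unix));
    }

    /// Expiry-on-read: a stale entry is evicted and reported as a miss.
    fn get(&mut self, key: &str, now_unix: u64) -> Option<Vec<u8>> {
        let deadline = self.entries.get(key)?.1;
        if is_expired(deadline, now_unix) {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(value, _)| value.clone())
    }
}
```

Keeping is_expired free of any storage details is the point of the design: a backend only has to persist one optional integer and call the shared rule on read.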

The directive is read off the response value by mcptest_core::cache::parse_server_directive, which looks for the two fields at the top level first, then falls back to the result wrapper. Unknown scopes fall back to "unset" rather than failing the parse, so a typo on the server side does not brick a run.
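The lookup order and the lenient scope handling might look roughly like the sketch below. It is not the real parse_server_directive: to stay dependency-free it uses a tiny hand-rolled Json enum instead of a JSON library, and the names Json, CacheScope, ServerDirective, and parse_directive_sketch are all hypothetical. It also assumes the fallback to the result wrapper happens only when neither field is present at the top level, which this page does not spell out.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (assumption; not a real library type).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Num(f64),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheScope { Public, Private, NoStore }

/// Both fields are optional; `None` means "unset".
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ServerDirective {
    pub ttl_ms: Option<u64>,
    pub scope: Option<CacheScope>,
}

/// Unknown strings map to `None` instead of an error, so a server-side
/// typo degrades to "no directive" rather than failing the run.
fn parse_scope(raw: &str) -> Option<CacheScope> {
    match raw {
        "public" => Some(CacheScope::Public),
        "private" => Some(CacheScope::Private),
        "no-store" => Some(CacheScope::NoStore),
        _ => None,
    }
}

/// Read `ttlMs` and `cacheScope` from one object, ignoring wrong types.
fn read_fields(obj: &BTreeMap<String, Json>) -> ServerDirective {
    let ttl_ms = match obj.get("ttlMs") {
        Some(Json::Num(n)) if *n >= 0.0 => Some(*n as u64),
        _ => None,
    };
    let scope = match obj.get("cacheScope") {
        Some(Json::Str(s)) => parse_scope(s),
        _ => None,
    };
    ServerDirective { ttl_ms, scope }
}

/// Top level first, then the `result` wrapper.
pub fn parse_directive_sketch(response: &Json) -> ServerDirective {
    let top = match response {
        Json::Obj(map) => map,
        _ => return ServerDirective::default(),
    };
    let found = read_fields(top);
    if found != ServerDirective::default() {
        return found;
    }
    match top.get("result") {
        Some(Json::Obj(inner)) => read_fields(inner),
        _ => ServerDirective::default(),
    }
}
```

A misspelled scope such as no_store therefore yields scope: None, and the eligibility engine proceeds as if the server had sent no scope at all.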

How `effects:` and hooks affect eligibility

effects: is the structured way authors tell mcptest what a test touches. The accepted values are external, local, and filesystem. Only external excludes a test from caching. The other variants exist so authors can tag tests honestly without giving up the determinism win. The rationale: filesystem and local state are inside the test runner's blast radius, so the runner can guarantee a clean state between runs. An external API call cannot be replayed safely from the runner's perspective.

hooks: is the YAML escape hatch for arbitrary author code (shell commands, custom Rust extensions, custom matchers). Hooks are non-deterministic by definition, so any test that declares hooks is excluded.

When to use `cache: always`

Rarely. The default for tools tests already enables caching, so cache: always is only useful when you want to force-cache a test the engine would otherwise default away from. The realistic case is debugging a flaky performance test: pin it to cache: always, run the rest of the suite quickly, and re-enable real measurement when you are ready.

cache: always does not silence the hard exclusions, so it is safe to leave on while you triage. If the test starts touching external state or grows hooks, the engine will refuse to cache it and the reporter will say why.

Reporter output

Every cache miss the engine reports carries the structured reason from EligibilityReason. The pretty reporter renders it as cache miss: <reason> (for example, cache miss: not cacheable (effect: external declared)); the JSON reporter emits the variant name plus the rendered string so downstream tooling can group misses.
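To make the two reporter shapes concrete, here is a hedged sketch of how a reason enum could render. Only the type name EligibilityReason and the external-effect message are taken from this page. The variant names, the wording of the other messages, the pretty_line and json_line helpers, and the JSON field names reason and message are assumptions chosen for illustration.

```rust
use std::fmt;

/// Variant names are illustrative; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EligibilityReason {
    NeverDirective,
    ServerNoStore,
    HooksDeclared,
    HttpWithoutVersionPin,
    ExternalEffect,
    KindDefault,
}

impl fmt::Display for EligibilityReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the ExternalEffect wording is quoted from the docs;
        // the other strings are placeholders.
        let detail = match self {
            EligibilityReason::NeverDirective => "cache: never directive",
            EligibilityReason::ServerNoStore => "server cacheScope: no-store",
            EligibilityReason::HooksDeclared => "hooks declared",
            EligibilityReason::HttpWithoutVersionPin => "HTTP transport without server_version pin",
            EligibilityReason::ExternalEffect => "effect: external declared",
            EligibilityReason::KindDefault => "test-kind default",
        };
        write!(f, "not cacheable ({detail})")
    }
}

/// Pretty reporter line: `cache miss: <reason>`.
pub fn pretty_line(reason: EligibilityReason) -> String {
    format!("cache miss: {reason}")
}

/// JSON reporter line: variant name plus rendered string. The messages in
/// this sketch contain no characters that need JSON escaping.
pub fn json_line(reason: EligibilityReason) -> String {
    format!(r#"{{"reason":"{:?}","message":"{}"}}"#, reason, reason)
}
```

Emitting the variant name separately from the rendered string lets downstream tooling group misses by a stable key even if the human-readable wording changes.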

This page documents the eligibility engine. The follow-up pieces, landing separately, are the cache storage layer, cache key derivation, and cache HTTP transport integration. Each builds on the eligibility contract described here.