Cache eligibility
mcptest can skip work that is provably deterministic by reading a prior result from a cache instead of re-running the test. Before the storage layer comes anywhere near a test, the engine asks one question: is this test even eligible to be cached? That question is the job of the eligibility rules engine in mcptest_core::cache::eligibility, and it is the only piece of the cache shipped today. The storage, cache key derivation, and HTTP integration land in follow-up work.
Run this example. examples/cache-eligibility.yml shows how cache:, effects:, and a server version pin decide what gets cached and what is always re-run.
mcptest run --config examples/cache-eligibility.yml
Why caching is per-test-type-defaulted
Each kind of test has a typical relationship with determinism:
- Tools tests are usually a fixed input mapped to a fixed output. Cacheable by default.
- Compliance tests are read-only protocol probes (initialize, tools/list, error shape). Cacheable by default.
- Eval tests route through an LLM with sampling. The same input produces different scores across runs. Never cacheable by default.
- Performance tests measure wall-clock latency. Caching a timing measurement would be a lie. Never cacheable by default.
- Model-compatibility tests mirror tools-style behavior across models, so they share the tools default.
The defaults exist so authors do not have to annotate every test. When the default is wrong for a particular test, the author overrides it with cache: always or cache: never.
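The per-kind defaults above can be sketched as a single lookup. This is a minimal illustration, not the engine's actual code: the TestKind enum, its variant names, and default_cacheable are hypothetical names chosen for this example.

```rust
/// Hypothetical mirror of the five test kinds described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestKind {
    Tools,
    Compliance,
    Eval,
    Performance,
    ModelCompat,
}

/// Returns the default cacheability for a test kind, before any
/// `cache:` directive or exclusion is taken into account.
pub fn default_cacheable(kind: TestKind) -> bool {
    match kind {
        // Fixed input mapped to fixed output, or read-only probes.
        TestKind::Tools | TestKind::Compliance => true,
        // Shares the tools default.
        TestKind::ModelCompat => true,
        // Sampling and wall-clock measurements are never stable.
        TestKind::Eval | TestKind::Performance => false,
    }
}
```

Keeping the match exhaustive means a newly added test kind fails to compile until someone decides its default.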
Precedence
The engine evaluates inputs in a fixed order. Higher in the list wins:
1. cache: never directive on the test.
2. Server cacheScope: no-store on the response envelope (SEP-2549). A server can opt out of caching regardless of the author's cache: always directive.
3. Hard exclusions:
   - hooks: block declared on the test.
   - HTTP transport without an explicit server_version: pin.
   - effects: list contains external.
4. cache: always directive on the test.
5. The test-kind default.
Two things to call out:
- cache: never is sovereign. It is the only directive that beats every hard exclusion. Authors use it when they want a single test pinned-uncacheable even after a future refactor changes the engine's view of the test.
- cache: always cannot override hard exclusions. Forcing the cache on a hook-driven test or a test with effects: [external] would let the cache silently return a stale answer, which is exactly the failure mode the exclusions exist to prevent.
Compliance has one carve-out: the HTTP-without-pin exclusion does not apply to it. Compliance probes ask the protocol whether it advertises a capability. The answer does not depend on which build of the server is at the other end, so the version pin is not load-bearing for compliance.
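The precedence order, including the compliance carve-out, can be sketched as a chain of early returns. Everything named here (TestFacts, Directive, Reason, decide) is hypothetical and exists only for illustration; the real engine reports an EligibilityReason whose variants are not listed on this page.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestKind { Tools, Compliance, Eval, Performance, ModelCompat }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Directive { Always, Never }

/// Hypothetical reason codes, one per rung of the precedence list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reason {
    DirectiveNever,
    ServerNoStore,
    HooksDeclared,
    HttpWithoutVersionPin,
    ExternalEffect,
    DirectiveAlways,
    KindDefault,
}

/// Hypothetical summary of everything the decision needs to know.
#[derive(Debug, Clone, Copy)]
pub struct TestFacts {
    pub kind: TestKind,
    pub directive: Option<Directive>,
    pub server_no_store: bool,
    pub has_hooks: bool,
    pub http_transport: bool,
    pub server_version_pinned: bool,
    pub external_effect: bool,
}

/// Walks the precedence list top to bottom; the first matching rule wins.
pub fn decide(t: &TestFacts) -> (bool, Reason) {
    // 1. `cache: never` beats everything, including hard exclusions.
    if t.directive == Some(Directive::Never) {
        return (false, Reason::DirectiveNever);
    }
    // 2. The server opted out with `cacheScope: no-store`.
    if t.server_no_store {
        return (false, Reason::ServerNoStore);
    }
    // 3. Hard exclusions, which `cache: always` cannot override.
    if t.has_hooks {
        return (false, Reason::HooksDeclared);
    }
    // Compliance carve-out: the missing version pin is ignored for it.
    if t.http_transport && !t.server_version_pinned && t.kind != TestKind::Compliance {
        return (false, Reason::HttpWithoutVersionPin);
    }
    if t.external_effect {
        return (false, Reason::ExternalEffect);
    }
    // 4. `cache: always` forces caching once the exclusions have passed.
    if t.directive == Some(Directive::Always) {
        return (true, Reason::DirectiveAlways);
    }
    // 5. Fall back to the test-kind default.
    let default = matches!(
        t.kind,
        TestKind::Tools | TestKind::Compliance | TestKind::ModelCompat
    );
    (default, Reason::KindDefault)
}
```

Returning the reason alongside the verdict keeps the decision and its explanation in one place, so a reporter never has to re-derive why a test was skipped.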
Server cache directives
The 2026-07-28 spec lets a server attach two cache hints to tools/list, resources/list, resources/read, and prompts/list responses:
- ttlMs (number): how long the client may cache the result, in milliseconds. Modeled on HTTP Cache-Control: max-age.
- cacheScope (string): one of public, private, or no-store. Modeled on HTTP Cache-Control: public | private | no-store.
The current release wires both fields into the eligibility engine:
- cacheScope: no-store is never cacheable, full stop. public and private are advisory today (they would affect shared-cache semantics if mcptest ran a shared cache, which it does not).
- ttlMs lands on the cache entry as a unix-epoch expires_at_unix = now + ttlMs / 1000. A get() past the deadline returns a miss and the store evicts the sidecar + payload in the same pass. Entries without a server-declared TTL stay in the cache until the LRU policy evicts them, matching the v0 behavior.
Stores write a TTL via CacheStore::put_with_ttl(key, value, Some(deadline)). The trait's default put_with_ttl forwards to the bare put so backends that have not yet learned the expiry-on-read rule stay correct; the filesystem store overrides it to thread expires_at_unix onto the on-disk metadata sidecar. The pure rule lives in cache::store::is_expired(expires_at_unix, now_unix) so a future Redis or S3 backend shares one TTL contract.
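A rough sketch of how these pieces could fit together follows. Only the names CacheStore, put, put_with_ttl, is_expired, and expires_at_unix come from the text above; the parameter types, the in-memory MemoryStore stand-in for the filesystem store, the deadline_from_ttl_ms helper, and the choice to treat the exact deadline second as already expired are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Applies `expires_at_unix = now + ttlMs / 1000` (integer division assumed).
pub fn deadline_from_ttl_ms(now_unix: u64, ttl_ms: u64) -> u64 {
    now_unix.saturating_add(ttl_ms / 1000)
}

/// Pure expiry rule. Entries without a deadline never expire here.
/// Assumption: reaching the deadline second counts as expired.
pub fn is_expired(expires_at_unix: Option<u64>, now_unix: u64) -> bool {
    matches!(expires_at_unix, Some(deadline) if now_unix >= deadline)
}

/// Assumed shape of the store trait; the real signatures may differ.
pub trait CacheStore {
    fn put(&mut self, key: &str, value: Vec<u8>);

    /// Default: ignore the deadline and forward to the bare `put`.
    fn put_with_ttl(&mut self, key: &str, value: Vec<u8>, _expires_at_unix: Option<u64>) {
        self.put(key, value);
    }
}

/// Hypothetical in-memory stand-in for the filesystem store.
#[derive(Default)]
pub struct MemoryStore {
    entries: HashMap<String, (Vec<u8>, Option<u64>)>,
}

impl CacheStore for MemoryStore {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), (value, None));
    }

    /// Override: keep the deadline next to the payload.
    fn put_with_ttl(&mut self, key: &str, value: Vec<u8>, expires_at_unix: Option<u64>) {
        self.entries.insert(key.to_string(), (value, expires_at_unix));
    }
}

impl MemoryStore {
    /// Returns a miss and evicts the entry in the same pass once it has expired.
    pub fn get(&mut self, key: &str, now_unix: u64) -> Option<Vec<u8>> {
        let expired = match self.entries.get(key) {
            Some((_, deadline)) => is_expired(*deadline, now_unix),
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(value, _)| value.clone())
    }
}
```

Passing now_unix in explicitly, rather than reading the clock inside is_expired, is what keeps the rule pure and lets any backend be tested against the same contract.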
The directive is read off the response value by mcptest_core::cache::parse_server_directive, which looks for the two fields at the top level first, then falls back to the result wrapper. Unknown scopes fall back to "unset" rather than failing the parse, so a typo on the server side does not brick a run.
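The lookup order can be illustrated with a dependency-free sketch. The Json enum is a tiny stand-in for a real JSON value type, and ServerDirective, CacheScope, read_fields, and obj are hypothetical names; the function below only borrows the name parse_server_directive and is not the crate's implementation. It also assumes an all-or-nothing fallback: the result wrapper is consulted only when the top level yields neither field.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Num(f64),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheScope { Public, Private, NoStore }

/// Hypothetical parsed form; `None` means "unset".
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ServerDirective {
    pub ttl_ms: Option<u64>,
    pub scope: Option<CacheScope>,
}

/// Convenience constructor for object values, used in examples.
pub fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn read_fields(map: &BTreeMap<String, Json>) -> ServerDirective {
    let ttl_ms = match map.get("ttlMs") {
        Some(Json::Num(n)) if *n >= 0.0 => Some(*n as u64),
        _ => None,
    };
    let scope = match map.get("cacheScope") {
        Some(Json::Str(s)) => match s.as_str() {
            "public" => Some(CacheScope::Public),
            "private" => Some(CacheScope::Private),
            "no-store" => Some(CacheScope::NoStore),
            // Unknown scope: treated as unset instead of failing the parse.
            _ => None,
        },
        _ => None,
    };
    ServerDirective { ttl_ms, scope }
}

/// Top level first, then the `result` wrapper (fallback rule is an assumption).
pub fn parse_server_directive(response: &Json) -> ServerDirective {
    let Json::Obj(top) = response else {
        return ServerDirective::default();
    };
    let direct = read_fields(top);
    if direct.ttl_ms.is_some() || direct.scope.is_some() {
        return direct;
    }
    match top.get("result") {
        Some(Json::Obj(inner)) => read_fields(inner),
        _ => ServerDirective::default(),
    }
}
```

Because the parser never returns an error, a malformed hint degrades to "no hint" and the rest of the precedence chain decides the outcome.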
How effects: and hooks affect eligibility
effects: is the structured way authors tell mcptest what a test touches. The accepted values are external, local, and filesystem. Only external excludes a test from caching. The other variants exist so authors can tag tests honestly without giving up the determinism win. The rationale: filesystem and local state are inside the test runner's blast radius, so the runner can guarantee a clean state between runs. An external API call cannot be replayed safely from the runner's perspective.
hooks: is the YAML escape hatch for arbitrary author code (shell commands, custom Rust extensions, custom matchers). Hooks are non-deterministic by definition, so any test that declares hooks is excluded.
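As a small illustration of the two rules in this section, the sketch below parses the three accepted effects: values and reports whether a test is excluded. The Effect enum, the blocks_caching helper, and the decision to reject unknown effect names with an error are assumptions for this example, not documented mcptest behavior.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the accepted `effects:` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effect {
    External,
    Local,
    Filesystem,
}

impl FromStr for Effect {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "external" => Ok(Effect::External),
            "local" => Ok(Effect::Local),
            "filesystem" => Ok(Effect::Filesystem),
            // Assumption: anything else is rejected rather than ignored.
            other => Err(format!("unknown effect: {other}")),
        }
    }
}

/// A test is excluded when it declares hooks or lists the external effect.
/// Local and filesystem effects do not block caching.
pub fn blocks_caching(effects: &[Effect], has_hooks: bool) -> bool {
    has_hooks || effects.contains(&Effect::External)
}
```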
When to use cache: always
Rarely. The default for tools tests already enables caching, so always is only useful when you want to force-cache a test the engine would otherwise default away from. The realistic case is debugging a flaky performance test: pin it to cache: always, run the rest of the suite quickly, and re-enable real measurement when you are ready.
cache: always does not silence the hard exclusions, so it is safe to leave on while you triage. If the test starts touching external state or grows hooks, the engine will refuse to cache it and the reporter will say why.
Reporter output
Every cache miss the engine reports surfaces the structured reason from EligibilityReason. The pretty reporter renders it as cache miss: <reason> (for example, cache miss: not cacheable (effect: external declared)); the JSON reporter emits the variant name plus the rendered string so downstream tooling can group misses.
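One way the two renderings could be produced from a single reason value is sketched below. The variant names, the JSON field names reason and message, and every message other than the external-effect one quoted above are hypothetical; only the type name EligibilityReason and the cache miss: <reason> shape come from this page.

```rust
use std::fmt;

/// Hypothetical subset of reason variants, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EligibilityReason {
    ExternalEffect,
    HooksDeclared,
    DirectiveNever,
}

impl fmt::Display for EligibilityReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = match self {
            // Wording taken from the example in the text above.
            Self::ExternalEffect => "not cacheable (effect: external declared)",
            // The two messages below are invented placeholders.
            Self::HooksDeclared => "not cacheable (hooks declared)",
            Self::DirectiveNever => "not cacheable (cache: never)",
        };
        f.write_str(text)
    }
}

/// Pretty reporter line: `cache miss: <reason>`.
pub fn pretty_line(reason: EligibilityReason) -> String {
    format!("cache miss: {reason}")
}

/// JSON reporter fragment: variant name plus rendered string.
/// Safe without escaping only because the messages above contain no quotes.
pub fn json_line(reason: EligibilityReason) -> String {
    format!("{{\"reason\":\"{:?}\",\"message\":\"{}\"}}", reason, reason)
}
```

Emitting the variant name separately from the message lets downstream tooling group misses by a stable key even if the human-readable wording changes.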
Related work
This page documents the eligibility engine. The follow-up pieces, landing separately, are the cache storage layer, cache key derivation, and cache HTTP transport integration. Each builds on the eligibility contract described here.