The behavior engine runs your spec against declared models and reports divergence. It's off by default — linting is fully static and free.
How it works
- predictBehavior takes your spec + a set of tasks + a set of ModelAdapters.
- For each (task, model, run) it calls the model with your spec as system prompt and the task's prompt as user input.
- Each response is classified as refused / clarified / acted / deviated using a heuristic classifier.
- Results aggregate into a BehaviorReport with per-task divergence, per-model variance, and cost totals.
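To make the classify-then-aggregate steps concrete, here is a minimal sketch. Only the four outcome labels come from the text above. The keyword rules, the expectedKeyword parameter, and the definition of divergence (the share of responses that differ from the most common outcome) are assumptions for illustration, not the engine's actual heuristics.

```typescript
// Hypothetical sketch: the real classifier and divergence metric may differ.
type Outcome = "refused" | "clarified" | "acted" | "deviated";

// Toy keyword heuristic (an assumption, not mdpact's actual rules).
function classify(response: string, expectedKeyword: string): Outcome {
  const text = response.toLowerCase().trim();
  if (text.includes("i can't") || text.includes("i cannot")) return "refused";
  if (text.endsWith("?")) return "clarified";
  if (text.includes(expectedKeyword.toLowerCase())) return "acted";
  return "deviated";
}

// Assumed divergence for one task: the share of responses that differ
// from the most common outcome (0 means every run agreed).
function divergence(outcomes: Outcome[]): number {
  if (outcomes.length === 0) return 0;
  const counts: Partial<Record<Outcome, number>> = {};
  let top = 0;
  for (const o of outcomes) {
    const n = (counts[o] ?? 0) + 1;
    counts[o] = n;
    if (n > top) top = n;
  }
  return 1 - top / outcomes.length;
}
```

With this definition, four runs that split two ways on "acted" and once each on "refused" and "clarified" give a divergence of 0.5.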
Cost guardrails
predictBehavior({
...,
budget: { usdMax: 1.00 },
});
The engine aborts mid-run when the next call would exceed the budget. BehaviorReport.budgetExceeded reports the outcome. mdpact test --budget-usd wires this into the CLI (default $1).
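The abort rule can be sketched as a pre-call check. This is an illustration under assumptions: PlannedCall, estimatedUsd, and runWithinBudget are hypothetical names, and how the engine actually estimates the cost of the next call is not described here.

```typescript
// Hypothetical sketch of a "stop before the call that would overspend" guard.
interface PlannedCall {
  id: string;
  estimatedUsd: number; // assumed to be known before the call is made
}

interface BudgetResult {
  executed: string[];
  spentUsd: number;
  budgetExceeded: boolean;
}

function runWithinBudget(calls: PlannedCall[], usdMax: number): BudgetResult {
  const executed: string[] = [];
  let spentUsd = 0;
  for (const call of calls) {
    // Check first, so the run never spends more than usdMax.
    if (spentUsd + call.estimatedUsd > usdMax) {
      return { executed, spentUsd, budgetExceeded: true };
    }
    executed.push(call.id); // a real run would call the model here
    spentUsd += call.estimatedUsd;
  }
  return { executed, spentUsd, budgetExceeded: false };
}
```

Checking before each call, rather than after, means the total stays at or under the limit even when the final call is expensive.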
Caching
Responses are cached by a hash of (spec, task, model, model-version, run-seed). Only the hash leaves the filesystem — spec content is never stored in the key. Cache lives at .mdpact/cache by default; mdpact test --no-cache skips it.
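As an illustration of keying a cache by a hash of those five fields, here is a minimal sketch. The choice of SHA-256, the JSON encoding, and the names CacheKeyParts and cacheKey are assumptions. The actual hash function and key layout are not specified above.

```typescript
import { createHash } from "crypto";

// Hypothetical field names mirroring (spec, task, model, model-version, run-seed).
interface CacheKeyParts {
  spec: string;
  task: string;
  model: string;
  modelVersion: string;
  runSeed: number;
}

// Assumed scheme: SHA-256 over a JSON array of the fields.
// The key is a fixed-length hex digest, so no spec text appears in it.
function cacheKey(parts: CacheKeyParts): string {
  const payload = JSON.stringify([
    parts.spec,
    parts.task,
    parts.model,
    parts.modelVersion,
    parts.runSeed,
  ]);
  return createHash("sha256").update(payload).digest("hex");
}
```

Encoding the fields as a JSON array keeps field boundaries unambiguous, so two different (spec, task) pairs cannot produce the same payload by concatenation.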
Adapters
- AnthropicAdapter: @anthropic-ai/sdk
- OpenAIAdapter: openai
- GoogleAdapter: @google/genai
Custom adapters just need to implement ModelAdapter (id, modelVersion, call, tokenize, costUsd).
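A custom adapter might look roughly like the sketch below. Only the five member names come from the sentence above. The parameter lists and return types are guesses, so the interface is declared locally here rather than imported, and EchoAdapter is a made-up offline adapter for testing.

```typescript
// Hypothetical shape: member names are from the docs, signatures are assumptions.
interface ModelAdapter {
  id: string;
  modelVersion: string;
  call(systemPrompt: string, userPrompt: string): Promise<string>;
  tokenize(text: string): number;
  costUsd(inputTokens: number, outputTokens: number): number;
}

// A made-up adapter that never touches the network.
class EchoAdapter implements ModelAdapter {
  readonly id = "echo";
  readonly modelVersion = "0";

  // Ignores the spec and echoes the task prompt back.
  async call(_systemPrompt: string, userPrompt: string): Promise<string> {
    return "echo: " + userPrompt;
  }

  // Crude whitespace token count, good enough for a stub.
  tokenize(text: string): number {
    const trimmed = text.trim();
    return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  }

  // Local echo costs nothing.
  costUsd(_inputTokens: number, _outputTokens: number): number {
    return 0;
  }
}
```

A stub like this can be handy for exercising the pipeline without spending budget, assuming the real interface accepts something of this shape.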
Privacy
mdpact test is the only command that sends anything over the network. Everything else — lint, score, fix, diff, explain — runs entirely against local files.