Behavior engine

The behavior engine runs your spec against declared models and reports divergence. It is off by default; linting is fully static and free.

How it works

predictBehavior takes your spec, a set of tasks, and a set of ModelAdapters.
For each (task, model, run) it calls the model with your spec as the system prompt and the task's prompt as the user input.
Each response is classified as refused / clarified / acted / deviated using a heuristic classifier.
Results aggregate into a BehaviorReport with per-task divergence, per-model variance, and cost totals.
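
The aggregation step can be sketched roughly as below. This is an illustration only: the RunResult shape and the divergenceByTask helper are hypothetical, and the definition of divergence used here (the share of runs whose label differs from the most common label for that task) is an assumption, since this document does not say how mdpact computes it.

```typescript
// Hypothetical types for illustration; mdpact's real report shape is not shown here.
type Label = "refused" | "clarified" | "acted" | "deviated";

interface RunResult {
  taskId: string;
  modelId: string;
  run: number;
  label: Label;
}

// Assumed definition: divergence for a task is the share of runs whose label
// differs from the most common label for that task (0 means full agreement).
function divergenceByTask(results: RunResult[]): Record<string, number> {
  // Tally how often each label occurs per task.
  const counts: Record<string, Record<string, number>> = {};
  for (const r of results) {
    if (!counts[r.taskId]) counts[r.taskId] = {};
    counts[r.taskId][r.label] = (counts[r.taskId][r.label] ?? 0) + 1;
  }

  const divergence: Record<string, number> = {};
  for (const taskId of Object.keys(counts)) {
    const tallies = Object.keys(counts[taskId]).map((label) => counts[taskId][label]);
    const total = tallies.reduce((a, b) => a + b, 0);
    const top = Math.max(...tallies);
    divergence[taskId] = 1 - top / total;
  }
  return divergence;
}
```

Under this assumed definition, a task where three of four runs acted and one refused would score 0.25, and a task where every run agrees would score 0.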

Cost guardrails

predictBehavior({
  // ...other options
  budget: { usdMax: 1.00 },
});

The engine aborts mid-run when the next call would exceed the budget. BehaviorReport.budgetExceeded reports the outcome. mdpact test --budget-usd wires this into the CLI (default $1).
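
The "abort before the call that would exceed the budget" rule can be sketched as follows. The runWithBudget helper and its return shape are hypothetical, and treating budgetExceeded as a boolean is an assumption; only the usdMax option and the budgetExceeded field name come from this document.

```typescript
// Hypothetical outcome shape; only the budgetExceeded name comes from the docs,
// and its boolean type is assumed.
interface BudgetOutcome {
  callsMade: number;
  usdSpent: number;
  budgetExceeded: boolean;
}

// Walks a list of per-call cost estimates and stops before the first call
// that would push total spend above usdMax.
function runWithBudget(estimatedCallCostsUsd: number[], usdMax: number): BudgetOutcome {
  let usdSpent = 0;
  let callsMade = 0;
  for (const estimate of estimatedCallCostsUsd) {
    if (usdSpent + estimate > usdMax) {
      // The check happens before the call, so spend never passes the cap.
      return { callsMade, usdSpent, budgetExceeded: true };
    }
    // A real engine would call the model here; this sketch only does the accounting.
    usdSpent += estimate;
    callsMade += 1;
  }
  return { callsMade, usdSpent, budgetExceeded: false };
}
```

Checking before each call, rather than after, is what keeps the total at or below the cap; the trade-off is that it relies on a cost estimate being available up front.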

Caching

Responses are cached by a hash of (spec, task, model, model-version, run-seed). The cache key is the hash alone; spec content is never stored in the key. The cache lives at .mdpact/cache by default; mdpact test --no-cache skips it.
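
A key of this kind could be derived along these lines. This is a sketch under assumptions: the cacheKey function and CacheKeyParts type are hypothetical, and SHA-256 with JSON serialization is one reasonable choice, not necessarily what mdpact uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape holding the five fields the docs list as key inputs.
interface CacheKeyParts {
  spec: string;
  task: string;
  model: string;
  modelVersion: string;
  runSeed: number;
}

// Assumed scheme: SHA-256 over a JSON array of the fields. Serializing as an
// array keeps field boundaries unambiguous, so ("ab", "c") and ("a", "bc")
// cannot collide by simple concatenation.
function cacheKey(parts: CacheKeyParts): string {
  const payload = JSON.stringify([
    parts.spec,
    parts.task,
    parts.model,
    parts.modelVersion,
    parts.runSeed,
  ]);
  // The hex digest is the key; the spec text itself does not appear in it.
  return createHash("sha256").update(payload).digest("hex");
}
```

Because the model version and run seed are part of the input, a new model version or a different seed yields a different key and therefore a cache miss.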

Adapters

AnthropicAdapter — @anthropic-ai/sdk
OpenAIAdapter — openai
GoogleAdapter — @google/genai

Custom adapters just need to implement ModelAdapter (id, modelVersion, call, tokenize, costUsd).
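
As a rough illustration, a custom adapter might look like the stub below. The five member names come from the list above, but every signature here (argument order, return types, async call) is an assumption, and EchoAdapter is a hypothetical local stub that never touches the network. The interface is redeclared locally so the sketch stands alone; in real use you would implement the ModelAdapter type that mdpact exports.

```typescript
// Assumed signatures; only the member names are taken from the docs.
interface ModelAdapter {
  id: string;
  modelVersion: string;
  call(systemPrompt: string, userInput: string): Promise<string>;
  tokenize(text: string): string[];
  costUsd(inputTokens: number, outputTokens: number): number;
}

// Hypothetical stub adapter: echoes the input instead of calling a model.
class EchoAdapter implements ModelAdapter {
  id = "echo";
  modelVersion = "0";

  async call(systemPrompt: string, userInput: string): Promise<string> {
    // A real adapter would send both prompts to a provider SDK here.
    return `[spec: ${systemPrompt.length} chars] ${userInput}`;
  }

  tokenize(text: string): string[] {
    // Crude whitespace split; real adapters would use the provider's tokenizer.
    return text.split(/\s+/).filter((t) => t.length > 0);
  }

  costUsd(inputTokens: number, outputTokens: number): number {
    // Local stub, so every call is free regardless of token counts.
    return 0 * (inputTokens + outputTokens);
  }
}
```

A stub like this can be handy for exercising task definitions without spending budget, assuming the engine accepts any object that satisfies the interface.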

Privacy

mdpact test is the only command that sends anything over the network. Everything else — lint, score, fix, diff, explain — runs entirely against local files.