agentic: 7 passed / 0 failed (7 cases)
Offline intent and guard checks for dynamic external workflows.
/home/gpt/pryaja2/packages/evals/golden/agentic.json
memory: 4 passed / 0 failed (4 cases)
Offline memory-write regression cases for durable vs episode-only behavior.
/home/gpt/pryaja2/packages/evals/golden/memory.json
research: 2 passed / 0 failed (2 cases)
Offline research-lane regression cases for query normalization and evidence artifact shape.
/home/gpt/pryaja2/packages/evals/golden/research.json
tool_policy: 3 passed / 0 failed (3 cases)
Offline policy-engine regression cases for canonical ToolPolicyDecision values.
/home/gpt/pryaja2/packages/evals/golden/tool_policy.json
zero_shot: 3 passed / 0 failed (3 cases)
Offline checks for task-first lightweight strategy on simple no-tool questions.
/home/gpt/pryaja2/packages/evals/golden/zero_shot.json