Articles tagged “evaluation”
2 articles

Knowledge & Memory·12 min read
Your Agent Completed the Task. It Also Forgot 87% of What It Knew.
Task completion hides a silent failure: agents forget 87% of stored knowledge under complexity. New research reveals why standard evals miss this entirely.

Operations·15 min read
74% of Production Agents Still Rely on Human Evaluation
A survey of 306 practitioners reveals most production agents are far simpler than expected. The eval gap isn't a tooling problem. It's a trust problem.