
evaluation

Browse 2 articles tagged with “evaluation”.

Articles tagged “evaluation”



Your Agent Completed the Task. It Also Forgot 87% of What It Knew.

Task completion hides a silent failure: agents forget 87% of stored knowledge under complexity. New research reveals why standard evals miss this entirely.


Operations · 15 min read

74% of Production Agents Still Rely on Human Evaluation

A survey of 306 practitioners reveals most production agents are far simpler than expected. The eval gap isn't a tooling problem. It's a trust problem.
