
Cutting AI agent hallucinations by 62% with human-in-the-loop review
The problem
The customer-facing AI agent was confidently wrong on 1 in 4 replies, eroding trust in the product's first week.
Role
Senior AI PM · led 4 engineers, 1 designer, 2 annotators
Approach
- Ran 18 user interviews to map where trust broke down
- Shipped a lightweight review queue before scaling the model
- Instrumented agent confidence + reviewer overrides as first-class metrics
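
One way the review queue and its two metrics could fit together is sketched below. This is a minimal illustration, not the project's actual implementation: the `Reply` and `ReviewMetrics` types, the `record` helper, and the 0.8 threshold are all hypothetical, and it assumes the agent exposes a confidence score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 0.8  # hypothetical cutoff; a real system would tune this on data


@dataclass
class Reply:
    text: str
    confidence: float  # assumed: agent's self-reported confidence in [0, 1]


@dataclass
class ReviewMetrics:
    sent_direct: int = 0  # replies sent without human review
    reviewed: int = 0     # replies routed to the review queue
    overridden: int = 0   # reviewed replies that a reviewer changed

    @property
    def override_rate(self) -> float:
        """Share of reviewed replies that the reviewer changed."""
        return self.overridden / self.reviewed if self.reviewed else 0.0


def needs_review(reply: Reply, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Low-confidence replies go to a human before reaching the customer."""
    return reply.confidence < threshold


def record(metrics: ReviewMetrics, reply: Reply,
           reviewer_text: Optional[str] = None) -> str:
    """Route one reply, update the metrics, and return the text to send."""
    if not needs_review(reply):
        metrics.sent_direct += 1
        return reply.text
    metrics.reviewed += 1
    if reviewer_text is not None and reviewer_text != reply.text:
        metrics.overridden += 1
        return reviewer_text
    return reply.text
```

Tracking the override rate next to the confidence score shows whether the threshold sits in the right place: a high override rate just below the cutoff suggests the agent's confidence is not well calibrated there.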
What I learned
Trust compounds slower than features. Slow the launch, ship the safety net first.


