

10 Things I Hate About AI Evals with Hamel Husain

Trey Causey

One of the highest-signal ~hours I've watched recently. Hamel Husain and his teaching partner Shreya Shankar are now synonymous with "AI evals"; here he lays out how he thinks people get them wrong and what he'd change.

Their course is excellent and about to start again.

Plus, a bonus appearance by Bryan Bischof and the suggestion that “AI evals” should just be renamed “data science for AI”.

Here are his peeves, though I highly encourage watching the whole thing:

  1. Generic Metrics & Off-the-Shelf Evals
  2. Completely Outsourcing Data Review & Leaving Out Domain Experts
  3. Overeager Eval Automation
  4. Not Looking at the Data At All
  5. Not Thinking Deeply About Prompts
  6. Dashboards Full of Noisy Metrics
  7. Getting Stuck with Annotation
  8. Endlessly Trying Tools Instead of Error Analysis
  9. Putting LLMs in the Judge’s Seat Without Human Oversight
  10. Engineers Not Using AI Enough Themselves (Lack of Intuition)
