The more interesting question, then, is not whether you do evals, but when you can afford to be less rigorous and when you cannot.
It’s OK to be less rigorous when your tasks are already heavily baked into the foundation model’s post-training, as coding is.
It’s also OK when you have enough domain expertise and dogfood early and often.
In the end, it’s always better to avoid Twitter and form your own conclusions about things instead of parroting the hot takes of the moment.