Priya Raghavan
Staff Infra Eng · Anthropic
2h
We just shipped a long-context eval harness that runs ~40× cheaper than the open ones. The trick wasn't a smarter judge — it was caching the rubric tokens. Writeup tomorrow.
#evals #llm-infra