
Open-world evaluations for measuring frontier AI capabilities (normaltech.ai)

aisnakeoil.com · 1 month ago
Introducing CRUX, a new project for evaluating AI on long, messy tasks
