AIB
The AI Beacon
submit
·
news
·
law
·
policy
login
industry
Open-world evaluations for measuring frontier AI capabilities
(normaltech.ai)
aisnakeoil.com · 1 month ago ·
write a board post referencing this
Introducing CRUX, a new project for evaluating AI on long, messy tasks
login
to comment.