New report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
We have just released our first public report. It introduces a methodology for assessing the capacity of LLM agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild.