Evaluations and Benchmarks for Physics Agents
This research theme develops new evaluation methods and benchmarks for physics agents. The goal is to use physics as a ground truth: because physical laws yield answers that can be checked independently, they provide a solid foundation for assessing the performance and capabilities of AI agents. New interpretability methods are also being explored.
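One way to read "physics as a ground truth" is to score an agent's numerical answers against a closed-form physical result. The following is a minimal sketch under stated assumptions, not a benchmark from this research theme: the choice of problem (projectile range on flat ground without air resistance), the function names, the 1% tolerance, and the example agent answers are all hypothetical and chosen only for illustration.

```python
import math

G = 9.81  # m/s^2, gravitational acceleration (assumed constant)


def projectile_range(speed, angle_deg, g=G):
    """Closed-form range on flat ground with no air resistance: v^2 * sin(2*theta) / g."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / g


def relative_error(predicted, truth):
    """Absolute difference between prediction and truth, relative to the truth."""
    return abs(predicted - truth) / abs(truth)


def score_agent(answers, tolerance=0.01):
    """Return the fraction of answers within `tolerance` relative error.

    `answers` is a list of (speed_m_per_s, angle_deg, agent_predicted_range_m).
    """
    passed = 0
    for speed, angle_deg, predicted in answers:
        truth = projectile_range(speed, angle_deg)
        if relative_error(predicted, truth) <= tolerance:
            passed += 1
    return passed / len(answers)


# Hypothetical agent outputs, made up for illustration only.
example_answers = [
    (10.0, 45.0, 10.19),
    (20.0, 30.0, 35.31),
    (15.0, 60.0, 25.00),
]

print(score_agent(example_answers))
```

The design point is that the reference answer comes from the physical law itself rather than from human labels, so the score does not depend on annotator judgment. A real benchmark would need problems without simple closed forms, which this sketch does not address.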