Robot Learning Benchmarks
Standardized evaluation for robot manipulation — RLBench, LIBERO, CALVIN, and more. Success rates, task completion, evaluation metrics.
Benchmarks for Manipulation
RLBench (Simulation)
100+ manipulation tasks built on PyRep (CoppeliaSim). Widely used for VLA evaluation; reported results include BridgeVLA at 88.2% and InternVLA at 95%+ on task subsets.
LIBERO (Simulation)
Lifelong-learning benchmark of 130 tasks organized into spatial, object, and goal suites, built on robosuite. Reported state of the art: 95.9% (InternVLA).
CALVIN (Simulation)
Composing Actions from Language and Vision: a long-horizon, language-conditioned manipulation benchmark. RoboFlamingo is a strong baseline.
Google Robot Benchmark (Real Robot)
Real-world manipulation spanning 700+ tasks on WidowX and other embodiments; evaluated by success rate, including multi-task settings.
COLOSSEUM (Real Robot)
Large-scale real-robot benchmark with diverse tasks and environments. Reported: BridgeVLA at 64%.
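The headline number all of these benchmarks report is episode success rate. A minimal sketch of the common rollout-and-score loop, using a hypothetical `env`/`policy` interface (these names and the `step` signature are placeholders, not any benchmark's real API):

```python
def evaluate(env, policy, num_episodes=50, max_steps=200):
    """Roll out `policy` in `env`; return the fraction of successful episodes."""
    successes = 0
    for _ in range(num_episodes):
        obs = env.reset()
        success = False
        for _ in range(max_steps):
            action = policy(obs)                   # e.g. a VLA conditioned on image + instruction
            obs, done, success = env.step(action)  # hypothetical step signature
            if done:
                break
        successes += int(success)
    return successes / num_episodes
```

Real harnesses differ mainly in what counts as `success` (a simulator predicate in RLBench or LIBERO, human judgment on real robots) and in how episodes are sampled across tasks.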
Suggested Models & Datasets
Comparable Metrics
Benchmarks are grouped for apples-to-apples performance checks.
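One concrete way to keep comparisons apples-to-apples is to macro-average per-task success rates within each suite before comparing across suites. A small illustrative helper (the demo numbers are made up, not real benchmark results):

```python
from statistics import mean

def aggregate(results):
    """Average per-task success rates within each suite, plus an overall mean.

    `results` maps suite name -> {task name: success rate in [0, 1]}.
    """
    per_suite = {suite: mean(tasks.values()) for suite, tasks in results.items()}
    overall = mean(per_suite.values())  # macro-average: each suite weighted equally
    return per_suite, overall

# Illustrative numbers only:
demo = {
    "spatial": {"task_a": 0.9, "task_b": 0.7},
    "object": {"task_c": 0.8, "task_d": 1.0},
}
per_suite, overall = aggregate(demo)  # per_suite ≈ {"spatial": 0.8, "object": 0.9}; overall ≈ 0.85
```

Macro-averaging keeps a 10-task suite from drowning out a 100-task suite; micro-averaging over all episodes is the other common choice, and papers do not always say which they use.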
Real vs Sim Coverage
Evaluate both controlled and deployment-oriented settings.
Model Mapping
Each benchmark path links to compatible model families.
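Using only the benchmark-model pairings mentioned in the cards above, such a mapping might be sketched as a simple lookup (illustrative, not an authoritative compatibility matrix):

```python
# Pairings taken only from the benchmark cards above; illustrative only.
COMPATIBLE_MODELS = {
    "RLBench": ["BridgeVLA", "InternVLA"],
    "LIBERO": ["InternVLA"],
    "CALVIN": ["RoboFlamingo"],
    "COLOSSEUM": ["BridgeVLA"],
}

def models_for(benchmark):
    """Return model families reported on `benchmark`, or [] if unknown."""
    return COMPATIBLE_MODELS.get(benchmark, [])
```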
Execution Support
Tooling support for data capture and evaluation runs when needed.