Open-Source Robotics Learning Datasets

A curated catalog of open-source datasets for robot manipulation, imitation learning, and reinforcement learning, with links to official sources.

Collection

Real-World Manipulation Data

Datasets with in-the-wild robot interactions and long-horizon tasks.

Collection

Benchmark-Centric Datasets

Suites designed for reproducible evaluation and cross-paper comparison.

Collection

Cross-Robot Ecosystems

Shared formats and multi-embodiment data for foundation model training.

Catalog

Datasets for Robot Learning

Each dataset has a dedicated page with description, scale, access links, and citations.

RSS 2024

DROID

76K trajectories, 350 hours, 86 tasks. In-the-wild manipulation from 50 collectors across 564 scenes. TensorFlow Datasets, Hugging Face.

2023

BridgeData V2

60K trajectories, 24 environments, 13 manipulation skills. Low-cost WidowX robot. Natural language labels, multi-task learning.

Google DeepMind

Open X-Embodiment

1M+ episodes, 22 robot types, 500+ skills. Unified RLDS format. Used to train the RT-X models. Data from 33 institutions.

Stanford / NVIDIA

ALOHA

Bimanual teleoperation. ALOHA-Cosmos-Policy, baseline datasets. HDF5, Hugging Face. Open hardware.

Benchmark

LIBERO

130 tasks, roughly 6.5K demos (50 per task). Lifelong learning benchmark. Spatial, object, and goal suites. Built on robosuite simulation.

Stanford / Berkeley

RoboNet

15M frames, 7 robot platforms. Multi-robot transfer. Platforms include Sawyer, Franka, Baxter, Fetch, and WidowX.

ARISE Initiative

RoboMimic & MimicGen

Framework + datasets. MimicGen: 50K demos from 200 human demos. Simulation + real. MIT license.

Hugging Face

LeRobot

Standardized format + hub. DROID-100, ALOHA, SO-100. PyTorch, streaming. "ImageNet of robotics."
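
The scale figures on the cards above can be turned into rough per-trajectory numbers for a quick comparison. The sketch below is only back-of-the-envelope arithmetic on the rounded values quoted in this catalog (DROID: 76K trajectories over 350 hours; MimicGen: 50K generated demos from 200 human demos). The helper names are hypothetical and not part of any dataset's tooling.

```python
def mean_trajectory_seconds(hours: float, trajectories: int) -> float:
    """Average trajectory duration in seconds, from total hours and count."""
    return hours * 3600.0 / trajectories


def amplification(generated: int, human: int) -> float:
    """How many generated demos exist per human-collected demo."""
    return generated / human


# Rounded figures as quoted on the DROID and MimicGen cards (assumed exact here).
droid_avg = mean_trajectory_seconds(hours=350, trajectories=76_000)
mimicgen_ratio = amplification(generated=50_000, human=200)

print(f"DROID: ~{droid_avg:.1f} s per trajectory")
print(f"MimicGen: ~{mimicgen_ratio:.0f}x demos per human demo")
```

Because the inputs are rounded card values, the results are order-of-magnitude guides rather than exact dataset statistics.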

Linked Resources

Models & Tools You Can Pair with These Datasets

Research-Ready Curation

We highlight scale, format, and access details needed for quick evaluation.

Cross-Stack Compatibility

Datasets are mapped to practical model and tool ecosystems.

Deployment Context

Dataset choices are linked with real robot execution constraints.

Scale-up Path

When open data is not enough, we support custom collection pipelines.

Need Custom Data?

We collect high-quality, learning-ready data for your specific tasks and hardware.