RoboFlamingo

Vision-Language Foundation Models as Effective Robot Imitators. OpenFlamingo-based.

Overview

RoboFlamingo builds on OpenFlamingo: the pre-trained vision-language model handles single-step vision-language understanding, and an explicit policy head models the sequence of steps for robot control. The model is fine-tuned via imitation learning and can be trained on a single GPU server.
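The split between per-step understanding and a history-aware policy head can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: `encode_step` is a hypothetical stand-in for the vision-language backbone, `RecurrentPolicyHead` is a toy recurrent layer rather than the project's actual head, the 7-dimensional action (6 pose values plus a gripper probability) is assumed, and `imitation_loss` (pose MSE plus weighted gripper cross-entropy, with an arbitrary `gripper_weight`) is one common behavior-cloning objective, not a confirmed detail of RoboFlamingo.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode_step(image, instruction):
    """Hypothetical stand-in for the vision-language backbone.

    Sees ONE frame plus the instruction and returns one feature vector.
    A real backbone would return a learned embedding; these are toy features.
    """
    mean_pixel = sum(image) / len(image)
    return [mean_pixel, len(instruction) / 10.0]


class RecurrentPolicyHead:
    """Toy recurrent head: the only component that carries history across steps."""

    def __init__(self, feat_dim=2, hidden_dim=4, action_dim=7, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = rand_matrix(hidden_dim, feat_dim)
        self.w_hid = rand_matrix(hidden_dim, hidden_dim)
        self.w_out = rand_matrix(action_dim, hidden_dim)
        self.hidden = [0.0] * hidden_dim

    def reset(self):
        """Clear the history at the start of a new episode."""
        self.hidden = [0.0] * len(self.hidden)

    def step(self, features):
        """Update the hidden state with the current features and emit one action."""
        pre = [a + b for a, b in zip(matvec(self.w_in, features),
                                     matvec(self.w_hid, self.hidden))]
        self.hidden = [math.tanh(v) for v in pre]
        out = matvec(self.w_out, self.hidden)
        gripper_prob = 1.0 / (1.0 + math.exp(-out[-1]))  # sigmoid on the last output
        return out[:-1] + [gripper_prob]


def imitation_loss(predicted, expert, gripper_weight=0.01):
    """Assumed behavior-cloning loss: pose MSE plus weighted gripper cross-entropy."""
    pose_pairs = list(zip(predicted[:-1], expert[:-1]))
    pose_mse = sum((p - e) ** 2 for p, e in pose_pairs) / len(pose_pairs)
    prob = min(max(predicted[-1], 1e-7), 1.0 - 1e-7)  # clamp to keep log finite
    target = expert[-1]
    gripper_bce = -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))
    return pose_mse + gripper_weight * gripper_bce


# Usage: the same frame seen twice yields different actions, because the head
# remembers the first step while the encoder does not.
head = RecurrentPolicyHead()
features = encode_step([0.2, 0.4, 0.6], "open the drawer")
first_action = head.step(features)
second_action = head.step(features)
```

In this sketch only the head is stateful, so the encoder can be swapped for a larger or smaller backbone without changing how history is handled.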

Architecture & Performance

  • OpenFlamingo backbone (MPT-3B, 4B, and 9B variants)
  • Policy head for sequential decision-making
  • Strong results on the CALVIN benchmark for language-conditioned manipulation
  • Supports open-loop control and deployment on low-resource hardware
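One way to read the open-loop point: the model is queried once, and a short chunk of predicted actions is executed without re-observing, so the expensive model runs less often. The sketch below illustrates that trade-off on a toy one-dimensional task. The names `plan_actions`, `apply_action`, `run_episode`, and `TARGET` are hypothetical and do not come from the RoboFlamingo codebase.

```python
TARGET = 10.0  # hypothetical goal position for the toy 1-D task


def plan_actions(observation, chunk_size):
    """Toy policy: plan `chunk_size` moves toward TARGET from a single observation."""
    position = observation
    plan = []
    for _ in range(chunk_size):
        move = max(-1.0, min(1.0, TARGET - position))  # at most one unit per step
        plan.append(move)
        position += move  # the policy's own prediction, not a new observation
    return plan


def apply_action(observation, action):
    """Toy environment: the observation is the position, the action is a displacement."""
    return observation + action


def run_episode(first_observation, total_steps, chunk_size):
    """Run the policy, re-observing only after each chunk has been executed."""
    observation = first_observation
    policy_calls = 0
    executed = 0
    while executed < total_steps:
        plan = plan_actions(observation, chunk_size)
        policy_calls += 1
        for action in plan[: total_steps - executed]:
            observation = apply_action(observation, action)
            executed += 1
    return observation, policy_calls


# chunk_size=1 is closed-loop (one policy call per step);
# chunk_size=4 is open-loop within each chunk (one call per four steps).
closed_loop_result = run_episode(0.0, total_steps=12, chunk_size=1)
open_loop_result = run_episode(0.0, total_steps=12, chunk_size=4)
```

In this deterministic toy both settings reach the same position, while the chunked run calls the policy a quarter as often. On a real robot, longer chunks trade responsiveness to disturbances for lower compute.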

Citation

See the project site for BibTeX and paper references.