ManiLadder: Benchmarking the Manipulation Intelligence Frontier via a Categorized and Multi-Level Task Ladder
We introduce ManiLadder, a large-scale simulation benchmark designed to quantitatively assess progress in robotic manipulation intelligence and capacity. It turns the question "How difficult a task can my algorithm solve?" into a measurable ladder to climb. ManiLadder consists of 114 simulation tasks spanning four difficulty levels, covering diverse object types (rigid, articulated, and deformable) and robot embodiments (single-arm, dual-arm, grippers, and dexterous hands). Each task is paired with 50 high-quality human demonstrations. To construct ManiLadder, we propose a Metric-Anchored Iterative Task Ladder DEsign (MILE) pipeline: tasks are tuned until their objective composite scores fall into predefined difficulty intervals, as measured by 2D- and 3D-based imitation learning policies. Our experiments show that commonly used imitation learning algorithms achieve performance corresponding roughly to Level 2, revealing a significant gap to higher-level manipulation competence and setting clear targets for future research. We further provide preliminary results on vision-language-action (VLA) models and transfer learning.
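To make the metric-anchored iterative loop of MILE concrete, the sketch below illustrates one plausible reading of the pipeline: a task's composite score (aggregated from 2D- and 3D-based imitation policies) is measured, and the task is adjusted until the score falls into the interval assigned to its target difficulty level. The interval values, the `difficulty_knob` parameter, and the `composite_score` stand-in are hypothetical illustrations, not the paper's actual implementation or thresholds.

```python
import random

# Hypothetical composite-score intervals for the four difficulty levels
# (illustrative values only; the actual thresholds may differ).
LEVEL_INTERVALS = {
    1: (0.75, 1.00),   # easy: reference policies score high
    2: (0.50, 0.75),
    3: (0.25, 0.50),
    4: (0.00, 0.25),   # hard: reference policies score low
}

def composite_score(task_config):
    """Stand-in for the objective metric: in practice this would train
    2D- and 3D-based imitation policies on the task's demonstrations and
    aggregate their evaluation results into a single score in [0, 1]."""
    raw = 1.0 - 0.1 * task_config["difficulty_knob"] + random.uniform(-0.05, 0.05)
    return max(0.0, min(1.0, raw))

def tune_task_to_level(task_config, target_level, max_iters=10):
    """Iteratively adjust the task until its composite score falls inside
    the interval assigned to the target difficulty level."""
    low, high = LEVEL_INTERVALS[target_level]
    score = composite_score(task_config)
    for _ in range(max_iters):
        if low <= score < high:
            return task_config, score          # ladder rung reached
        # Too easy -> make the task harder; too hard -> relax it.
        task_config["difficulty_knob"] += 1 if score >= high else -1
        score = composite_score(task_config)
    return task_config, score                  # best effort after max_iters

if __name__ == "__main__":
    task = {"name": "open_drawer", "difficulty_knob": 2}
    tuned, s = tune_task_to_level(task, target_level=2)
    print(f"{tuned['name']}: score={s:.2f}, knob={tuned['difficulty_knob']}")
```

In this reading, the design loop stops once the measured score lands in the predefined interval, which is what anchors each task to a specific rung of the ladder.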
In this work, we present ManiLadder, a simulation-based robot manipulation benchmark with a categorized, multi-level task ladder for benchmarking the manipulation capacity of current learning-based policies. ManiLadder encompasses 114 diverse manipulation tasks across different object types, robots, and difficulty levels. We use MILE to assign a difficulty level to each task based on objective metrics of policies trained on it. Our results show that current mainstream imitation learning algorithms generally remain at Level 2, leaving substantial room for improvement for future algorithms. We envision ManiLadder as a robust experimental platform that will support and catalyze future advances in robotic manipulation research.