Flower @Realman
Place object @Realman
Wipe the blackboard @Realman
Place objects @Realman
Place on the plate @Franka
Wipe the blackboard @Franka
Press button @Kinova Gen3
Open drawer meanwhile @Kinova Gen3
Pick place @Kinova Gen3
Wipe the blackboard @Franka
Put the blue block on top of the red block @Kinova Gen3
Wipe the white board @Franka
Press button @Franka
Open drawer @Franka
Wipe board @Kinova Gen3
Robotic manipulation faces critical challenges in understanding spatial affordances, the "where" and "how" of object interactions, which are essential for complex manipulation tasks such as wiping a board or stacking objects. Existing methods, both modular and end-to-end, often lack robust spatial reasoning capabilities. Unlike recent point-based and flow-based affordance methods that focus on dense spatial representations or trajectory modeling, we propose A0, a hierarchical affordance-aware diffusion model that decomposes manipulation tasks into high-level spatial affordance understanding and low-level action execution. A0 leverages the Embodiment-Agnostic Affordance Representation, which captures object-centric spatial affordances by predicting contact points and post-contact trajectories. A0 is pre-trained on 1 million contact points and fine-tuned on annotated trajectories, enabling generalization across platforms. Key components include Position Offset Attention (POA) for motion-aware feature extraction and a Spatial Information Aggregation Layer (SIAL) for precise coordinate mapping. The model's output is executed by the action execution module. Experiments on multiple robotic systems (Franka, Kinova, Realman, and Dobot) demonstrate A0's superior performance in complex tasks, showcasing its efficiency, flexibility, and real-world applicability.
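To make the affordance representation described above more concrete, here is a minimal sketch of how a contact point plus post-contact trajectory could be stored and mapped to image pixels. The class name `AffordanceRepresentation`, the helper `to_pixels`, and the choice of normalized 2D image coordinates are all illustrative assumptions, not the released A0 code or its actual data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point2D = Tuple[float, float]  # assumed: normalized (x, y) in [0, 1] image coordinates


@dataclass
class AffordanceRepresentation:
    """Hypothetical container: one contact point plus post-contact waypoints."""
    contact_point: Point2D
    post_contact_waypoints: List[Point2D]

    def as_trajectory(self) -> List[Point2D]:
        # The contact point comes first, followed by the motion after contact.
        return [self.contact_point, *self.post_contact_waypoints]


def to_pixels(trajectory: List[Point2D], width: int, height: int) -> List[Point2D]:
    """Map normalized coordinates to pixel coordinates of a width x height image."""
    return [(x * (width - 1), y * (height - 1)) for x, y in trajectory]


# Illustrative "wipe" motion: touch the board on the left, then sweep right.
wipe = AffordanceRepresentation(
    contact_point=(0.25, 0.5),
    post_contact_waypoints=[(0.5, 0.5), (0.75, 0.5)],
)
pixels = to_pixels(wipe.as_trajectory(), width=641, height=481)
print(pixels)  # [(160.0, 240.0), (320.0, 240.0), (480.0, 240.0)]
```

Because nothing in this structure refers to joints or grippers, the same waypoints could in principle be handed to a robot-specific action execution module, which is the sense in which such a representation is embodiment-agnostic.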
\( \mathrm{MAE}\downarrow \) | HOI4D-22k | Maniskill-5k | DROID-3k |
---|---|---|---|
\( A_0\text{-1B} \) | 47.5 | 5.5 | 17.5 |
\( A_0\text{-1B w/o POA} \) | 47.9 | 6.3 | 18.5 |
\( A_0\text{-1B w/o SIAL} \) | 61.1 | 10.2 | 19.6 |
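
The ablation numbers above can be read as relative degradations. The sketch below shows one common way to compute a mean absolute error over paired waypoints and then derives the percentage increase in MAE when a component is removed. The per-coordinate MAE definition and the function names are assumptions for illustration; the page does not state the exact formula or units behind the table.

```python
from typing import Dict, List, Tuple

Point2D = Tuple[float, float]


def mean_absolute_error(pred: List[Point2D], target: List[Point2D]) -> float:
    """Assumed definition: average absolute difference over all x and y coordinates."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and the same length")
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += abs(px - tx) + abs(py - ty)
    return total / (2 * len(pred))


def relative_increase(ablated: float, full: float) -> float:
    """Percentage by which the ablated MAE exceeds the full model's MAE."""
    return 100.0 * (ablated - full) / full


# MAE values copied from the ablation table above (lower is better).
mae: Dict[str, Dict[str, float]] = {
    "A0-1B": {"HOI4D-22k": 47.5, "Maniskill-5k": 5.5, "DROID-3k": 17.5},
    "A0-1B w/o POA": {"HOI4D-22k": 47.9, "Maniskill-5k": 6.3, "DROID-3k": 18.5},
    "A0-1B w/o SIAL": {"HOI4D-22k": 61.1, "Maniskill-5k": 10.2, "DROID-3k": 19.6},
}

degradation = {
    variant: {
        dataset: relative_increase(value, mae["A0-1B"][dataset])
        for dataset, value in scores.items()
    }
    for variant, scores in mae.items()
    if variant != "A0-1B"
}

for variant, per_dataset in degradation.items():
    print(variant, {d: round(v, 1) for d, v in per_dataset.items()})
```

Under this reading, removing SIAL raises MAE on every dataset by more than removing POA does.
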
Robot | Method | Place Object | Open Drawer | Press Button | Wipe Board | Avg. Success |
---|---|---|---|---|---|---|
Kinova | MOKA | 70 | 50 | 30 | 30 | 45.00 |
Kinova | ReKep | 75 | 55 | 5 | 0 | 33.75 |
Kinova | \( A_0\text{-1B} \) | 60 | 65 | 40 | 50 | 53.75 |
Franka | Magma | 25 | 10 | 30 | 0 | 16.25 |
Franka | Molmo | 60 | 40 | 55 | 20 | 43.75 |
Franka | \( A_0\text{-1B} \) | 60 | 75 | 70 | 45 | 62.50 |
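
The "Avg. Success" column is the unweighted mean of the four per-task results in each row. The short check below recomputes it from the table values and picks the best method per robot. The dictionary layout and variable names are illustrative only.

```python
from typing import Dict, List, Tuple

# Per-task results copied from the table above, in the order:
# Place Object, Open Drawer, Press Button, Wipe Board.
results: Dict[Tuple[str, str], List[int]] = {
    ("Kinova", "MOKA"): [70, 50, 30, 30],
    ("Kinova", "ReKep"): [75, 55, 5, 0],
    ("Kinova", "A0-1B"): [60, 65, 40, 50],
    ("Franka", "Magma"): [25, 10, 30, 0],
    ("Franka", "Molmo"): [60, 40, 55, 20],
    ("Franka", "A0-1B"): [60, 75, 70, 45],
}

# Unweighted mean over the four tasks for every (robot, method) pair.
averages = {key: sum(scores) / len(scores) for key, scores in results.items()}

# Best method per robot according to the recomputed average.
best: Dict[str, str] = {}
for (robot, method), avg in averages.items():
    if robot not in best or avg > averages[(robot, best[robot])]:
        best[robot] = method

for (robot, method), avg in averages.items():
    print(f"{robot:7s} {method:6s} {avg:.2f}")
print(best)  # {'Kinova': 'A0-1B', 'Franka': 'A0-1B'}
```

The recomputed means match the reported column. Note that the highest average does not imply the best result on every task: on Kinova, ReKep scores higher on Place Object (75 vs. 60).
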
Method | Wipe Board | Steps |
---|---|---|
RDT-1B [1] | 10 | 25–50 |
\( \pi_0 \) [2] | 35 | 25–50 |
\( \pi_0 \)+FAST [2] | 30 | 25–50 |
\( A_0\text{-1B} \) | 50 | 4–5 |
@misc{xu2025a0affordanceawarehierarchicalmodel,
title={A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation},
author={Rongtao Xu and Jian Zhang and Minghao Guo and Youpeng Wen and Haoting Yang and Min Lin and Jianzheng Huang and Zhe Li and Kaidong Zhang and Liqiong Wang and Yuxuan Kuang and Meng Cao and Feng Zheng and Xiaodan Liang},
year={2025},
eprint={2504.12636},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2504.12636},
}