A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

Robotic manipulation faces a critical challenge in understanding spatial affordances, the "where" and "how" of object interactions, which are essential for complex tasks such as wiping a board or stacking objects. Existing methods, both modular and end-to-end, often lack robust spatial reasoning capabilities. Unlike recent point-based and flow-based affordance methods, which focus on dense spatial representations or trajectory modeling, we propose A₀, a hierarchical affordance-aware diffusion model that decomposes manipulation tasks into high-level spatial affordance understanding and low-level action execution. A₀ leverages an Embodiment-Agnostic Affordance Representation, which captures object-centric spatial affordances by predicting contact points and post-contact trajectories. A₀ is pre-trained on 1 million contact-point samples and fine-tuned on annotated trajectories, enabling generalization across platforms. Key components include Position Offset Attention (POA) for motion-aware feature extraction and a Spatial Information Aggregation Layer (SIAL) for precise coordinate mapping. The model's output is then executed by the action execution module. Experiments on multiple robotic systems (Franka, Kinova, Realman, and Dobot) demonstrate A₀'s superior performance in complex tasks, showcasing its efficiency, flexibility, and real-world applicability.
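As a rough illustration of the affordance representation described above, the sketch below stores a contact point together with a post-contact trajectory. The `Affordance` class, its field names, and the choice of 2D `(x, y)` image coordinates are assumptions made for this example; the source does not specify a concrete data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: points are 2D (x, y) image coordinates; the source does not fix a format.
Point = Tuple[float, float]


@dataclass
class Affordance:
    """Hypothetical container for one object-centric affordance prediction."""

    contact_point: Point        # "where": the location at which to make contact
    post_contact: List[Point]   # "how": waypoints to follow after contact

    def waypoints(self) -> List[Point]:
        """Return the contact point followed by the post-contact trajectory."""
        return [self.contact_point, *self.post_contact]


# Illustrative values only, not data from the paper.
wipe = Affordance(
    contact_point=(120.0, 80.0),
    post_contact=[(160.0, 80.0), (200.0, 82.0)],
)
print(wipe.waypoints())
```

Because such a structure refers only to the object and the scene, not to any robot's joints or gripper, it is one plausible way to read "embodiment-agnostic": a separate execution module would translate the waypoints into platform-specific actions.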

\( \mathrm{MAE}\downarrow \)	HOI4D-22k	Maniskill-5k	DROID-3k
\( A_0\text{-1B} \)	47.5	5.5	17.5
\( A_0\text{-1B w/o POA} \)	47.9	6.3	18.5
\( A_0\text{-1B w/o SIAL} \)	61.1	10.2	19.6
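To read the ablation in relative terms, the sketch below recomputes how much the MAE rises when each component is removed, using only the values in the table. The helper names `mae` and `relative_increase` are illustrative, and the `mae` definition (mean absolute coordinate error over paired 2D waypoints) is an assumption about the metric, not a definition taken from the source.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def mae(pred: List[Point], target: List[Point]) -> float:
    """Mean absolute error over all coordinates of paired waypoints (assumed definition)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = sum(abs(px - tx) + abs(py - ty) for (px, py), (tx, ty) in zip(pred, target))
    return total / (2 * len(pred))


def relative_increase(baseline: float, ablated: float) -> float:
    """Fractional increase of the ablated error over the baseline error."""
    return (ablated - baseline) / baseline


# MAE values copied from the table above (lower is better).
full: Dict[str, float] = {"HOI4D-22k": 47.5, "Maniskill-5k": 5.5, "DROID-3k": 17.5}
no_poa: Dict[str, float] = {"HOI4D-22k": 47.9, "Maniskill-5k": 6.3, "DROID-3k": 18.5}
no_sial: Dict[str, float] = {"HOI4D-22k": 61.1, "Maniskill-5k": 10.2, "DROID-3k": 19.6}

for name in full:
    poa_pct = 100 * relative_increase(full[name], no_poa[name])
    sial_pct = 100 * relative_increase(full[name], no_sial[name])
    print(f"{name}: w/o POA +{poa_pct:.1f}%, w/o SIAL +{sial_pct:.1f}%")
```

On these numbers, removing SIAL raises the MAE by about 28.6%, 85.5%, and 12.0% on the three datasets, while removing POA raises it by about 0.8%, 14.5%, and 5.7%, so SIAL accounts for the larger share of the error reduction in every column.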

Robot	Method	Place Object	Open Drawer	Press Button	Wipe Board	Avg. Success
Kinova	MOKA	70	50	30	30	45.00
Kinova	ReKep	75	55	5	0	33.75
Kinova	\( A_0\text{-1B} \)	60	65	40	50	53.75
Franka	Magma	25	10	30	0	16.25
Franka	Molmo	60	40	55	20	43.75
Franka	\( A_0\text{-1B} \)	60	75	70	45	62.50
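The Avg. Success column matches the plain mean of the four per-task values, which the following sketch recomputes from the table rows. The dictionary layout and the `margin_over_best_baseline` helper are illustrative choices for this example, not part of the original evaluation code.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Per-task values copied from the table, in the order:
# Place Object, Open Drawer, Press Button, Wipe Board.
results: Dict[Tuple[str, str], List[int]] = {
    ("Kinova", "MOKA"): [70, 50, 30, 30],
    ("Kinova", "ReKep"): [75, 55, 5, 0],
    ("Kinova", "A0-1B"): [60, 65, 40, 50],
    ("Franka", "Magma"): [25, 10, 30, 0],
    ("Franka", "Molmo"): [60, 40, 55, 20],
    ("Franka", "A0-1B"): [60, 75, 70, 45],
}

# Average success per (robot, method) pair.
averages = {key: mean(values) for key, values in results.items()}


def margin_over_best_baseline(robot: str, ours: str = "A0-1B") -> float:
    """Gap between our method's average and the best other method on one robot."""
    best_baseline = max(
        avg for (r, method), avg in averages.items() if r == robot and method != ours
    )
    return averages[(robot, ours)] - best_baseline


for (robot, method), avg in averages.items():
    print(f"{robot:7s} {method:6s} {avg:.2f}")
for robot in ("Kinova", "Franka"):
    print(f"{robot}: margin over best baseline = {margin_over_best_baseline(robot):.2f}")
```

Computed this way, the margin of A0-1B over the strongest baseline is 8.75 points on Kinova and 18.75 points on Franka. Note that the per-task picture is less uniform: on Kinova, both baselines score higher on Place Object.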

Method	Wipe Board	Steps
RDT-1B [1]	10	25–50
\( \pi_0 \) [2]	35	25–50
\( \pi_0 \)+FAST [2]	30	25–50
\( A_0\text{-1B} \)	50	4–5

BibTeX

@misc{xu2025a0affordanceawarehierarchicalmodel,
        title={A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation}, 
        author={Rongtao Xu and Jian Zhang and Minghao Guo and Youpeng Wen and Haoting Yang and Min Lin and Jianzheng Huang and Zhe Li and Kaidong Zhang and Liqiong Wang and Yuxuan Kuang and Meng Cao and Feng Zheng and Xiaodan Liang},
        year={2025},
        eprint={2504.12636},
        archivePrefix={arXiv},
        primaryClass={cs.RO},
        url={https://arxiv.org/abs/2504.12636}, 
  }

Acknowledgements

We are grateful to the Track Anything project (https://github.com/gaomingqi/Track-Anything), which significantly facilitated our automated data annotation process and improved its efficiency.

Real World Demos

Abstract

Comparison of different manipulation methods.

Overview of A₀ model.

Qualitative Results.

Pretraining significantly lowers T-waypoint MAE and improves generalization, underscoring its value for robust manipulation.
