EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration

The first framework enabling humanoid robots to learn whole-body loco-manipulation from egocentric human demonstrations, achieving a 51% improvement in generalization

Modi Shi2,3*   Shijia Peng4*   Jin Chen2*   Haoran Jiang2   Yinghui Li1,4
Di Huang3   Ping Luo1   Hongyang Li1♮   Li Chen1♮
1 The University of Hong Kong   2 Shanghai Innovation Institute   3 Beihang University   4 Kinetix AI
*Equal Contribution   ♮Project Lead

Research Summary

Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. We present EGOHUMANOID, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data.

To bridge the embodiment gap, we introduce a systematic alignment pipeline with two key components: view alignment reduces visual discrepancies between human and robot observations, and action alignment maps human motions into a unified action space for humanoid control.

Extensive real-world experiments demonstrate that policies co-trained with robot-free egocentric data outperform robot-only baselines by 51%, particularly in unseen environments.

First Human-to-Humanoid Transfer

The first demonstration of human-to-humanoid transfer for whole-body loco-manipulation tasks.

Principled Alignment Pipeline

View and action alignment to bridge embodiment gaps.

Comprehensive Evaluation

51% improvement in generalization performance.

Real World Loco-Manipulation Tasks

Spanning varying levels of difficulty for large-space movement and dexterous manipulation

Robot Data Collection

Teleoperated demonstrations in laboratory environments

Pillow Placement
Trash Disposal
Toy Transfer
Cart Stowing

Human Data Collection

Egocentric demonstrations in diverse real-world environments

Pillow Placement
Trash Disposal
Toy Transfer
Cart Stowing

Loco-Manipulation Deployment

Real-world deployment comparing different training configurations

Performance Comparison

Co-training consistently improves performance across both in-domain and generalization settings

In-Domain Performance (laboratory environments): Robot-only vs. Co-training
Generalization Performance (diverse real-world scenes, +51% improvement): Robot-only vs. Co-training

Subtask Performance Breakdown

Navigation transfers effectively; manipulation transfer depends on precision requirements

Data Config   | Pillow Placement | Trash Disposal | Toy Transfer        | Cart Stowing
              | s1    s2         | s1    s2       | s1   s2   s3   s4   | s1   s2   s3   s4
Robot-only    | 0     0          | 65    45       | 100  50   0    0    | 100  15   5    5
Human-only    | 100   95         | 100   80       | 100  100  45   35   | 100  5    0    0
Co-training   | 100   95         | 100   75       | 100  100  60   55   | 100  60   50   50

Legend: Locomotion / Manipulation subtasks

Understanding Failure Modes

Systematic breakdown of failure cases across different training configurations

Failure cases are broken down for the Robot-only, Human-only, and Co-training policies.

Human Data Scaling Behavior

Performance consistently improves as human demonstrations accumulate

Results are reported at robot:human data sampling ratios of 1:2, 1:1, and 2:1, both averaged across tasks and per task (Pillow Placement, Trash Disposal, Toy Transfer, Cart Stowing).
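
To make the sampling-ratio setting concrete, below is a minimal sketch (not the authors' implementation) of how a fixed robot:human ratio might be enforced when assembling co-training batches; the function name, arguments, and batch size are illustrative.

import random

def make_cotraining_batch(robot_demos, human_demos, batch_size=32, robot_to_human=(1, 2)):
    """Assemble one training batch at a fixed robot:human sampling ratio.

    robot_demos / human_demos: indexable pools of demonstration samples.
    robot_to_human: e.g. (1, 2) draws one robot sample per two human samples.
    """
    r, h = robot_to_human
    n_robot = round(batch_size * r / (r + h))
    n_human = batch_size - n_robot
    batch = (
        random.choices(robot_demos, k=n_robot)    # small robot pool, sampled with replacement
        + random.choices(human_demos, k=n_human)  # abundant human pool
    )
    random.shuffle(batch)
    return batch

# Illustrative usage: a 1:2 robot:human batch of 32 samples.
# batch = make_cotraining_batch(robot_pool, human_pool, 32, (1, 2))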

Human-to-Humanoid Alignment Pipeline

A systematic approach to bridge the embodiment gap between humans and robots

Human demonstrations: egocentric data collected in diverse real-world environments.

Robot demonstrations: teleoperated data collected in laboratory environments.

(a) View Alignment

Depth estimation → inpainting → point cloud transform → point cloud reprojection
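
As a rough illustration of the point cloud transform and reprojection steps, the sketch below back-projects an egocentric frame with estimated depth into a point cloud, applies a human-to-robot camera transform, and reprojects it into the robot view. This is a generic pinhole-camera sketch under assumed intrinsics and extrinsics (K_human, K_robot, T_human_to_robot are placeholders), not the paper's implementation; where the inpainting step fits is not reproduced here.

import numpy as np

def reproject_to_robot_view(rgb, depth, K_human, K_robot, T_human_to_robot, out_hw):
    """Warp an egocentric frame into an (assumed) robot head-camera viewpoint.

    rgb:   (H, W, 3) egocentric image.
    depth: (H, W) metric depth from a monocular depth estimator.
    K_human, K_robot: 3x3 pinhole intrinsics of the two cameras.
    T_human_to_robot: 4x4 rigid transform between the camera frames.
    out_hw: (height, width) of the rendered robot-view image.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))

    # 1) Back-project every pixel to a 3D point in the human camera frame.
    x = (u - K_human[0, 2]) * depth / K_human[0, 0]
    y = (v - K_human[1, 2]) * depth / K_human[1, 1]
    pts = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)

    # 2) Rigidly transform the point cloud into the robot camera frame.
    pts_robot = (T_human_to_robot @ pts.T).T[:, :3]

    # 3) Reproject with the robot intrinsics and splat colors into the new view.
    z = pts_robot[:, 2]
    valid = z > 1e-3
    uv = (K_robot @ (pts_robot[valid] / z[valid, None]).T).T[:, :2]
    uv = np.round(uv).astype(int)
    colors = rgb.reshape(-1, 3)[valid]

    out = np.zeros((out_hw[0], out_hw[1], 3), dtype=rgb.dtype)
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < out_hw[1]) & (uv[:, 1] >= 0) & (uv[:, 1] < out_hw[0])
    out[uv[ok, 1], uv[ok, 0]] = colors[ok]
    return out  # disoccluded regions remain empty in this simple splatting sketch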

(b) Action Alignment

Unified action space:
Upper body: 6-DoF delta end-effector pose (X, Y, Z, Rx, Ry, Rz)
Lower body: discrete velocity commands (forward, backward, left, right, turn left, turn right)
Dexterous hand: binary open / close
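
For concreteness, here is one way such a unified action record could be represented in code. This is a sketch with illustrative class and field names, not the authors' data format.

from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class BaseCommand(Enum):
    """Discrete lower-body velocity commands (mapped from the six directions above)."""
    FORWARD = 0
    BACKWARD = 1
    LEFT = 2
    RIGHT = 3
    TURN_LEFT = 4
    TURN_RIGHT = 5

@dataclass
class UnifiedAction:
    """One control step in a unified action space shared by human and robot data."""
    ee_delta: Tuple[float, float, float, float, float, float]  # 6-DoF delta end-effector pose (X, Y, Z, Rx, Ry, Rz)
    base_command: BaseCommand                                   # lower-body locomotion command
    hand_closed: bool                                           # dexterous hand: False = open, True = close

# Illustrative usage:
# act = UnifiedAction(ee_delta=(0.01, 0.0, -0.02, 0.0, 0.0, 0.05),
#                     base_command=BaseCommand.FORWARD,
#                     hand_closed=True)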
BibTeX Citation
@article{shi2026egohumanoid,
    title={EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration},
    author={Shi, Modi and Peng, Shijia and Chen, Jin and Jiang, Haoran and Li, Yinghui and Huang, Di and Luo, Ping and Li, Hongyang and Chen, Li},
    journal={arXiv preprint arXiv:2602.10106},
    year={2026}
}