Vision-Based Humanoid Robot Teleoperation
CMU Build18 2025
Carnegie Mellon University
ShadowControl lets you control a humanoid robot by moving your body in front of a camera. Inspired by the puppeteer concept from Real Steel, the system tracks your pose and maps it to robot joint angles in real time.
We built a real-time teleoperation system that lets an operator control a humanoid robot through body movements. The system uses MediaPipe to estimate the operator's pose from a webcam, calculates joint angles using plane projection methods, and sends commands to the robot's servos. The main challenges were handling gimbal lock in the angle calculations and achieving smooth motion despite sensor noise. We built everything from scratch in one week at CMU's Build18 hackathon, 3D printing our own parts and assembling all the hardware ourselves for around $610.
The system has three main stages: pose estimation, joint angle retargeting, and motion control.
We use MediaPipe's BlazePose to extract 33 body landmarks from each video frame. We run the "heavy" model rather than "lite" because it gives more stable 3D positions, which matters more than raw speed for teleoperation. We set high confidence thresholds (0.9) to filter out unreliable detections.
The main challenge is converting 3D pose coordinates into robot joint angles. We compute angles by projecting limb vectors onto anatomically meaningful planes. For example, shoulder abduction (raising arm sideways) is computed by projecting the upper arm onto the horizontal plane and measuring the angle from vertical.
A key problem is gimbal lock: when the arm points along a projection axis, the computed angle becomes unstable. We detect this by checking if the projection magnitude falls below 30% of the full vector length. When that happens, we return NaN and hold the previous angle instead of sending jittery commands.
```python
import numpy as np

def compute_shoulder_abduction(shoulder, elbow):
    """Angle of the upper arm in the horizontal plane, in radians."""
    arm_vec = elbow - shoulder
    arm_vec = arm_vec / np.linalg.norm(arm_vec)
    # Project onto the XY plane (horizontal) by dropping the Z component
    proj_xy = np.array([arm_vec[0], arm_vec[1], 0.0])
    # Gimbal lock check: projection too short to give a stable angle
    if np.linalg.norm(proj_xy) < 0.3 * np.linalg.norm(arm_vec):
        return float("nan")
    return np.arctan2(proj_xy[0], proj_xy[1])
```
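The hold-the-previous-angle behavior for gimbal-locked frames can be sketched as a small filter. The class and names here are illustrative, not from the project code:

```python
import math

class HoldLastValid:
    """Pass angles through, but repeat the last valid angle when input is NaN."""
    def __init__(self, initial=0.0):
        self.last = initial

    def update(self, angle):
        if not math.isnan(angle):
            self.last = angle
        return self.last

# A NaN from a gimbal-locked frame holds the previous command
h = HoldLastValid()
out = [h.update(a) for a in [0.5, 0.6, float("nan"), 0.7]]
# out == [0.5, 0.6, 0.6, 0.7]
```

This keeps the command stream continuous, so the servos never receive a jittery or undefined target.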
Raw retargeted angles are noisy, so we apply three techniques:
On startup, we read the robot's current positions and use them as the baseline, so the robot does not jump to some default position when teleoperation begins.
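One common choice for smoothing noisy angle streams is an exponential moving average. This sketch is illustrative; the project's actual filter is not specified here:

```python
class ExpSmoother:
    """Exponential moving average: alpha near 1 tracks fast, near 0 smooths hard."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.state = None

    def update(self, x):
        if self.state is None:
            self.state = x  # seed with the first sample (or the robot's baseline pose)
        else:
            self.state = self.alpha * x + (1 - self.alpha) * self.state
        return self.state

s = ExpSmoother(alpha=0.5)
vals = [s.update(x) for x in [0.0, 1.0, 1.0]]
# vals == [0.0, 0.5, 0.75]
```

Seeding the filter with the first sample mirrors the baseline idea above: the output starts where the robot already is, rather than snapping to zero.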
Demo at Build18 expo.
End-to-end latency from human movement to robot movement is around 130ms. The breakdown:
| Stage | Time | Notes |
|---|---|---|
| Camera capture | 33ms | 30 FPS webcam |
| Pose estimation | ~30ms | MediaPipe heavy model |
| Retargeting | ~12ms | Angle calculation and smoothing |
| Serial communication | ~5ms | Sync write at 500kbps |
| Servo response | ~50ms | Mechanical movement (STS3215 at 0.22s/60 deg) |
| Total | ~130ms | |

Hardware:

| Component | Details |
|---|---|
| Robot platform | K-Scale Zeroth-01 |
| Servos | 16x Feetech STS3215 (19.5 kg-cm, 12-bit encoder) |
| Compute | Laptop for pose estimation, Milk-V Duo for servo control |
| Camera | 1080p USB webcam at 30 FPS |
| Communication | Serial bus at 500kbps via Waveshare adapter |

Cost breakdown:

| Item | Cost |
|---|---|
| Servos (16x) | $400 |
| Milk-V Duo | $9 |
| Webcam | $30 |
| 3D printed parts | $50 |
| Power supply | $25 |
| Wiring and misc | $96 |
| Total | ~$610 |
Some photos from the build week:
This project was built at CMU Build18 2025. Thanks to K-Scale Labs for the open-source robot design files, and to the Build18 organizers for the workspace and support.