ShadowControl

Vision-Based Humanoid Robot Teleoperation

CMU Build18 2025

Rohan Nagabhirava, Abhishek Hemlani, Ganesh Selvakumar, Madhavan Iyengar, Revanth Senthilkumaran

Carnegie Mellon University

ShadowControl lets you control a humanoid robot by moving your body in front of a camera. Inspired by the puppeteer concept from Real Steel, the system tracks your pose and maps it to robot joint angles in real time.

~130ms Latency
22 FPS Control Rate
1 Week Build Time

Abstract

We built a real-time teleoperation system that lets an operator control a humanoid robot through body movements. The system uses MediaPipe to estimate the operator's pose from a webcam, calculates joint angles using plane projection methods, and sends commands to the robot's servos. The main challenges were handling gimbal lock in the angle calculations and achieving smooth motion despite sensor noise. We built everything from scratch in one week at CMU's Build18 hackathon, 3D printing our own parts and assembling all the hardware ourselves for around $610.

Method

The system has three main stages: pose estimation, joint angle retargeting, and motion control.

Camera (30 FPS) → Pose Est. (~30ms) → Retarget (~12ms) → Comm (~5ms) → Servo (~50ms)

Pose Estimation

We use MediaPipe's BlazePose to extract 33 body landmarks from each video frame. We run the "heavy" model rather than "lite" because it gives more stable 3D positions, which matters more than raw speed for teleoperation. We set high confidence thresholds (0.9) to filter out unreliable detections.
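As a sketch of the confidence filtering step, assume each landmark carries a per-landmark visibility score in [0, 1], as BlazePose reports; the `Landmark` container and `reliable` helper below are illustrative, not our exact code:

```python
from dataclasses import dataclass

# Illustrative landmark container; MediaPipe exposes similar fields
# (x, y, z in normalized coordinates plus a visibility score).
@dataclass
class Landmark:
    x: float
    y: float
    z: float
    visibility: float

def reliable(landmark: Landmark, threshold: float = 0.9) -> bool:
    """Keep only landmarks the model is confident about."""
    return landmark.visibility >= threshold

# Example: a confidently tracked wrist vs. an occluded elbow
wrist = Landmark(0.42, 0.55, -0.10, 0.97)
elbow = Landmark(0.40, 0.50, -0.05, 0.35)
print(reliable(wrist), reliable(elbow))  # True False
```

Dropping low-visibility landmarks up front means the retargeting stage never computes angles from limbs the model could not actually see.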

Joint Angle Retargeting

The main challenge is converting 3D pose coordinates into robot joint angles. We compute angles by projecting limb vectors onto anatomically meaningful planes. For example, shoulder abduction (raising arm sideways) is computed by projecting the upper arm onto the horizontal plane and measuring the angle from vertical.

A key problem is gimbal lock: when the arm points along a projection axis, the computed angle becomes unstable. We detect this by checking if the projection magnitude falls below 30% of the full vector length. When that happens, we return NaN and hold the previous angle instead of sending jittery commands.

import math
import numpy as np

def compute_shoulder_abduction(shoulder, elbow):
    # Unit vector along the upper arm (landmarks as 3D numpy arrays)
    arm_vec = (elbow - shoulder) / np.linalg.norm(elbow - shoulder)

    # Project onto XY plane (horizontal)
    proj_xy = np.array([arm_vec[0], arm_vec[1], 0.0])

    # Gimbal lock check: arm_vec is unit length, so compare the
    # projection magnitude directly against the 30% threshold
    if np.linalg.norm(proj_xy) < 0.3:
        return math.nan  # caller holds the previous angle

    return math.atan2(proj_xy[0], proj_xy[1])

Motion Control

Raw retargeted angles are noisy, so we apply smoothing before sending commands to the servos.

On startup, we read the robot's current servo positions and use them as the baseline, so the robot does not jump to a default pose when teleoperation begins.
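A minimal sketch of how the smoothing, the NaN hold from the gimbal lock check, and the startup baseline might fit together. The exponential moving average and the `alpha` value are illustrative assumptions, not our tuned parameters:

```python
import math

class JointSmoother:
    """Smooths one joint's angle stream and holds through NaN dropouts."""

    def __init__(self, initial_angle: float, alpha: float = 0.3):
        # Seed with the servo position read at startup, so the first
        # command does not jump the robot to a default pose.
        self.angle = initial_angle
        self.alpha = alpha

    def update(self, target: float) -> float:
        # NaN signals a gimbal-lock dropout: hold the previous angle.
        if math.isnan(target):
            return self.angle
        # Exponential moving average toward the new target.
        self.angle += self.alpha * (target - self.angle)
        return self.angle

smoother = JointSmoother(initial_angle=90.0)
print(smoother.update(float("nan")))  # holds 90.0
print(smoother.update(100.0))         # 93.0
```

One object per joint keeps the filter state local, so a dropout on one shoulder axis never disturbs the other fifteen servos.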

Demo

Demo at Build18 expo.

Results

Latency

End-to-end latency from human movement to robot movement is around 130ms. The breakdown:

Stage                | Time    | Notes
Camera capture       | 33ms    | 30 FPS webcam
Pose estimation      | ~30ms   | MediaPipe heavy model
Retargeting          | ~12ms   | Angle calculation and smoothing
Serial communication | ~5ms    | Sync write at 500kbps
Servo response       | ~50ms   | Mechanical movement (STS3215 at 0.22s/60 deg)
Total                | ~130ms  |

Observations

Hardware

Component      | Details
Robot platform | K-Scale Zeroth-01
Servos         | 16x Feetech STS3215 (19.5 kg-cm, 12-bit encoder)
Compute        | Laptop for pose estimation, Milk-V Duo for servo control
Camera         | 1080p USB webcam at 30 FPS
Communication  | Serial bus at 500kbps via Waveshare adapter
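As a sketch of what a sync write on that bus might look like, here is a packet builder using Dynamixel 1.0-style framing, which Feetech STS buses follow; the goal-position register address (42) and two-byte payload are assumptions for the STS3215, not verified values from our firmware:

```python
def sts_sync_write(positions: dict, address: int = 42) -> bytes:
    """Build a Feetech-style sync-write packet setting goal positions.

    Assumed layout: FF FF | broadcast ID 0xFE | length | 0x83 (sync
    write) | start address | bytes per servo | (id, lo, hi)* | checksum.
    """
    per_servo = 2  # assumed 16-bit goal position, little-endian
    params = [address, per_servo]
    for servo_id, pos in positions.items():
        params += [servo_id, pos & 0xFF, (pos >> 8) & 0xFF]
    length = len(params) + 2  # instruction byte + checksum byte
    body = [0xFE, length, 0x83] + params
    checksum = (~sum(body)) & 0xFF  # ones'-complement of the byte sum
    return bytes([0xFF, 0xFF] + body + [checksum])

# One packet commands all listed servos in a single bus transaction
pkt = sts_sync_write({1: 2048, 2: 1024})
```

A single broadcast packet per control tick is what keeps the serial stage near 5ms even with 16 servos on one bus.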

Cost

Item             | Cost
Servos (16x)     | $400
Milk-V Duo       | $9
Webcam           | $30
3D printed parts | $50
Power supply     | $25
Wiring and misc  | $96
Total            | ~$610

Building It

Some photos from the build week:

Team

Team at Build18
Left to right: Madhavan, Ganesh, Rohan, Revanth, Abhishek

Acknowledgments

This project was built at CMU Build18 2025. Thanks to K-Scale Labs for the open-source robot design files, and to the Build18 organizers for the workspace and support.