Forget Sora, this is the AI video that will blow your mind – and maybe scare you

For the better part of two decades, humanoid robot development has moved at a snail's pace, but it is now accelerating rapidly thanks to a collaboration between Figure AI and OpenAI. The result is the most stunning real humanoid robot video I've ever seen.

On Wednesday, robotics startup Figure AI released a video update of its Figure 01 robot running a new vision-language model (VLM) that has somehow transformed the bot from a rather uninteresting automaton into a full-fledged sci-fi robot approaching C-3PO-level capabilities.

In the video, Figure 01 stands behind a table set with a plate, an apple, and a cup, with a dish drainer to its left. A human stands in front of the robot and asks, "Figure 01, what do you see right now?"

Speech-to-speech

The company explained in a release that Figure 01 engages in "speech-to-speech" reasoning, using a pre-trained multimodal model from OpenAI (the VLM) to understand images and text, and drawing on the entire conversation so far to craft its responses. This is different from, say, OpenAI's GPT-4, which focuses on written prompts.

It's also using what the company calls "learned low-level bimanual manipulation," meaning two-handed control driven by neural networks that map camera images, down to the pixel level, directly to movement. "These networks take in onboard images at 10hz, and generate 24-DOF actions (wrist poses and finger joint angles) at 200hz," Figure AI wrote in a release.
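Those two rates imply a simple ratio: a new image arrives every 100 milliseconds, while an action is issued every 5 milliseconds, so each camera frame feeds about 20 action outputs. The short Python sketch below illustrates only that arithmetic. It is not Figure AI's code or architecture, and the function names and the zero-filled placeholder policy are hypothetical stand-ins for whatever learned network the company actually runs.

```python
# Hypothetical sketch: NOT Figure AI's implementation.
# Shows how a 10 Hz camera feed can drive a 200 Hz action stream.
IMAGE_HZ = 10     # onboard images per second (rate quoted by Figure AI)
ACTION_HZ = 200   # action outputs per second (rate quoted by Figure AI)
DOF = 24          # action size: wrist poses plus finger joint angles


def actions_per_image(image_hz: int, action_hz: int) -> int:
    """Number of consecutive control steps that share one camera frame."""
    if action_hz % image_hz != 0:
        raise ValueError("sketch assumes the action rate is a multiple of the image rate")
    return action_hz // image_hz


def placeholder_policy(frame_index: int) -> list[float]:
    """Hypothetical stand-in for a learned policy: returns a 24-value action vector."""
    return [0.0] * DOF


def simulate(seconds: float) -> list[tuple[int, int, int]]:
    """Return (control_step, frame_index, action_length) for each simulated step."""
    ratio = actions_per_image(IMAGE_HZ, ACTION_HZ)
    log = []
    for step in range(int(seconds * ACTION_HZ)):
        frame = step // ratio  # most recent camera frame available at this step
        action = placeholder_policy(frame)
        log.append((step, frame, len(action)))
    return log


if __name__ == "__main__":
    print(actions_per_image(IMAGE_HZ, ACTION_HZ))  # 20
    schedule = simulate(1.0)
    print(len(schedule), schedule[-1])  # 200 (199, 9, 24)
```

Over one simulated second, the loop issues 200 actions while consuming only 10 frames, which is the gap between perception speed and motor-control speed that Figure's numbers describe.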

The company claims that every behavior in the video is learned by the system rather than teleoperated, meaning no one is puppeteering Figure 01 behind the scenes.

Without seeing Figure 01 in person and asking my own questions, it's hard to verify these claims. It's also possible that this is not the first time Figure 01 has run through this routine. It could have been the 100th time, which might account for its speed and fluidity.

Or maybe this is 100% real and in that case, wow. Just wow.

A 38-year industry veteran and award-winning journalist, Lance has covered technology since PCs were the size of suitcases and “on line” meant “waiting.” He’s a former Lifewire Editor-in-Chief, Mashable Editor-in-Chief, and, before that, Editor-in-Chief of PCMag.com and Senior Vice President of Content for Ziff Davis, Inc. He also wrote a popular, weekly tech column for Medium called The Upgrade.

Lance Ulanoff makes frequent appearances on national, international, and local news programs including Live with Kelly and Mark, the Today Show, Good Morning America, CNBC, CNN, and the BBC.
