The beginning of the end is approaching for human beings. Figure 01, a humanoid robot powered by OpenAI artificial intelligence, has demonstrated visual reasoning and language comprehension abilities. Using neural networks, its creators got it to execute actions and respond with a synthetic voice.
The robot is the work of Figure, an AI company that seeks to “expand human capabilities” through advanced artificial intelligence. Its stated purpose is to “revolutionize the production chain” with humanoid robots capable of performing unsafe or undesirable jobs, so that, in the company's vision, we would all lead happier and more purposeful lives.
To reach that future, Figure has signed an agreement with OpenAI to integrate its artificial intelligence into Figure 01. With it, the robot would be able to understand language and act accordingly, making it capable of joining a factory as a worker.
The company has shown the first fruit of that collaboration in a video demonstration, in which Figure 01 responds to a series of commands with specific actions. The humanoid robot is capable of identifying the objects in front of it and understanding the language of the person with whom it interacts.
— Figure 01, what do you see now?
— I see a red apple on a plate in the center of the table, a drainer with plates and glasses, and you standing nearby with one hand on the table.
When asked for something to eat, Figure 01 hands the apple to the person and explains that it did so because it is the only edible item on the table, all while placing trash in a plastic box. It then performs a reasoning exercise by placing glasses and plates on the drainer, followed by a self-assessment of its own performance.
According to Brett Adcock, founder of Figure, Figure 01 executes its actions through end-to-end neural networks. The demonstration was recorded in real time, with no teleoperation involved.
“The integrated cameras are powered by a large visual language model (VLM) trained by OpenAI,” Adcock said. “The neural networks take images at 10 Hz through cameras on the robot. The neural network then generates 24 degrees of freedom actions at 200 Hz.”
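To make the rate mismatch concrete, here is a minimal Python sketch of that two-rate loop. Everything in it is a hypothetical stand-in (get_camera_image, policy, and send_joint_command are invented names, not Figure's actual software): a policy queried at 10 Hz returns a short chunk of 24-DOF actions that is streamed out at 200 Hz.

```python
import time

# Minimal sketch of the two-rate loop Adcock describes. All names here are
# hypothetical stand-ins, not Figure's real API.

IMAGE_HZ = 10                    # camera / policy query rate
ACTION_HZ = 200                  # joint command rate
CHUNK = ACTION_HZ // IMAGE_HZ    # 20 actions generated per image

def control_loop(policy, get_camera_image, send_joint_command):
    while True:
        image = get_camera_image()          # latest onboard frame (10 Hz)
        actions = policy(image)             # assumed shape: (CHUNK, 24)
        for a in actions:                   # wrist poses + finger angles
            send_joint_command(a)           # stream commands at 200 Hz
            time.sleep(1.0 / ACTION_HZ)
```

The chunking detail is an assumption on our part; the quoted figures only fix the input and output rates, not how the gap between them is bridged.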
How Figure 01 works: the robot with OpenAI AI that responds like a human
Corey Lynch, the company's director of AI, explained that all of the robot's behaviors are learned and executed at normal speed. Lynch, who also leads the Figure 01 project, revealed that OpenAI's artificial intelligence takes in the images from the cameras and transcribes into text the spoken instructions captured by the robot's microphones.
“The model processes the entire history of the conversation, including past images, to generate linguistic responses, which are returned to the human via text-to-speech,” Lynch said in a post on X (Twitter). “The same model is responsible for deciding which learned behavior the robot should execute to fulfill a given command, loading the corresponding neural network weights onto the GPU, and executing a policy.”
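Read as a pipeline, Lynch's description suggests a loop like the following sketch. Every function here (transcribe, vlm, text_to_speech, load_policy, execute) is a placeholder we are assuming for illustration; neither OpenAI nor Figure has published this interface, and the stubs exist only so the control flow can be read end to end.

```python
# Hypothetical sketch of the pipeline Lynch describes; all functions below
# are assumed placeholders, not a real OpenAI or Figure API.

def transcribe(audio): return "Can I have something to eat?"  # speech-to-text stub
def vlm(history): return "Sure thing.", "hand_over_apple"     # reply + behavior stub
def text_to_speech(reply): print("robot says:", reply)        # spoken-answer stub
def load_policy(behavior): return behavior                    # weights-to-GPU stub
def execute(policy): print("executing:", policy)              # skill-execution stub

history = []  # full conversation so far, including past images

def handle_turn(audio, image):
    command = transcribe(audio)                        # microphone audio -> text
    history.append({"image": image, "user": command})

    # One model produces both the spoken reply and the learned behavior to run.
    reply, behavior = vlm(history)
    history.append({"assistant": reply})

    text_to_speech(reply)                              # answer the human aloud
    policy = load_policy(behavior)                     # load the policy weights
    execute(policy)                                    # run the closed-loop skill

handle_turn(audio=b"...", image=None)
```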
Lynch explains that the OpenAI model allows the robot to describe its environment and use common-sense reasoning to make decisions. The artificial intelligence also gives it the ability to understand ambiguous requests and act accordingly. But none of this would be possible without the neural networks that generate its movements.
All behaviors are driven by neural network visuomotor policies, which map pixels directly to actions. These networks take in onboard images at 10 Hz and generate 24-DOF actions (wrist poses and finger joint angles) at 200 Hz.
Corey Lynch
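As a toy illustration only: a pixels-to-actions policy with those input and output shapes could look like the PyTorch module below. The architecture (a small CNN encoder and a linear head) is our invention for readability; Figure describes its policies only as visuomotor neural networks and has not published the real design.

```python
import torch
import torch.nn as nn

# Toy visuomotor policy: pixels in, a chunk of 24-DOF actions out.
# Layer sizes and structure are invented for illustration.

class VisuomotorPolicy(nn.Module):
    def __init__(self, chunk=20, dof=24):
        super().__init__()
        self.encoder = nn.Sequential(              # image -> feature vector
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(                 # features -> action chunk
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, chunk * dof),
        )
        self.chunk, self.dof = chunk, dof

    def forward(self, image):                      # image: (B, 3, H, W)
        z = self.encoder(image)
        return self.head(z).view(-1, self.chunk, self.dof)

policy = VisuomotorPolicy()
actions = policy(torch.randn(1, 3, 224, 224))      # -> shape (1, 20, 24)
```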
In general terms, Figure 01 operates as follows:
- The OpenAI model does the reasoning and designs a plan
- Learned neural network policies execute the plan through fast, reactive movements, relying on a whole-body controller to maintain balance
“A few years ago, I would have thought that having a conversation with a humanoid robot while it plans and carries out its own fully learned behaviors would be something we would have to wait decades to see,” Lynch said. “Obviously, a lot of things have changed.”