Figure recently launched #Helix – a humanoid robot powered by a Vision-Language-Action (VLA) model – sorting objects using computer vision and agentic AI.
At first glance, it looks slow—taking over 2 minutes for a task a human can do in 20 seconds. Some are even questioning why we are “complicating” simple tasks.
But this is a huge milestone in AI development.
👉 So what is really happening under the hood? Helix uses two key technologies to perform its tasks:
1️⃣ Computer Vision – The robot doesn’t “see” the way we do. Cameras capture images, and deep learning models process them to identify shapes, textures, and categories. Unlike pre-programmed robots on factory floors, this one classifies objects it has never seen before and decides what to do with them in real time (a rough sketch of this perception step follows the list below).
2️⃣ Agentic AI – This is where things get exciting. Traditional AI models are passive: they analyze data and give outputs only when prompted. Agentic AI, by contrast, acts toward goals. It takes in visual data, makes decisions, and plans a sequence of actions without needing human intervention at every step (see the toy control loop after this list).
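To make the first point concrete, here is a minimal sketch of open-vocabulary perception, using an off-the-shelf CLIP-style model as a stand-in – Figure has not published Helix’s actual vision stack, so the model name, labels, and camera capture here are assumptions for illustration only:

```python
# Sketch: score a camera frame against free-text labels (zero-shot classification).
# The model and labels are stand-ins, not Figure's actual perception pipeline.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify(image: Image.Image, candidate_labels: list[str]) -> str:
    """Return the label that best matches the image, even for unseen objects."""
    inputs = processor(text=candidate_labels, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_labels)
    return candidate_labels[logits.softmax(dim=-1).argmax().item()]

# Hypothetical usage with a frame from the robot's camera:
# frame = Image.open("camera_frame.jpg")
# print(classify(frame, ["a ketchup bottle", "a coffee mug", "a toy"]))
```

Because the labels are just text, the same model can be pointed at categories it was never explicitly trained to sort.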
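And here is a toy version of the second point: a goal-driven perceive → decide → act loop. Everything below (the simulated scene, the policy, the “motor control”) is a placeholder assumption, not Helix’s internals; the takeaway is that the loop keeps acting until the goal is met, with no human prompt at each step:

```python
# Sketch: an agentic control loop that pursues a goal autonomously.
import random

def perceive(table: list[str]) -> list[str]:
    # Stand-in for the vision stage: a real robot would classify camera frames here.
    return list(table)

def plan_next_action(goal_bins: dict[str, str], scene: list[str]) -> tuple[str, str]:
    # Decide which object to handle next and where it belongs.
    obj = random.choice(scene)
    return obj, goal_bins.get(obj, "misc bin")

def execute(obj: str, destination: str, table: list[str], bins: dict[str, list[str]]) -> None:
    # Stand-in for motor control: move the object from the table into a bin.
    table.remove(obj)
    bins.setdefault(destination, []).append(obj)
    print(f"picked {obj!r} and placed it in {destination}")

def agentic_sort(table: list[str], goal_bins: dict[str, str]) -> dict[str, list[str]]:
    bins: dict[str, list[str]] = {}
    while table:  # keep observing, deciding, and acting until the table is clear
        scene = perceive(table)
        obj, dest = plan_next_action(goal_bins, scene)
        execute(obj, dest, table, bins)
    return bins

agentic_sort(
    table=["ketchup bottle", "mug", "apple"],
    goal_bins={"ketchup bottle": "fridge", "mug": "cupboard", "apple": "fruit bowl"},
)
```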
If you are wondering why this is a significant milestone: it is a first step toward blending machines into our physical world. AI is great at processing data in the virtual domain, but bringing intelligence into real-world interactions is a whole different challenge.
Jensen Huang calls this “Physical AI”—where machines don’t just compute but interact, adapt, and assist us in real-world tasks.
Yes, this is just a prototype. But so were self-driving cars a decade ago. AI evolves fast. Soon, we’ll see humanoids becoming faster, smarter, and more useful—augmenting human work rather than replacing it.
🌟 The future isn’t just digital. It’s physical AI in action.
Watch the video below: