December 23, 2024

The ‘Year of the Robot’ Has Arrived

In October, Tesla unveiled its latest humanoid robot, Optimus, at its “We, Robot” event, igniting widespread discussion and heralding the arrival of the “Robot Era.” However, reports later revealed that some of the robots’ movements were still guided by human remote operators, suggesting that advances in robotics have yet to meet public expectations.

AI has already exhibited capabilities surpassing human expertise in various professional domains, such as excelling in complex exams, solving mathematical problems, and even taking over certain white-collar roles. Yet, tasks as simple as those performed by a four-year-old—like holding a pen or gently handling an egg—remain significant challenges for robots.

A promising new research avenue in robotics, known as Embodied AI (“AI with a body”), seeks to tackle these challenges.

Traditional robot training relies on predefined rules and instructions to complete tasks. While effective for repetitive actions, this method struggles to adapt to unfamiliar tasks or environmental changes. For instance, a traditional robot might navigate to a designated location exactly as programmed but falter when faced with unexpected obstacles or new scenarios.
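To make the contrast concrete, here is a minimal sketch of what rule-based control looks like in practice. The `robot` object and its methods are hypothetical placeholders rather than any real vendor's API; the point is simply that every step is fixed in advance.

```python
# A minimal sketch of traditional, rule-based robot control.
# The `robot` object and its move_to()/close_gripper() methods are
# hypothetical placeholders, not any particular vendor's API.

PICK_POSITION = (0.40, 0.10, 0.05)   # hard-coded coordinates in metres
PLACE_POSITION = (0.10, 0.35, 0.05)

def pick_and_place(robot):
    """Execute a fixed sequence of pre-programmed steps."""
    robot.move_to(PICK_POSITION)     # assumes the object is exactly here
    robot.close_gripper(force=5.0)   # assumes a fixed grip force works
    robot.move_to(PLACE_POSITION)
    robot.open_gripper()
    # If the object moved, the table height changed, or an obstacle
    # appeared, nothing in this script can detect or correct for it.
```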

Embodied AI, by contrast, emphasizes “learning through doing,” enabling robots to modify their behavior based on environmental feedback, much like children do. For example, a robot learning to pick up a cup first locates the cup with its camera; if the initial grasp fails, it adjusts the force and angle of its grip using feedback from its tactile sensors until it succeeds, as the sketch below illustrates.
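A simplified version of that feedback loop might look like the following. The perception and control helpers (`locate_cup`, `attempt_grasp`, `read_tactile`) are hypothetical stand-ins, but they show how sensing, acting, and adjusting fit together.

```python
# A minimal sketch of the feedback loop described above: locate the cup,
# attempt a grasp, and adjust force/angle based on tactile feedback.
# All helpers (locate_cup, attempt_grasp, read_tactile) are hypothetical
# stand-ins for a robot's perception and control stack.

def learn_to_grasp(robot, max_attempts=20):
    force, angle = 2.0, 0.0              # initial guess for the grip
    target = robot.camera.locate_cup()   # perception: where is the cup?

    for attempt in range(max_attempts):
        robot.attempt_grasp(target, force=force, angle=angle)
        feedback = robot.read_tactile()  # did the fingers make good contact?

        if feedback.holding_object and not feedback.slipping:
            return force, angle          # success: remember what worked

        # Environmental feedback drives the next adjustment,
        # much like a child refining its grip after a failed try.
        if feedback.slipping:
            force += 0.5                 # squeeze a little harder
        if not feedback.holding_object:
            angle += 5.0                 # approach from a different angle
            target = robot.camera.locate_cup()  # re-check: cup may have moved

    raise RuntimeError("Could not find a working grasp")
```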

This iterative approach allows Embodied AI to swiftly adapt to shifting environments, making it more flexible than traditional robots. With its potential to autonomously learn new tasks in increasingly complex scenarios, Embodied AI represents a crucial step toward creating intelligent, general-purpose robots.

Nevertheless, substantial volumes of training data remain essential for developing human-like “reasoning” and “adaptation” capabilities. For instance, ChatGPT is estimated to have been trained on approximately 400 billion characters of text, while the image-generation model Midjourney utilized around 6 billion image-text data pairs. In stark contrast, DeepMind’s open-source robotics database contains only about 2.4 million data points—insufficient to train a truly intelligent general-purpose robot.

To address this shortfall, startups are pursuing innovative solutions. Hillbot, for example, uses 3D simulation technology to create virtual environments in which robots can “learn” to handle complex scenarios entirely in simulation.

Consider training a robot to arrange chairs of various shapes. Hillbot can generate tens of thousands of chair designs using basic text commands, helping the robot master diverse arrangements. To familiarize a robot with a specific location, such as a coffee shop or warehouse, Hillbot can also convert site photos into 3D virtual models, providing realistic environments for practice.
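For illustration only, the sketch below shows the general idea behind this kind of simulation-based training data: procedurally vary an object and drop each variant into a virtual scene so a robot sees thousands of different examples. It is not Hillbot's actual text-to-3D pipeline; the `sim` API and the parameter ranges are assumptions made up for this example.

```python
# A rough sketch of simulation-based training data: procedurally vary an
# object (here, a chair) and spawn each variant in a virtual scene.
# The `sim` API and parameter ranges are hypothetical, for illustration only.

import random

def random_chair_spec():
    """Sample one plausible chair geometry from simple parameter ranges."""
    return {
        "seat_height_m": random.uniform(0.40, 0.55),
        "seat_width_m": random.uniform(0.35, 0.50),
        "back_height_m": random.uniform(0.30, 0.60),
        "leg_count": random.choice([3, 4]),
        "has_armrests": random.random() < 0.3,
    }

def generate_training_scenes(sim, num_scenes=10_000):
    """Build many virtual scenes, each containing a differently shaped chair."""
    for _ in range(num_scenes):
        scene = sim.new_scene(template="coffee_shop")  # could also be built from site photos
        chair = sim.build_chair(**random_chair_spec())
        scene.place(chair, pose=scene.random_free_pose())
        yield scene  # each scene becomes one practice episode for the robot
```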

Robots have yet to experience their “ChatGPT moment,” but as the technology matures, we will witness their gradual transition from industrial and service contexts into everyday household settings. One day, robots may seamlessly integrate into our lives—and perhaps even become our most trusted companions.
