Visualize this: self-driving cars navigating busy city streets, skillfully avoiding obstacles and yielding to pedestrians; warehouse robots shuttling freely through large logistics centers, picking, sorting, and placing items quickly and accurately. These scenes of human-machine collaboration are familiar in 2024, yet they are merely a glimpse of how artificial intelligence is transforming the world of tomorrow.
As NVIDIA CEO Jensen Huang says: “The next wave of AI will be physical AI. AI that understands the laws of physics, AI that can work among us.” This is the concept of “embodied AI robotics.” Simply put, such robots act like humans: they observe their surroundings, reason within the complex physical world, and respond correctly in real time.
Huang isn’t alone in his optimism. Tesla CEO Elon Musk also has high hopes. In the first half of 2024, Tesla officially unveiled the impressive humanoid robot Optimus Gen 2, which can walk, squat, carry heavy objects, hold an egg gently, and even dance.
In the future, robots will be able to do everything from cleaning and cooking to industrial manufacturing. This is an industrial revolution worth hundreds of billions of dollars: traditional robots will be surpassed, AI robots will be introduced into every industry, and this new era of human-machine collaboration may soon be here.
At the center of this revolution is an American startup called Hillbot.
Hillbot: A team of AI pioneers and entrepreneurial veterans
Although Hillbot is still a new company, it is led by a group of AI pioneers and entrepreneurial veterans. They are working together to develop cutting-edge embodied AI technology.
Robin Han, Hillbot co-founder and CEO, is a seasoned entrepreneur; the two companies he previously founded were both acquired by US-listed companies. Robin holds 40 patents related to the Internet of Things, computer vision, natural language processing, and computer systems.
Another core member of Hillbot is its co-founder and Chief Technical Officer, Hao Su. He is an associate professor of Computer Science and Engineering at the University of California, San Diego, and a thought leader and iconic figure in AI. Hao said: “My interest in AI began in middle school, when I came across Minimum Spanning Tree Algorithms … That was the first time I felt that human intelligence might not be so unique, since it could be replicated by machines.”
In 2008, while studying for his PhD at Stanford University, Hao worked on ImageNet, the world’s largest image recognition database, as one of its main contributors. This project revolutionized the field of computer vision. “Before ImageNet, people used to think it would take 200 years to crack computer vision. We’ve made amazing progress in just 10 years.”
In 2013, Hao entered the field of 3D deep learning to solve problems in 3D perception and data collection, and later applied these technologies to autonomous driving, computer vision, and robotics. Subsequently, Hao led two breakthrough projects: ShapeNet and PointNet. The former is the “3D data version of ImageNet,” and the latter is a key perception backbone for robots. Each project contributed greatly to AI systems and laid a solid foundation for Hillbot.
From soccer to robots: The intersection of perception, cognition, and action
When Hao was sharing the latest embodied AI trends with students at MIT, he said:
“I loved soccer as a kid.” Hao pointed to an image of two children on the projection screen. “I was amazed by how players could make curveballs happen.”
“However, watching videos cannot make you a good soccer player … because the act of playing soccer is an intertwined process of perception, cognition, and action.”
He elaborated further: “It’s not only that perception provides the grounds for performing actions; there is also feedback. You will adapt your perception, and you will even redefine what you perceive based on the interaction.”
Hao Su concluded: “This is why intelligence not only relies on the brain but is inseparable from the body’s interaction with the world. The close integration of the three elements of perception, cognition, and action is the key to intelligent progress.”
This process is the core of how AI robots learn and interact in the real world.
Why are AI robots so difficult to train? The lack of data is the biggest problem
Robots have proliferated over the past decade, but conventional robots rely on rigid if/then programming to execute pre-written instructions. In classical systems, robots are limited to repetitive, pre-programmed motions with little flexibility, which constrains the complexity of the tasks they can handle. Embodied AI robots, by contrast, are trained on real-world data. Using reinforcement learning, they can complete varied tasks in different environments, much as humans do.
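The contrast can be sketched in a toy example. The scripted controller and the learning loop below are illustrative only (all names are hypothetical, not Hillbot's system): the if/then version fails on anything unanticipated, while a minimal reinforcement-learning loop (an epsilon-greedy bandit) discovers the right action from trial-and-error feedback alone.

```python
import random

# Rigid if/then control: every case must be anticipated by a programmer.
def scripted_pick(object_type):
    if object_type == "box":
        return "strong_grip"
    elif object_type == "egg":
        return "light_grip"
    else:
        return "refuse"  # anything unanticipated simply fails

# Minimal reinforcement learning (epsilon-greedy bandit): the robot tries
# grips, observes success or failure, and adapts its estimates.
def learn_grip(success_prob, actions=("light_grip", "strong_grip"),
               episodes=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # estimated success rate per action
    count = {a: 0 for a in actions}
    for _ in range(episodes):
        # Explore occasionally; otherwise exploit the best-known action.
        a = rng.choice(actions) if rng.random() < epsilon else max(value, key=value.get)
        reward = 1.0 if rng.random() < success_prob[a] else 0.0
        count[a] += 1
        value[a] += (reward - value[a]) / count[a]  # incremental mean update
    return max(value, key=value.get)

# For a fragile object, a light grip succeeds far more often; the learner
# converges on it without any hand-written rule.
best = learn_grip({"light_grip": 0.9, "strong_grip": 0.2})
print(best)
```

The scripted controller is fast and predictable but brittle; the learner handles novel situations at the cost of needing many interaction samples, which is exactly the data problem discussed next.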
In recent years, AI-driven perception and control systems have made significant leaps, pushing embodied AI toward the mass market. From a macroeconomic perspective, labor shortages have led many companies to seek alternative solutions. Data indicate that warehousing and retail still require a huge amount of labor, and the US manufacturing industry has more than half a million job vacancies. Embodied AI robots can free humans from repetitive, high-risk tasks.
However, developing robots to tackle complex tasks is not easy. Hao explained that, just like the soccer example above, it is impossible for robots to become soccer masters after watching a video. Instead, they need abundant, high-quality 3D data from the real world, including touch, weight, pressure, and texture.
But here lies the problem. For internet companies, scraping online data is simple; for robots, physical data is difficult to obtain, since collection is time-consuming and costly.
Apart from the difficulty of data collection, accuracy is also a problem. Hao said that current robots can achieve 99% accuracy on specific tasks. That sounds high, but small differences can cause major problems: if such robots are deployed at scale in manufacturing, even a 1% error rate is unacceptable, especially when humans can achieve nearly 100% with minimal training.
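The arithmetic behind "even 1% is unacceptable" is worth making explicit. If each step of a long task succeeds independently with probability p, the whole n-step task succeeds with probability p to the power n (the 100-step figure below is an illustrative assumption, not from the article):

```python
# Per-step accuracy compounds multiplicatively over a long-horizon task:
# an n-step task with independent per-step success probability p
# completes successfully with probability p**n.
def task_success_rate(p, n):
    return p ** n

single = task_success_rate(0.99, 1)    # one step: 99% looks impressive
full = task_success_rate(0.99, 100)    # 100 steps: only ~37% of runs finish
print(single, round(full, 2))
```

This is why a robot that is "99% accurate" per action can still be unusable on an assembly line, while a human who rarely errs at all compounds to near-perfect completion.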
How does Hillbot do it?
How does Hillbot solve these complex issues? Hao said that Hillbot’s team has developed a comprehensive set of training methods for embodied AI robots. First, to solve the data-collection problem, Hillbot uses state-of-the-art 3D generative AI. People are more familiar with 2D generative AI platforms such as Midjourney and Stable Diffusion; Hillbot, however, has developed a 3D generative AI tool with which users can easily generate 3D objects from text prompts.
For example, if anyone wants to train a robot to arrange chairs, Hillbot can generate 10,000 different chairs using simple text prompts to help the robot adapt to various designs. To serve a specific purpose, Hillbot can also take photos of a space and convert 2D images into actionable 3D geometric models in virtual environments.
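To give a feel for what "10,000 different chairs" buys the robot, here is a deliberately simplified stand-in: instead of a generative model, it procedurally samples chair parameters. Every name and parameter range below is a hypothetical illustration, not Hillbot's tool; the point is only that cheap, varied synthetic instances force the learner to generalize across designs.

```python
import random

# Illustrative stand-in for text-to-3D generation: sample varied chair
# "designs" as parameter dictionaries a simulator could instantiate.
def generate_chair_variants(n, seed=42):
    rng = random.Random(seed)
    variants = []
    for i in range(n):
        variants.append({
            "id": i,
            "seat_height_cm": round(rng.uniform(40, 55), 1),
            "seat_width_cm": round(rng.uniform(38, 60), 1),
            "legs": rng.choice([3, 4, 5]),
            "has_armrests": rng.random() < 0.5,
            "back_angle_deg": round(rng.uniform(95, 115), 1),
        })
    return variants

# Thousands of distinct parameterizations, generated in milliseconds,
# versus weeks of scanning or modeling real chairs by hand.
chairs = generate_chair_variants(10_000)
print(len(chairs))
```

A real 3D generative pipeline produces meshes and textures rather than parameter dictionaries, but the training logic is the same: diversity in the data, obtained at near-zero marginal cost, is what lets one policy handle chairs it has never seen.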
Hillbot can then place the generated 3D objects into a proprietary simulator, SAPIEN, to create interactive scenes for training. SAPIEN is the fastest and highest-performing simulator for robot manipulation currently on the market. Through highly realistic simulation, Hillbot can increase robot training speed fivefold and shorten training time from 12 months to just a few months, far exceeding mainstream methods. According to Su, this fast-paced development is possible only because of Hillbot’s unique expertise in generating simulation data, which lets the company avoid high costs and lengthy training processes.
Hillbot also splits complex tasks into simpler subtasks, so that the robot can learn to reason step by step and gradually adapt to more complex, open-ended tasks. Hao emphasized that these training methods matter because they enable Hillbot to build general-purpose, highly skilled robots capable of performing complex tasks in the real world.
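The decomposition idea can be sketched in a few lines. This is a hand-written lookup for illustration only (in a real system the mapping from task to subtasks would come from a learned planner, and the primitive names here are invented): a complex instruction is reduced to primitive skills the robot has already mastered, then executed in order.

```python
# Hypothetical primitive skills the robot learns once, in isolation.
PRIMITIVES = {"locate", "approach", "grasp", "lift", "place"}

def decompose(task):
    # Illustrative hand-written plans; a real planner would learn these.
    plans = {
        "arrange_chair": ["locate", "approach", "grasp", "lift", "place"],
        "sort_item":     ["locate", "grasp", "place"],
    }
    return plans[task]

def execute(task):
    steps = decompose(task)
    # Every step must reduce to a known primitive skill.
    assert all(s in PRIMITIVES for s in steps), "unknown primitive"
    log = []
    for step in steps:
        log.append(f"{task}:{step}:ok")  # each simple step was learned separately
    return log

print(execute("arrange_chair"))
```

The payoff of this structure is reuse: once "grasp" works, it serves both arranging chairs and sorting items, so each new complex task costs far less training than learning it monolithically.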
Hillbot is currently focused on industrial applications such as automobile manufacturing, warehousing, and retail. It is in talks with automakers about using robots to inspect assembly parts, and it is also exploring opportunities to integrate robots into retail for efficient sorting. Hillbot will first pair its software with existing third-party hardware to enter the market quickly, then launch proprietary robots.
Reshaping the future of human-machine collaboration
“Will humans be replaced by AI and robots?” Since the emergence of generative AI, this has become one of society’s most urgent questions. Su believes the rise of AI will indeed automate many low-skill, repetitive tasks, freeing people from monotonous and dangerous work while improving productivity and quality of life. At the same time, embodied AI will also create new roles; new opportunities will emerge, and he expects the creation of millions of new jobs.
“We are at a critical moment in embodied AI,” Hao said. In the near future, robots may be seen in factories, construction sites, warehouses, roads, and homes. As technological barriers continue to be overcome, the future will not only be about humanity but also about the coexistence of humans and robots. This revolution has only just begun.
This article has been contributed to AsiaTechDaily.