MIT to teach robots new skills using generative AI technology
Massachusetts Institute of Technology (MIT) last week unveiled a new method for training robots using generative artificial intelligence (AI) models. The technique combines data from different domains and modalities and unifies them into a shared language that large language models (LLMs) can then process. MIT researchers claim this method could lead to general-purpose robots capable of performing a wide range of tasks without having to be trained on each skill separately.
MIT researchers develop AI-inspired technology to train robots
In a newsroom post, MIT detailed the new methodology for training robots. Currently, teaching a robot a given task is difficult because it requires a large amount of simulation and real-world data. Without that data, a robot that does not understand how to perform the task in a given environment will struggle to adapt to it.
This means every new task requires new datasets covering every simulation and every real-world scenario. The robot then undergoes a training period during which its actions are optimized and errors and glitches are ironed out. As a result, robots are generally trained for a single specific task, and the multi-functional robots of science fiction films remain absent from reality.
However, a new technique developed by researchers at MIT aims to circumvent this challenge. In a paper published on the pre-print server arXiv (note: it has not been peer-reviewed), the scientists highlight how generative AI can help with this problem.
To do this, data from different domains, such as simulations and real robots, and from different modalities, such as vision sensors and robot-arm position encoders, is unified into a shared language that an AI model can process. The researchers also developed a new architecture, called Heterogeneous Pretrained Transformers (HPT), to unify the data.
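To make the idea concrete, here is a minimal sketch in PyTorch of what such an architecture could look like. This is our illustration, not MIT's code: the class names, module designs, and dimensions are all assumptions. The pattern it shows is modality-specific "stems" mapping heterogeneous inputs into a shared token space, which a common transformer trunk then processes.

```python
# A minimal sketch (not MIT's actual HPT code) of the general idea:
# modality-specific stems tokenize camera images and joint readings
# into a shared embedding space; a shared transformer trunk processes
# the combined token sequence; a small head predicts an action.
# All names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class VisionStem(nn.Module):
    """Tokenizes an image into patch embeddings (toy patchifier)."""
    def __init__(self, dim=256, patch=16):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, img):                        # img: (B, 3, H, W)
        tokens = self.proj(img)                    # (B, dim, H/p, W/p)
        return tokens.flatten(2).transpose(1, 2)   # (B, N, dim)

class ProprioStem(nn.Module):
    """Embeds joint positions/forces as a single token."""
    def __init__(self, in_dim=7, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, dim), nn.GELU(),
                                 nn.Linear(dim, dim))

    def forward(self, state):                      # state: (B, in_dim)
        return self.mlp(state).unsqueeze(1)        # (B, 1, dim)

class SharedTrunkPolicy(nn.Module):
    """Shared transformer trunk over tokens from all modalities."""
    def __init__(self, dim=256, n_actions=7):
        super().__init__()
        self.vision = VisionStem(dim)
        self.proprio = ProprioStem(7, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_actions)      # task-specific head

    def forward(self, img, state):
        # Both modalities end up as tokens in the same embedding space.
        tokens = torch.cat([self.vision(img), self.proprio(state)], dim=1)
        features = self.trunk(tokens)              # shared processing
        return self.head(features.mean(dim=1))     # predicted action

policy = SharedTrunkPolicy()
action = policy(torch.randn(2, 3, 224, 224), torch.randn(2, 7))
print(action.shape)  # torch.Size([2, 7])
```

The design choice worth noticing is that only the stems know anything about a particular sensor or robot; everything after tokenization is shared, which is what lets data from very different sources train one model.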
Interestingly, the study’s lead author, Lirui Wang, an electrical engineering and computer science (EECS) graduate student, said the inspiration for this technique came from AI models such as OpenAI’s GPT-4.
The researchers placed a transformer (the same class of model that underpins the GPT architecture) at the center of their system, where it processes both vision and proprioception data (the robot's sense of its own motion, force, and position).
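Continuing the sketch above, a hypothetical pretraining loop over heterogeneous data sources might look like the following. Again, this is our illustration rather than the paper's implementation: the source names, dimensions, and loss function are invented for the example. Each source (for instance, a simulated arm versus a real arm with a different proprioception dimension) gets its own input stem and output head, while the transformer trunk in the middle is shared across all of them.

```python
# A hypothetical pretraining loop (our illustration, not the paper's code):
# batches from heterogeneous sources share one transformer trunk while
# each source keeps its own stem and head. All data here is random
# stand-in data; a real setup would use demonstration trajectories.
import torch
import torch.nn as nn

dim = 256
trunk_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                         batch_first=True)
trunk = nn.TransformerEncoder(trunk_layer, num_layers=4)   # shared trunk

# Two illustrative sources with different proprioception dimensions.
sources = {
    "sim_arm":  {"state_dim": 7,  "action_dim": 7},
    "real_arm": {"state_dim": 14, "action_dim": 7},
}
stems = {n: nn.Linear(c["state_dim"], dim) for n, c in sources.items()}
heads = {n: nn.Linear(dim, c["action_dim"]) for n, c in sources.items()}

params = list(trunk.parameters())
for m in list(stems.values()) + list(heads.values()):
    params += list(m.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

for step in range(3):                    # toy loop; real runs are long
    for name, cfg in sources.items():
        state = torch.randn(8, cfg["state_dim"])    # fake observations
        target = torch.randn(8, cfg["action_dim"])  # fake expert actions
        tokens = stems[name](state).unsqueeze(1)    # (B, 1, dim)
        pred = heads[name](trunk(tokens).mean(dim=1))
        loss = nn.functional.mse_loss(pred, target) # behavior cloning-style
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because every gradient step, regardless of source, updates the same trunk, knowledge learned from cheap simulated data can carry over to the real robot, which is the intuition behind needing less task-specific data.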
The MIT researchers say the new method could make training robots faster and cheaper than traditional approaches, largely because it requires far less task-specific data to train a robot on different tasks. Furthermore, the study showed that the method performed more than 20 percent better than training from scratch in both simulation and real-world experiments.