How the AI powering ChatGPT will enter the physical world

Companies like OpenAI and Midjourney are building chatbots, image generators and other artificial intelligence tools that operate in the digital world.

Now, a startup founded by three former OpenAI researchers is using the technology development methods behind chatbots to build AI technology that can navigate the physical world.

Covariant, a robotics company headquartered in Emeryville, California, is creating ways for robots to pick, move and sort items as they move through warehouses and distribution centers. The goal is to help robots understand what’s happening around them and decide what to do next.

The technology also gives robots a broad understanding of the English language, allowing people to chat with them as if they were chatting with ChatGPT.

The technology, which is still developing, is not perfect. But it’s a clear sign that the artificial intelligence systems that power online chatbots and image generators will also power machines in warehouses, on roads and in homes.

Like chatbots and image generators, this robotics technology learns its skills by analyzing vast amounts of digital data. That means engineers can improve the technology by adding more and more data to it.

Covariant, backed by $222 million in funding, doesn’t build robots. It builds the software that powers robots. The company wants to deploy its new technology with warehouse robots, providing a roadmap for others to do much the same in factories and perhaps even on roads with self-driving cars.

The AI systems that power chatbots and image generators are called neural networks, named after the web of neurons in the brain.

By identifying patterns in large amounts of data, these systems can learn to recognize words, sounds and images – or even generate them on their own. This is how OpenAI built ChatGPT, giving it the ability to instantly answer questions, write term papers and generate computer programs. It learned these skills from text pulled from the Internet. (Several media outlets, including The New York Times, have sued OpenAI for copyright infringement.)

Companies are now building systems that can learn from different types of data simultaneously. For example, by analyzing both a collection of photos and the captions that describe those photos, a system can understand the relationships between the two. It can learn that the word “banana” describes a curved yellow fruit.
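
To make that idea concrete, here is a minimal sketch, in PyTorch, of the kind of image-and-caption contrastive training that lets a system link words like “banana” to pictures. The feature sizes, projection layers and random stand-in data are illustrative assumptions, not OpenAI’s or Covariant’s actual code.

```python
# Illustrative sketch of contrastive image-caption training (not any
# company's actual code): project image and text features into a shared
# space, then push each image toward its own caption and away from others.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageTextMatcher(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, shared_dim=256):
        super().__init__()
        # Project both kinds of features into one shared embedding space.
        self.image_proj = nn.Linear(image_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)

    def forward(self, image_feats, text_feats):
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        # Similarity matrix: every image scored against every caption.
        return img @ txt.T

def contrastive_loss(sim, temperature=0.07):
    # Matching image/caption pairs sit on the diagonal of the matrix.
    targets = torch.arange(sim.size(0))
    logits = sim / temperature
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage: random vectors stand in for a real batch of photo and
# caption features produced by separate image and text encoders.
model = ImageTextMatcher()
images = torch.randn(8, 2048)
captions = torch.randn(8, 768)
loss = contrastive_loss(model(images, captions))
loss.backward()
```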

OpenAI used that kind of system to build Sora, its new video generator. By analyzing thousands of captioned videos, the system learned to generate videos when given a brief description of a scene, such as “a beautifully rendered paper world of a coral reef, full of colorful fish and sea creatures.”

Covariant, founded by Pieter Abbeel, a professor at the University of California, Berkeley, and three of his former students, Peter Chen, Rocky Duan and Tianhao Zhang, used similar techniques to build a system that controls warehouse robots.

The company helps operate sorting robots in warehouses around the world. It has spent years collecting data – from cameras and other sensors – that shows how these robots work.

“It records all kinds of data that are important to robots – which can help them understand and interact with the physical world,” said Dr. Chen.

By combining that data with the vast amounts of text used to train chatbots like ChatGPT, the company has built AI technology that gives its robots a much broader understanding of the world around them.

After identifying patterns in this stew of images, sensory data and text, the technology gives a robot the power to deal with unexpected situations in the physical world. The robot knows how to pick up a banana, even if it has never seen a banana before.

It can also respond to plain English, just like a chatbot. If you tell it to ‘pick up a banana’, it knows what that means. If you tell it to ‘pick up a yellow fruit’, it will understand.
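
A rough sketch of how such a language-conditioned picker might choose an object is shown below. The function name, embedding sizes and random stand-in vectors are hypothetical illustrations; Covariant has not published its interface.

```python
# Hypothetical sketch: given an instruction such as "pick up a yellow
# fruit", pick the detected object whose embedding best matches the
# instruction in a shared text/vision space. Not Covariant's actual code.
import torch
import torch.nn.functional as F

def choose_object(instruction_embedding, object_embeddings):
    """Return the index of the detected object that best matches the
    instruction, using cosine similarity in the shared space."""
    instr = F.normalize(instruction_embedding, dim=-1)
    objs = F.normalize(object_embeddings, dim=-1)
    scores = objs @ instr          # one similarity score per detected object
    return int(torch.argmax(scores))

# Toy usage: three random vectors stand in for real embeddings of the
# objects a camera sees (say, a banana, a box and a bottle), and one more
# stands in for the embedded instruction text.
torch.manual_seed(0)
objects = torch.randn(3, 256)
instruction = torch.randn(256)
print("pick object index:", choose_object(instruction, objects))
```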

It can even generate videos predicting what is likely to happen if it tries to pick up a banana. These videos have no practical use in a warehouse, but show how the robot understands what is happening around it.

“If it can predict the next frames in a video, it can determine the right strategy to follow,” said Dr. Abbeel.
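
The planning idea Dr. Abbeel describes can be sketched in a few lines: imagine the outcome of each candidate action with a learned predictor, score the imagined outcomes, and keep the best one. The predictor and scorer below are simple stand-ins for illustration, not the model Covariant uses.

```python
# Illustrative sketch of prediction-based planning: "imagine" the next
# frame for each candidate grasp, score how promising the imagined frame
# looks, and choose the best action. Stand-in models, not Covariant's.
import torch
import torch.nn as nn

class OutcomePredictor(nn.Module):
    """Stand-in for a model that predicts a future frame given an action."""
    def __init__(self, frame_dim=1024, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + action_dim, 512),
            nn.ReLU(),
            nn.Linear(512, frame_dim),
        )

    def forward(self, frame, action):
        return self.net(torch.cat([frame, action], dim=-1))

def plan_best_action(predictor, success_scorer, current_frame, candidate_actions):
    # Evaluate every candidate grasp in imagination and keep the best.
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        imagined = predictor(current_frame, action)
        score = success_scorer(imagined).item()
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Toy usage with a random scorer standing in for a learned success estimator.
predictor = OutcomePredictor()
scorer = nn.Sequential(nn.Linear(1024, 1))
frame = torch.randn(1024)
actions = [torch.randn(4) for _ in range(5)]
best = plan_best_action(predictor, scorer, frame, actions)
```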

The technology, called RFM, for robotics foundation model, makes mistakes, just as chatbots do. Although it often understands what people ask of it, there is always a chance that it doesn’t. Occasionally it drops objects.

Gary Marcus, an AI entrepreneur and professor emeritus of psychology and neural science at New York University, said the technology could be useful in warehouses and other situations where errors are acceptable. But he said it would be more difficult and riskier to deploy in factories and other potentially dangerous situations.

“It comes down to the cost of mistakes,” he said. “If you have a 150-pound robot that can do something harmful, those costs can be high.”

Researchers believe the technology will improve quickly as companies train these kinds of systems on increasingly large and varied data sets.

That is very different from the way robots used to work. Typically, engineers would program robots to perform the same precise motion over and over again, such as picking up a box of a certain size or attaching a rivet to a specific spot on the rear bumper of a car. But robots could not deal with unexpected or random situations.

By learning from digital data – hundreds of thousands of examples of what’s happening in the physical world – robots can deal with the unexpected. And when those examples are combined with language, robots can also respond to text and voice prompts, just as a chatbot would.

This means that robots, just like chatbots and image generators, will become more agile.

“What’s in the digital data can be transferred to the real world,” said Dr. Chen.
