The race to make AI smaller (and smarter).

When it comes to artificial intelligence chatbots, bigger is usually better.

Large language models like ChatGPT and Bard, which generate conversational, original text, improve as they get more data. Every day, bloggers take to the internet to explain how the latest advances – an app that summarizes articles, AI-generated podcasts, a sophisticated model that can answer any question related to professional basketball – “will change everything.”

But to create bigger and more capable AI, you need processing power that few companies have, and there is growing concern that a small group, including Google, Meta, OpenAI and Microsoft, will exert almost complete control over the technology.

Larger language models are also more difficult to understand. They are often described as “black boxes,” even by the people who design them, and leading figures in the field have expressed unease that AI’s goals may ultimately not align with our own. If bigger is better, the technology will also be more opaque and more exclusive.

In January, a group of young academics working in natural language processing – the branch of AI that focuses on linguistic understanding – issued a challenge to try to turn this paradigm on its head. The group called for teams to create functional language models using datasets less than one-ten-thousandth the size of those used by the most sophisticated large language models. A successful mini model would be nearly as capable as the high-end models but much smaller, more accessible and more compatible with humans. The project is called the BabyLM Challenge.

“We challenge people to think small and focus more on building efficient systems that more people can use,” said Aaron Mueller, a computer scientist at Johns Hopkins University and an organizer of BabyLM.

Alex Warstadt, a computer scientist at ETH Zurich and another organizer of the project, added: “The challenge puts questions about human language learning, rather than ‘How big can we make our models?,’ at the center of the conversation.”

Large language models are neural networks designed to predict the next word in a given phrase or sentence. They are trained for this task using a corpus of words collected from transcripts, websites, novels and newspapers. A typical model makes guesses based on sample sentences and then adjusts itself depending on how close it gets to the correct answer.

By repeating this process over and over, a model forms maps of how words relate to each other. In general, the more words a model learns, the better it gets; each sentence provides the model with context, and more context translates into a more detailed impression of what each word means. OpenAI’s GPT-3, released in 2020, was trained on 200 billion words; DeepMind’s Chinchilla, released in 2022, was trained on a trillion.
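For a concrete picture of that training objective, the following is a minimal, hypothetical sketch in Python (using numpy). It is not the architecture of GPT-3 or Chinchilla, just the core loop the paragraph describes: guess the next word, measure how far the guess falls from the right answer, and nudge the model’s weights accordingly.

```python
# Toy illustration of next-word prediction (not any lab's actual model):
# a tiny corpus, a one-layer softmax predictor, and a loop that adjusts
# the weights toward the correct next word.
import numpy as np

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Training pairs: (current word, word that actually came next)
pairs = [(idx[a], idx[b]) for a, b in zip(corpus, corpus[1:])]

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))  # scores: current word -> possible next words

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.5
for epoch in range(200):
    for cur, nxt in pairs:
        probs = softmax(W[cur])   # the model's guess about the next word
        grad = probs.copy()
        grad[nxt] -= 1.0          # how far the guess was from the right answer
        W[cur] -= lr * grad       # adjust the weights toward the correct word

# After training, the model has a rough map of which words follow which.
probs = softmax(W[idx["sat"]])
print(vocab[int(np.argmax(probs))])  # prints "on", the word that always followed "sat"
```

Real models differ in scale and structure (they look at long stretches of context, not a single word, and have billions of adjustable weights), but the guess-and-adjust cycle is the same.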

For Ethan Wilcox, a linguist at ETH Zurich, the fact that something non-human can generate language presents an exciting opportunity: Could AI language models be used to study how humans learn language?

For example, nativism, an influential theory dating back to the early work of Noam Chomsky, claims that people learn language quickly and efficiently because they have an innate understanding of how language works. But language models also learn language quickly, and seemingly without an innate understanding of how language works – so maybe nativism doesn’t hold water.

The challenge is that language models learn very differently from humans. Humans have bodies, social lives and rich sensations. We can smell mulch, feel the blades of feathers, bump into doors, and taste peppermint. Early on, we are exposed to simple spoken words and syntax that are often not reflected in writing. So, Dr. Wilcox concluded, a computer that produces language after being trained on trillions of written words can tell us only so much about how we ourselves learn and use language.

But if a language model were exposed only to words a young human encounters, it could interact with language in a way that could answer certain questions about our own abilities.

So Dr. Wilcox, Mr. Mueller, Dr. Warstadt and half a dozen colleagues launched the BabyLM Challenge, to try to bring language models a little closer to human understanding. In January, they sent out a call for teams to train language models on the same number of words a 13-year-old human encounters — about 100 million. Candidate models would be tested on how well they generated and picked up the nuances of language, and a winner would be declared.

Eva Portelance, a linguist at McGill University, came across the challenge the day it was announced. Her research straddles the often blurred boundary between computer science and linguistics. The first forays into AI, in the 1950s, were driven by a desire to model human cognitive abilities in computers; the basic unit of information processing in AI is the “neuron,” and early language models in the 1980s and 1990s were directly inspired by the human brain.

But as processors became more powerful and companies began working on marketable products, computer scientists realized that it was often easier to train language models on massive amounts of data than to force them into psychologically informed structures. As a result, said Dr. Portelance, “they give us text that’s human, but there’s no connection between us and how they function.”

For scientists interested in understanding how the human mind works, these large models offer limited insight. And because they require enormous processing power, few researchers have access to them. “Only a small number of industrial labs with huge resources can afford to train models with billions of parameters on trillions of words,” Dr. Wilcox said.

“Or even to load them,” added Mr. Mueller. “This makes research in the field feel a little less democratic lately.”

The BabyLM Challenge, said Dr. Portelance, could be seen as a step away from the arms race over ever-bigger language models, and a step toward more accessible, more intuitive AI.

The potential of such a research program has not been ignored by larger industrial laboratories. Sam Altman, the CEO of OpenAI, said recently that increasing the size of language models would not lead to the same kind of improvements as in recent years. And companies like Google and Meta have also invested in research into more efficient language models based on human cognitive structures. After all, a model that can generate language when trained on less data can also be scaled up.

Whatever gains a successful BabyLM may bring, for those behind the challenge the goals are more academic and abstract. Even the prize undercuts any practical motivation. “Just pride,” Dr. Wilcox said.
