AI can decode speech from brain activity with surprising accuracy

Artificial intelligence can decode words and sentences from brain activity with surprising, though still limited, accuracy. Using just a few seconds of brain activity data, the AI guesses what a person heard, listing the correct answer among its top 10 possibilities up to 73 percent of the time, researchers report in a preliminary study.

“The performance of the AI was beyond what many people thought possible at this stage,” says Giovanni Di Liberto, a computer scientist at Trinity College Dublin who was not involved in the research.

Developed by Facebook’s parent company Meta, the AI could eventually be used to help thousands of people around the world who cannot communicate through speech, writing or gestures, researchers reported August 25 at arXiv.org. This includes many patients in minimally conscious, locked-in or “vegetative states,” now commonly known as unresponsive wakefulness syndrome (SN: 2/8/19).

Most existing technologies to help such patients communicate require risky brain surgeries to implant electrodes. This new approach “could provide a viable way to help patients with communication deficits … without the use of invasive methods,” says neuroscientist Jean-Rémy King, a Meta AI researcher who is currently at the École Normale Supérieure in Paris.

King and his colleagues trained a computational tool to detect words and sentences on 56,000 hours of speech recordings in 53 languages. The tool, known as a language model, learned to recognize specific features of language both at a fine-grained level, such as letters or syllables, and at a broader level, such as a word or sentence.
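
Purely as an illustration of that idea (the article does not name the underlying model; a multilingual wav2vec 2.0-style checkpoint is assumed here), a pretrained self-supervised speech model can turn a short audio clip into the kind of fine-grained, frame-by-frame representations such a tool works with:

```python
# Hypothetical sketch: extracting speech representations with a pretrained
# self-supervised speech model. The specific checkpoint is an assumption;
# the article does not name the tool's architecture.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_name = "facebook/wav2vec2-large-xlsr-53"  # assumed multilingual checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)
model.eval()

# Three seconds of placeholder audio at 16 kHz; in practice this would be a clip
# from the story recordings the participants listened to.
audio = np.zeros(16_000 * 3, dtype=np.float32)

inputs = extractor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state

# One embedding per ~20 ms frame: fine-grained, sub-word-level features that
# downstream layers can pool into word- or sentence-level representations.
print(features.shape)  # (1, n_frames, 1024)
```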

The team applied an AI with this language model to databases from four institutions that included brain activity from 169 volunteers. In these databases, participants listened to stories and sentences from, for example, Ernest Hemingway’s The Old Man and the Sea and Lewis Carroll’s Alice’s Adventures in Wonderland while their brains were scanned using magnetoencephalography or electroencephalography. These techniques measure the magnetic or electrical components of brain signals.
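
As a minimal sketch of how such recordings are commonly handled (assuming MNE-Python and a placeholder file name; the study’s actual preprocessing pipeline is not described in the article), a continuous M/EEG signal can be loaded and cut into the short windows used for decoding:

```python
# Illustrative sketch only: loading an M/EEG recording and slicing it into
# 3-second windows with MNE-Python. The file name is a placeholder.
import mne

raw = mne.io.read_raw_fif("sub-01_task-listening_meg.fif", preload=True)
raw.filter(l_freq=0.1, h_freq=40.0)  # basic band-pass to remove slow drift and high-frequency noise

# Cut the continuous recording into consecutive 3-second epochs,
# matching the window length the decoder works with.
epochs = mne.make_fixed_length_epochs(raw, duration=3.0, preload=True)
brain_windows = epochs.get_data()  # shape: (n_windows, n_channels, n_times)
print(brain_windows.shape)
```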

Then, using a computational method that helps account for physical differences among actual brains, the team tried to decode what participants heard using just three seconds of brain activity data from each person. The team instructed the AI to match speech sounds from the story recordings to the patterns of brain activity that the AI computed as corresponding to what people were hearing. It then made predictions about what the person might have heard during that short time, given more than 1,000 possibilities.
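
A simplified sketch of that matching step, assuming the brain window and the candidate speech segments have already been mapped into a shared embedding space by trained encoders (an assumption of this illustration, not a detail given in the article):

```python
# Minimal sketch: rank more than 1,000 candidate speech segments by how well
# they match the embedding computed from 3 seconds of brain activity.
# Embeddings here are random placeholders standing in for encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
n_candidates, dim = 1_000, 512

brain_embedding = rng.normal(size=dim)                        # from one 3-second M/EEG window
candidate_embeddings = rng.normal(size=(n_candidates, dim))   # one per candidate speech segment

def cosine_similarity(query, candidates):
    # Cosine similarity between one query vector and every candidate row.
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query

scores = cosine_similarity(brain_embedding, candidate_embeddings)
top10 = np.argsort(scores)[::-1][:10]  # the AI's "top 10 guesses"
print(top10)
```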

With magnetoencephalography, or MEG, the correct answer was among the AI’s top 10 guesses up to 73 percent of the time, the researchers found. With electroencephalography, that value dropped to no more than 30 percent. “[That MEG] performance is very good,” says Di Liberto, but he is less optimistic about its practical use. “What can we do with it? Nothing. Absolutely nothing.”
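
Those figures correspond to a top-10 accuracy: the fraction of brain-activity windows for which the true speech segment lands among the 10 best-scoring candidates. A toy version of the calculation, with random placeholder scores rather than real results:

```python
# Illustration of the reported metric: top-10 accuracy over many windows.
# Scores and ground-truth labels are random placeholders, not study data.
import numpy as np

def top_k_accuracy(score_matrix, true_indices, k=10):
    """score_matrix: (n_windows, n_candidates) similarity scores;
    true_indices: the correct candidate index for each window."""
    topk = np.argsort(score_matrix, axis=1)[:, ::-1][:, :k]
    hits = [true_indices[i] in topk[i] for i in range(len(true_indices))]
    return float(np.mean(hits))

rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 1_000))       # placeholder scores for 200 windows
truth = rng.integers(0, 1_000, size=200)     # placeholder ground-truth indices
print(top_k_accuracy(scores, truth, k=10))   # ~0.01 by chance; the study reports up to 0.73 with MEG
```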

The reason, he says, is that MEG requires a bulky and expensive machine. Bringing this technology into clinics will require scientific innovation that makes the machines cheaper and easier to use.

It’s also important to understand what “decoding” actually means in this research, says Jonathan Brennan, a linguist at the University of Michigan in Ann Arbor. The word is often used to describe the process of deciphering information directly from a source, in this case speech from brain activity. But the AI could only do this because it was given a finite list of possible correct answers to make its guesses.

“With language, it’s not going to help if we want to scale it for practical use, because language is infinite,” says Brennan.

What’s more, Di Liberto says, the AI decodes information from participants who are passively listening to audio, a task that isn’t directly relevant to nonverbal patients. For it to become a meaningful communication tool, scientists will need to learn how to decipher from brain activity what these patients intend to say, including expressions of hunger, discomfort or a simple yes or no.

The new research is “decoding speech perception, not production,” King agrees. Although speech production is the ultimate goal, for now “we’re a long way off.”
