Teaching Language Models Grammar Really Makes Them Smarter

Voice assistants like Siri and Alexa can tell you the weather and crack a good joke, but any 8-year-old can hold a better conversation.

The deep learning models that power Siri and Alexa learn to understand our commands by picking up on patterns in sequences of words and phrases. Their narrow, statistical understanding of language stands in stark contrast to our own creative and spontaneous ways of speaking, a skill that begins to develop before we are even born, in the womb.

To give computers some of our innate sense of language, researchers have started training deep learning models on the grammatical rules that most of us grasp intuitively, even if we were never taught to diagram a sentence in school. Grammatical constraints seem to help models learn faster and perform better, but because neural networks reveal very little about their decision-making process, researchers have struggled to confirm that the gains are due to grammar, and not to the models’ expert ability to find patterns in sequences of words.

Now psycholinguists have stepped in to help. To peer inside the models, the researchers took psycholinguistic tests originally developed to study human language comprehension and adapted them to probe what neural networks know about language. In a pair of papers to be presented in June at the North American Chapter of the Association for Computational Linguistics conference, researchers from MIT, Harvard University, the University of California, IBM Research, and Kyoto University developed a set of tests to untangle models’ knowledge of specific grammatical rules. They find evidence that grammar-enriched deep learning models grasp some fairly sophisticated rules and perform better than models trained on little or no grammar, while using only a fraction of the data.

“The grammar helps the model behave in a more human-like way,” says Miguel Ballesteros, an IBM researcher with the MIT-IBM Watson AI Lab and co-author of both studies. “Sequential models don’t seem to care if you end a sentence with an ungrammatical phrase. Why? Because they don’t see this hierarchy.”

As a postdoctoral fellow at Carnegie Mellon University, Ballesteros helped develop a method for training modern language models on sentence structure, called recurrent neural network grammars, or RNNGs. In the current research, he and his colleagues exposed the RNNG model, along with similar models that had little or no grammatical training, to sentences with good, bad, or ambiguous syntax. When human subjects read sentences with faulty grammar, their surprise registers as longer response times. For computers, surprise is expressed in probabilities: when a low-probability word appears in place of a high-probability one, researchers assign the model a higher surprise score.
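This “surprise” metric, usually called surprisal, has a standard form: the negative log-probability a model assigns to each word given the words before it. Below is a minimal sketch of the idea, assuming an off-the-shelf GPT-2 model from the Hugging Face transformers library as a stand-in; the studies themselves used RNNG and sequential models with their own scoring pipelines, so the model choice and tokenization details here are illustrative only.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def per_token_surprisal(sentence: str) -> list[tuple[str, float]]:
    """Return (token, surprisal-in-bits) pairs; rarer continuations score higher."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability the model assigned to each actual next token.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    tokens = tokenizer.convert_ids_to_tokens(ids[0, 1:].tolist())
    return [(tok, -lp.item() / math.log(2)) for tok, lp in zip(tokens, token_lp)]

# Low-probability words earn high surprisal, the model's analogue of
# a human reader's slowdown at an unexpected word.
for tok, bits in per_token_surprisal("I know that the lion devoured at sunrise."):
    print(f"{tok:>12s}  {bits:6.2f} bits")
```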

They found that the best-performing model, the grammar-enriched RNNG, registered more surprise when exposed to grammatical anomalies; for example, when the word “that” incorrectly appears instead of “what” to introduce an embedded clause. “I know what the lion devoured at sunrise” is a perfectly natural sentence, but “I know that the lion devoured at sunrise” sounds like something is missing, because it is.

Linguists call this type of construction a filler-gap dependency: a relationship between a filler (a word like “who” or “what”) and a gap (the absence of a phrase where one is usually required). Even when more complicated constructions of this type are presented to grammar-enriched models, the models, like native English speakers, reliably recognize which ones are wrong.

For example, “The policeman who the criminal shot the politician with his gun shocked during the trial” is anomalous; the gap corresponding to the filler “who” should come after the verb “shot,” not “shocked.” Rewording the sentence to move the gap, as in “The policeman who the criminal shot with his gun shocked the jury during the trial,” is long-winded, but perfectly grammatical.
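The minimal-pair logic behind these tests is straightforward to sketch: score a grammatical sentence and its gap-displaced twin, and check whether the ungrammatical version earns the higher surprise. The snippet below is again illustrative only, assuming a GPT-2 stand-in and whole-sentence surprisal; in practice psycholinguists compare surprisal at the critical word rather than summed over sentences of different lengths.

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_surprisal(sentence: str) -> float:
    """Summed per-token surprisal (bits) over the whole sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    return -token_lp.sum().item() / math.log(2)

# Filler-gap minimal pair from the article: the anomalous version leaves
# no gap after "shot" and strands one after "shocked" instead.
grammatical = ("The policeman who the criminal shot with his gun "
               "shocked the jury during the trial.")
anomalous = ("The policeman who the criminal shot the politician "
             "with his gun shocked during the trial.")

# A grammar-sensitive model should assign the anomalous sentence
# the higher surprise score.
print(f"grammatical: {total_surprisal(grammatical):.1f} bits")
print(f"anomalous:   {total_surprisal(anomalous):.1f} bits")
```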

“Unless they’re trained on tens of millions of words, state-of-the-art sequential models don’t care where the gaps are and aren’t in sentences like these,” says Roger Levy, a professor in MIT’s Department of Brain and Cognitive Sciences and co-author of both studies. “A human would find that really weird, and apparently the grammar-enriched models do, too.”

Bad grammar, of course, not only sounds weird; it can turn an entire sentence into gibberish, underscoring the importance of syntax in cognition, and its value to psycholinguists, who study syntax to learn more about the brain’s capacity for symbolic thought. “Syntax is important for understanding the meaning of a sentence and how to interpret it,” says Peng Qian, an MIT graduate student and co-author of both studies.

The researchers next plan to run their experiments on larger data sets and to find out whether grammar-enriched models learn new words and phrases faster. Just as subjecting neural networks to psycholinguistic tests helps AI engineers understand and improve language models, psychologists hope to use this information to build better models of the brain.

“Certain components of our genetic endowment give us this rich ability to speak,” says Ethan Wilcox, a Harvard graduate student and co-author of both studies. “These are the kinds of methods that can provide insight into how we learn and understand language when our closest relatives cannot.”
