In recent years, NLP has made what appears to be incredible progress, with models even surpassing human performance on some benchmarks. How should we interpret these advances? Have these models achieved language “understanding”? Operating on the premise that “understanding” necessarily involves the capacity to extract and deploy the information conveyed in language inputs (the “meaning”), in this talk I will discuss a series of projects that use targeted tests to examine NLP models’ ability to capture input meaning in a systematic fashion. I will first discuss work probing model representations for compositional phrase and sentence meaning, with a particular focus on disentangling compositional information from the encoding of word-level properties. I will then explore models’ ability to extract and use meaning information when executing the basic pre-training task of word prediction in context. In all cases, these investigations apply tests that prioritize control of unwanted cues, so as to target the desired model capabilities with greater precision. The results of these studies suggest that although models show a good deal of sensitivity to word-level information, and to certain semantic and syntactic distinctions, they show little sign of representing higher-level compositional meaning, or of being able to retain and deploy such information robustly during word prediction. I will discuss potential implications of these findings with respect to the goals of achieving “understanding” with currently dominant pre-training paradigms.
Dr. Allyson Ettinger’s research focuses on language processing in humans and in artificial intelligence systems, motivated by a combination of scientific and engineering goals. In studying humans, her research uses computational methods to model and test hypotheses about the mechanisms underlying the brain’s processing of language in real time. In the engineering domain, her research uses insights and methods from cognitive science, linguistics, and neuroscience to analyze, evaluate, and improve natural language understanding capacities in artificial intelligence systems. In both of these threads of research, the primary focus is on the processing and representation of linguistic meaning.