Abstract
While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Excessive reliance on such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset), and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find that longer training leads models to abandon lexical overlap heuristics. Finally, we provide evidence that the disparity across model sizes has its source in the pre-trained model.
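The abstract's central notion, a lexical overlap heuristic, can be made concrete with a minimal sketch (not taken from the paper; the function names and threshold below are illustrative assumptions): a classifier relying on this cue predicts entailment whenever most hypothesis words also appear in the premise, which fails on contrastive examples where overlap and label disagree.

```python
# Illustrative sketch (not the paper's method): a naive lexical-overlap
# "heuristic" that predicts entailment whenever most hypothesis tokens
# appear in the premise. Models that latch onto this cue fail on
# contrastive examples where high overlap does not imply entailment.

def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    shared = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return shared / len(hypothesis_tokens)

def overlap_heuristic_label(premise: str, hypothesis: str,
                            threshold: float = 0.9) -> str:
    """Predict 'entailment' purely from overlap (threshold is arbitrary)."""
    return ("entailment"
            if lexical_overlap(premise, hypothesis) >= threshold
            else "non-entailment")

# High overlap but no entailment: the kind of input a heuristic-reliant
# model gets wrong.
print(overlap_heuristic_label("The doctor visited the lawyer",
                              "The lawyer visited the doctor"))
```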
| Original language | English |
|---|---|
| Title of host publication | Findings of the Association for Computational Linguistics |
| Subtitle of host publication | EMNLP 2022 |
| Editors | Yoav Goldberg, Zornitsa Kozareva, Yue Zhang |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 4427-4439 |
| Number of pages | 13 |
| ISBN (Electronic) | 9781959429432 |
| DOIs | |
| State | Published - 2022 |
| Event | 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 - Hybrid, Abu Dhabi, United Arab Emirates |
| Duration | 7 Dec 2022 → 11 Dec 2022 |
Publication series
| Name | Findings of the Association for Computational Linguistics: EMNLP 2022 |
|---|---|
Conference
| Conference | 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 |
|---|---|
| Country/Territory | United Arab Emirates |
| City | Hybrid, Abu Dhabi |
| Period | 7/12/22 → 11/12/22 |
Bibliographical note
Publisher Copyright: © 2022 Association for Computational Linguistics.