Abstract
The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model and to add a small, thin network (head) per task. Given an input, the target head is the head selected to output the final prediction. In this work, we examine the behaviour of non-target heads, that is, the output of heads when given input belonging to a different task than the one they were trained for. We find that non-target heads exhibit emergent behaviour, which may either explain the target task or generalize beyond their original task. For example, in a numerical reasoning task, a span extraction head extracts from the input the arguments to a computation whose result is the number generated by the target generative head. In addition, a summarization head trained alongside a target question-answering head outputs query-based summaries when given a question and a context from which the answer is to be extracted. This emergent behaviour suggests that multi-task training leads to nontrivial extrapolation of skills, which can be harnessed for interpretability and generalization.
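To make the setup concrete, below is a minimal sketch (in PyTorch with Hugging Face Transformers; this is not the authors' code) of a shared encoder with one thin head per task. The encoder name, head names, and head shapes are illustrative assumptions; the point is only that any head, target or not, can be applied to any input, which is the kind of probing the abstract describes.

```python
# Minimal sketch of a multi-task model: a shared pre-trained encoder
# plus one small "head" per task. Heads here are untrained placeholders;
# in the paper's setting each head is trained jointly on its own task.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskModel(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased"):  # illustrative choice
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # shared representation
        hidden = self.encoder.config.hidden_size
        # One thin head per task; normally only the *target* head's output is used.
        self.heads = nn.ModuleDict({
            "span_extraction": nn.Linear(hidden, 2),   # per-token start/end logits
            "classification": nn.Linear(hidden, 3),    # e.g. a 3-way sentence label
        })

    def forward(self, input_ids, attention_mask, head: str):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        if head == "classification":
            states = states[:, 0]  # [CLS] pooling for sentence-level heads
        return self.heads[head](states)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = MultiTaskModel()
batch = tokenizer("Who wrote it? [SEP] Ada Lovelace wrote the first program.",
                  return_tensors="pt")
# Probing a NON-target head: route a QA-style input through span extraction
# even if, say, the classification head were the target for this input.
span_logits = model(batch["input_ids"], batch["attention_mask"],
                    head="span_extraction")
print(span_logits.shape)  # (1, seq_len, 2)
```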
| Field | Value |
| --- | --- |
| Original language | English |
| Title of host publication | EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 8201-8215 |
| Number of pages | 15 |
| ISBN (Electronic) | 9781955917094 |
| State | Published - 2021 |
| Externally published | Yes |
| Event | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic |
| Event duration | 7 Nov 2021 → 11 Nov 2021 |
Publication series
| Field | Value |
| --- | --- |
| Name | EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings |
Conference
| Field | Value |
| --- | --- |
| Conference | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 |
| Country/Territory | Dominican Republic |
| City | Virtual, Punta Cana |
| Period | 7/11/21 → 11/11/21 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics
Funding
We thank Ana Marasović and Daniel Khashabi for their helpful feedback and constructive suggestions, and the NLP group at Tel Aviv University, particularly Maor Ivgi and Elad Segal. This research was supported in part by The Yandex Initiative for Machine Learning, and The European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant ERC DELPHI 802800). This work was completed in partial fulfillment of the Ph.D. degree of Mor Geva.
| Funders | Funder number |
| --- | --- |
| European Union Horizon 2020 research and innovation programme | DELPHI 802800 |
| Yandex Initiative for Machine Learning | |
| European Commission | |