What's in Your Head? Emergent Behaviour in Multi-Task Transformer Models

Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model and add a small, thin network (head) per task. Given an input, the target head is the head selected to output the final prediction. In this work, we examine the behaviour of non-target heads, that is, the output of heads when given input that belongs to a different task than the one they were trained for. We find that non-target heads exhibit emergent behaviour, which may either explain the target task or generalize beyond their original task. For example, in a numerical reasoning task, a span extraction head extracts from the input the arguments to a computation that results in a number generated by a target generative head. In addition, a summarization head that is trained with a target question answering head outputs query-based summaries when given a question and a context from which the answer is to be extracted. This emergent behaviour suggests that multi-task training leads to nontrivial extrapolation of skills, which can be harnessed for interpretability and generalization.
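
The following is a minimal sketch of the setup the abstract describes: a shared encoder with one thin head per task, where any head, target or not, can be read out for the same input. All module names, task names, and dimensions here are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of a shared encoder with per-task heads,
# where a non-target head can be probed on the same input.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, hidden_size=64, vocab_size=100):
        super().__init__()
        # Stand-in for a shared pre-trained language model.
        self.encoder = nn.Sequential(
            nn.Embedding(vocab_size, hidden_size),
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(
                    d_model=hidden_size, nhead=4, batch_first=True
                ),
                num_layers=2,
            ),
        )
        # One small, thin network (head) per task.
        self.heads = nn.ModuleDict({
            "span_extraction": nn.Linear(hidden_size, 2),     # start/end logits per token
            "generation": nn.Linear(hidden_size, vocab_size),  # next-token logits
        })

    def forward(self, input_ids, head_name):
        hidden = self.encoder(input_ids)      # (batch, seq, hidden)
        return self.heads[head_name](hidden)  # head-specific output

model = MultiTaskModel()
tokens = torch.randint(0, 100, (1, 12))

# Target head: the head selected to produce the final prediction.
target_out = model(tokens, "generation")
# Non-target head: the same input routed through a head trained on a
# different task, to inspect its emergent behaviour.
non_target_out = model(tokens, "span_extraction")
print(target_out.shape, non_target_out.shape)
```

Because both heads read the same shared representation, routing an input through a non-target head requires no extra training; this is what makes the probing described in the abstract cheap to perform.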

Original language: English
Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 8201-8215
Number of pages: 15
ISBN (Electronic): 9781955917094
State: Published - 2021
Externally published: Yes
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 7 Nov 2021 - 11 Nov 2021

Publication series

Name: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/Territory: Dominican Republic
City: Virtual, Punta Cana
Period: 7/11/21 - 11/11/21

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics

Funding

We thank Ana Marasović and Daniel Khashabi for their helpful feedback and constructive suggestions, and the NLP group at Tel Aviv University, particularly Maor Ivgi and Elad Segal. This research was supported in part by The Yandex Initiative for Machine Learning and The European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant ERC DELPHI 802800). This work was completed in partial fulfillment for the Ph.D. degree of Mor Geva.

Funders and funder numbers:
European Union's Horizon 2020 research and innovation programme: DELPHI 802800
Yandex Initiative for Machine Learning
European Commission
