First align, then predict: Understanding the cross-lingual ability of multilingual BERT

Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah

Research output: Chapter in Book/Report/Conference proceeding · Conference contribution · peer-review

47 Scopus citations

Abstract

Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model's internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific, language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor is of little importance for the transfer and can be reinitialized during fine-tuning. We present extensive experiments with three distinct tasks, seventeen typologically diverse languages and multiple domains to support our hypothesis.
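The layer-reinitialization analysis described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical example, not the authors' released code: it assumes the Hugging Face transformers library and an arbitrary split point of 8 layers, and it resets the upper Transformer layers of multilingual BERT to fresh initializations before fine-tuning, the kind of intervention used to test whether the "task predictor" sub-network can be reinitialized without hurting cross-lingual transfer.

```python
# Minimal sketch (assumptions: Hugging Face transformers, an arbitrary
# 8-layer split point); not the authors' released implementation.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",  # multilingual BERT checkpoint
    num_labels=3,                    # e.g. a hypothetical 3-way classification task
)

# Hypothetical split: layers below this index play the role of the
# "multilingual encoder"; layers at or above it play the role of the
# "task predictor" that gets reinitialized.
SPLIT_LAYER = 8

for layer in model.bert.encoder.layer[SPLIT_LAYER:]:
    # Re-draw the parameters of each upper Transformer layer from the model's
    # standard initialization scheme, discarding the pretrained weights.
    layer.apply(model._init_weights)

# The model is then fine-tuned as usual on source-language task data; under
# the paper's hypothesis, zero-shot performance on other languages should
# remain largely unaffected despite the reinitialized upper layers.
```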

Original language: English
Title of host publication: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2214-2231
Number of pages: 18
ISBN (Electronic): 9781954085022
State: Published - 2021
Event: 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021 - Virtual, Online
Duration: 19 Apr 2021 - 23 Apr 2021

Publication series

Name: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021
City: Virtual, Online
Period: 19/04/21 - 23/04/21

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics

Funding

We want to thank Hila Gonen, Shauli Ravfogel and Ganesh Jawahar for their insightful reviews and comments. We also thank the anonymous reviewers for their valuable suggestions. This work was partly funded by two French nationally funded projects granted to Inria and other partners by the Agence Nationale de la Recherche, namely the projects PARSITI (ANR-16-CE33-0021) and SoSweet (ANR-15-CE38-0011), as well as by the third author's chair in the PRAIRIE institute, funded by the French national agency ANR as part of the "Investissements d'avenir" programme under the reference ANR-19-P3IA-0001. Yanai Elazar is grateful to be partially supported by the PBC fellowship for outstanding PhD candidates in Data Science.

Funders and funder numbers:
French national agency ANR: ANR-19-P3IA-0001
SoSweet: ANR-15-CE38-0011
Agence Nationale de la Recherche: ANR-16-CE33-0021
Planning and Budgeting Committee of the Council for Higher Education of Israel
