Abstract
We show that the choice of pretraining languages affects downstream cross-lingual transfer for BERT-based models. We inspect zero-shot performance in balanced data conditions to mitigate data size confounds, classifying pretraining languages that improve downstream performance as donors, and languages whose zero-shot performance is improved as recipients. We develop a method of quadratic time complexity in the number of languages to estimate these relations, avoiding an exhaustive computation over all possible language combinations, which is exponential. We find that our method is effective on a diverse set of languages spanning different linguistic features and two downstream tasks. Our findings can inform developers of large-scale multilingual language models in choosing better pretraining configurations.
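To make the complexity claim concrete, the following is a minimal counting sketch, not the authors' implementation. It assumes that the quadratic cost comes from evaluating ordered language pairs, which the abstract does not state explicitly. The function names and the language list are hypothetical.

```python
from itertools import combinations, permutations


def pairwise_configs(languages):
    """Ordered (donor, recipient) pairs: n * (n - 1) for n languages.

    Assumption: the quadratic method works over language pairs.
    """
    return list(permutations(languages, 2))


def exhaustive_configs(languages):
    """Every non-empty subset of pretraining languages: 2**n - 1."""
    return [
        subset
        for size in range(1, len(languages) + 1)
        for subset in combinations(languages, size)
    ]


# Hypothetical language set, chosen only for illustration.
languages = ["en", "de", "ru", "he", "ar"]

pairs = pairwise_configs(languages)
subsets = exhaustive_configs(languages)

print(f"pairwise configurations:   {len(pairs)}")
print(f"exhaustive configurations: {len(subsets)}")
```

With five languages the two counts are still close, but the pairwise count grows as n(n - 1) while the exhaustive count grows as 2^n - 1, so the gap widens quickly as languages are added.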
Original language | English |
---|---|
Title of host publication | NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics |
Subtitle of host publication | Human Language Technologies, Proceedings of the Conference |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 4903-4915 |
Number of pages | 13 |
ISBN (Electronic) | 9781955917711 |
State | Published - 2022 |
Externally published | Yes |
Event | 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, United States. Duration: 10 Jul 2022 → 15 Jul 2022 |
Publication series
Name | NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference |
---|
Conference
Conference | 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 10/07/22 → 15/07/22 |
Bibliographical note
Publisher Copyright: © 2022 Association for Computational Linguistics.
Funding
We would like to thank Roy Schwartz for his helpful comments and suggestions and the anonymous reviewers for their valuable feedback. This work was supported in part by a research gift from the Allen Institute for AI. Tomasz Limisiewicz’s visit to the Hebrew University has been supported by grant 338521 of the Charles University Grant Agency and the Mobility Fund of Charles University.
Funders | Funder number |
---|---|
Mobility Fund of Charles University | |
Grantová Agentura, Univerzita Karlova | 338521 |