Abstract
The 2022 SIGMORPHON-UniMorph shared task on large-scale morphological inflection generation covered 33 typologically diverse languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe. This year we emphasize generalization along different dimensions by evaluating test items with unseen lemmas and unseen features separately, under both small and large training conditions. Across the six submitted systems and two baselines, predicting inflections with unseen features proved challenging, and average performance decreased substantially from last year. This was true even for languages whose forms were in principle predictable, which suggests that further work is needed to design systems that capture the various types of generalization required for the world's languages.
Original language | English |
---|---|
Title of host publication | SIGMORPHON 2022 - 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Proceedings of the Workshop |
Editors | Garrett Nicolai, Eleanor Chodroff |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 176-203 |
Number of pages | 28 |
ISBN (Electronic) | 9781955917827 |
State | Published - 2022 |
Event | 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, SIGMORPHON 2022 - Seattle, United States. Duration: 14 Jul 2022 → … |
Publication series
Name | SIGMORPHON 2022 - 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Proceedings of the Workshop |
---|
Conference
Conference | 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, SIGMORPHON 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 14/07/22 → … |
Bibliographical note
Publisher Copyright: © 2022 Association for Computational Linguistics.
Funding
We would like to thank Garrett Nicolai, Maria Ryskina, Ben Ambridge, Jeff Heinz, and all those who provided valuable advice and logistical support in the early stages of this project; Judit Ács, Duygu Ataman, Zbigniew Bronk, Eleanor Chodroff, Sofya Ganieva, Włodzimierz Gruszczyński, Nizar Habash, Jan Hajič, Jan Hric, Ritvan Karahodja, Christo Kirov, Elena Klyachko, Ritesh Kumar, Vahagn Petrosyan, Matvey Plugaryov, Mohit Raj, Maria Ryskina, Elizabeth Salesky, Zygmunt Saloni, Danuta Skowrońska, Marcin Woliński, and Robert Wołosz, who prepared and authored lexicons used in this project; and Jeff Heinz, Sarah Payne, and Charles Yang, who provided feedback on this overview paper. The neural baseline system was trained on the SeaWulf HPC cluster maintained by RCC and IACS at Stony Brook University and made possible by NSF grant #1531492.
Funders | Funder number |
---|---|
National Science Foundation | 1531492 |