Abstract
Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers and have each of them generate a large number of examples. Having only a few workers generate the majority of examples raises concerns about data diversity, especially when workers freely generate sentences. In this paper, we perform a series of experiments showing that these concerns are evident in three recent NLP datasets. We show that model performance improves when training with annotator identifiers as features, and that models are able to recognize the most productive annotators. Moreover, we show that models often do not generalize well to examples from annotators who did not contribute to the training set. Our findings suggest that annotator bias should be monitored during dataset creation, and that test set annotators should be disjoint from training set annotators.
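As an illustration of the annotator-disjoint evaluation the abstract recommends, below is a minimal sketch (not taken from the paper) of splitting a dataset by annotator rather than by example, so that no annotator contributes to both the training and the test set. The `annotator_id` field name and the 80/20 split ratio are assumptions for the example.

```python
# Minimal sketch of an annotator-disjoint train/test split.
# Assumption: each example is a dict carrying an "annotator_id" field.
import random

def annotator_disjoint_split(examples, train_fraction=0.8, seed=0):
    """Split examples so that no annotator appears in both train and test."""
    annotators = sorted({ex["annotator_id"] for ex in examples})
    random.Random(seed).shuffle(annotators)
    cutoff = int(len(annotators) * train_fraction)
    train_annotators = set(annotators[:cutoff])
    train = [ex for ex in examples if ex["annotator_id"] in train_annotators]
    test = [ex for ex in examples if ex["annotator_id"] not in train_annotators]
    return train, test

# Toy usage with hypothetical records:
data = [
    {"annotator_id": "w1", "text": "premise / hypothesis ..."},
    {"annotator_id": "w2", "text": "..."},
    {"annotator_id": "w3", "text": "..."},
]
train_set, test_set = annotator_disjoint_split(data)
```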
| Original language | English |
| --- | --- |
| Title of host publication | EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference |
| Publisher | Association for Computational Linguistics |
| Pages | 1161-1166 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781950737901 |
| State | Published - 2019 |
| Event | 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 - Hong Kong, China; Duration: 3 Nov 2019 → 7 Nov 2019 |
Publication series

| Name | EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference |
| --- | --- |
Conference

| Conference | 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 |
| --- | --- |
| Country/Territory | China |
| City | Hong Kong |
| Period | 3/11/19 → 7/11/19 |
Bibliographical note
Funding Information: This research was partially supported by The Israel Science Foundation grant 942/16, The Blavatnik Computer Science Research Fund, and The Yandex Initiative for Machine Learning. We thank Sam Bowman from New York University and Alon Talmor from Tel Aviv University for providing us with the annotation information of the MNLI and COMMONSENSEQA datasets. This work was completed in partial fulfillment of the Ph.D. degree of the first author.
Publisher Copyright:
© 2019 Association for Computational Linguistics