Are we modeling the task or the annotator? An investigation of annotator bias in natural language understanding datasets

Mor Geva, Yoav Goldberg, Jonathan Berant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

127 Scopus citations

Abstract

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate examples. Having only a few workers generate the majority of examples raises concerns about data diversity, especially when workers freely generate sentences. In this paper, we perform a series of experiments showing these concerns are evident in three recent NLP datasets. We show that model performance improves when training with annotator identifiers as features, and that models are able to recognize the most productive annotators. Moreover, we show that often models do not generalize well to examples from annotators that did not contribute to the training set. Our findings suggest that annotator bias should be monitored during dataset creation, and that test set annotators should be disjoint from training set annotators.
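The abstract's recommendation that test set annotators be disjoint from training set annotators can be illustrated with a short sketch. This is not the authors' code; it is a minimal Python example of an annotator-disjoint split, and the field names ("annotator_id", "text", "label") are hypothetical.

import random

def annotator_disjoint_split(examples, test_fraction=0.2, seed=0):
    """Split examples so that no annotator appears in both train and test."""
    # Collect the unique annotator identifiers and shuffle them reproducibly.
    annotators = sorted({ex["annotator_id"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(annotators)
    # Hold out a fraction of annotators (at least one) entirely for the test set.
    n_test = max(1, int(len(annotators) * test_fraction))
    test_annotators = set(annotators[:n_test])
    train = [ex for ex in examples if ex["annotator_id"] not in test_annotators]
    test = [ex for ex in examples if ex["annotator_id"] in test_annotators]
    return train, test

# Toy usage: three examples written by three different annotators.
data = [
    {"text": "a premise / hypothesis pair", "label": "entailment", "annotator_id": "w1"},
    {"text": "another pair", "label": "neutral", "annotator_id": "w2"},
    {"text": "yet another pair", "label": "contradiction", "annotator_id": "w3"},
]
train_set, test_set = annotator_disjoint_split(data, test_fraction=0.34)

Splitting by annotator rather than by example is what prevents a model from exploiting annotator-specific writing styles when it is evaluated.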

Original language: English
Title of host publication: EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics
Pages: 1161-1166
Number of pages: 6
ISBN (Electronic): 9781950737901
State: Published - 2019
Event: 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 - Hong Kong, China
Duration: 3 Nov 2019 - 7 Nov 2019

Publication series

Name: EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019
Country/Territory: China
City: Hong Kong
Period: 3/11/19 - 7/11/19

Bibliographical note

Funding Information:
This research was partially supported by The Israel Science Foundation grant 942/16, The Blavatnik Computer Science Research Fund, and The Yandex Initiative for Machine Learning. We thank Sam Bowman from New York University and Alon Talmor from Tel Aviv University for providing us with the annotation information of the MNLI and COMMONSENSEQA datasets. This work was completed in partial fulfillment of the Ph.D. degree of the first author.

Publisher Copyright:
© 2019 Association for Computational Linguistics
