Abstract
Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.
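The contrast the abstract draws, between a diagnostic classifier scored on a held-out subsample of its own training data and the same classifier scored on a newly drawn sample, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' data, model, or code: the representations are synthetic Gaussian vectors, the probe is a simple nearest-centroid classifier, and the hypothetical `leak` parameter builds a sample-specific correlation between one feature and the protected attribute into the original sample only.

```python
import random

DIM = 4  # size of the synthetic "representation" (assumption)

def make_sample(n, leak, rng):
    """Draw n synthetic (representation, protected_attribute) pairs.

    `leak` is a hypothetical knob: it shifts feature 0 by +/-leak depending
    on the attribute, mimicking a correlation present in one data sample.
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        x[0] += leak if y == 1 else -leak
        data.append((x, y))
    return data

def fit_centroids(train):
    """Nearest-centroid diagnostic probe: one mean vector per class."""
    sums = {0: [0.0] * DIM, 1: [0.0] * DIM}
    counts = {0: 0, 1: 0}
    for x, y in train:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
    return min(centroids, key=sq_dist)

def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)

rng = random.Random(0)

# Original sample: contains the sample-specific correlation.
original = make_sample(2000, leak=1.0, rng=rng)
train, held_out = original[:1500], original[1500:]

# Newly drawn sample: the correlation is absent by construction.
fresh = make_sample(2000, leak=0.0, rng=rng)

probe = fit_centroids(train)
held_out_acc = accuracy(probe, held_out)
fresh_acc = accuracy(probe, fresh)

print(f"held-out subsample accuracy: {held_out_acc:.3f}")
print(f"new-sample accuracy:         {fresh_acc:.3f}")
```

In this construction the probe scores well above chance on the held-out subsample but close to chance on the fresh sample, because the signal it learned exists only in the original draw. The toy setup fixes that outcome by design, so it illustrates the evaluation protocol rather than reproducing any result from the paper.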
| Original language | English |
|---|---|
| Title of host publication | EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference |
| Publisher | Association for Computational Linguistics |
| Pages | 6330-6335 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781950737901 |
| State | Published - 2019 |
| Event | 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 - Hong Kong, China. Duration: 3 Nov 2019 → 7 Nov 2019 |
Conference
| Conference | 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019 |
|---|---|
| Country/Territory | China |
| City | Hong Kong |
| Period | 3/11/19 → 7/11/19 |
Bibliographical note
Publisher Copyright: © 2019 Association for Computational Linguistics
Funding
Maria Barrett is sponsored by a Facebook Research Award; Anders Søgaard is sponsored by a Facebook Research Award and a Google Focused Research Award.
Fingerprint
Dive into the research topics of 'Adversarial removal of demographic attributes revisited'. Together they form a unique fingerprint.