Abstract
Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing, as it can significantly reduce the run-time costs caused by large word vocabularies. In this study, we provide a comprehensive investigation of self-normalization in language modeling. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which may potentially be used to improve self-normalization algorithms in the future.
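For intuition, here is a minimal sketch (in NumPy, not the authors' code) of the explicit-regularization route to self-normalization that the abstract contrasts with NCE: during training, a penalty pushes the log partition function log Z of the softmax toward zero, so that at test time the unnormalized score of a word can be read off directly as an approximate log-probability. The function names and the penalty weight `alpha` are illustrative assumptions, not anything taken from the paper.

```python
import numpy as np

def log_partition(logits):
    """log Z = log sum_w exp(s(w, context)), computed stably via the log-sum-exp trick."""
    m = logits.max()
    return m + np.log(np.sum(np.exp(logits - m)))

def self_normalized_loss(logits, target, alpha=0.1):
    """Softmax cross-entropy plus alpha * (log Z)^2; the penalty keeps log Z near 0."""
    log_z = log_partition(logits)
    nll = -(logits[target] - log_z)   # standard softmax negative log-likelihood
    return nll + alpha * log_z ** 2   # explicit self-normalization regularizer

def approx_log_prob(logits, target):
    """At test time a self-normalized model skips Z and uses the raw score as log p(w | context)."""
    return logits[target]

# Toy example over a 5-word vocabulary.
logits = np.array([1.2, -0.3, 0.5, -1.0, 0.1])
print(self_normalized_loss(logits, target=0))
print(approx_log_prob(logits, target=0))   # the exact value would be logits[0] - log Z
```

The NCE models analyzed in the paper take a different route: their training objective tends to produce approximately self-normalized scores without adding such an explicit penalty term.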
Original language | English |
---|---|
Title of host publication | COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings |
Editors | Emily M. Bender, Leon Derczynski, Pierre Isabelle |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 764-773 |
Number of pages | 10 |
ISBN (Electronic) | 9781948087506 |
State | Published - 2018 |
Event | 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States |
Duration | 20 Aug 2018 → 26 Aug 2018 |
Publication series
Name | COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings |
---|---|
Conference
Conference | 27th International Conference on Computational Linguistics, COLING 2018 |
---|---|
Country/Territory | United States |
City | Santa Fe |
Period | 20/08/18 → 26/08/18 |
Bibliographical note
Publisher Copyright: © 2018 COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings. All rights reserved.