Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing, as it can significantly reduce run-time given the large word vocabularies involved. In this study, we provide a comprehensive investigation of self-normalization in language modeling. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which may potentially be exploited to improve self-normalization algorithms in the future.
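To make the two approaches named above concrete: an NCE-trained model is used in a self-normalized fashion by treating its partition function as a constant (ideally 1), so unnormalized scores are read directly as log-probabilities, whereas a softmax-based model is made self-normalizing by adding an explicit penalty that drives the log partition function toward zero during training. Below is a minimal sketch of such an explicitly regularized training loss, assuming a PyTorch setup; the function name and the weight alpha are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def self_normalizing_loss(logits, targets, alpha=0.1):
    """Softmax cross-entropy plus an explicit self-normalization penalty.

    The penalty pushes log Z = logsumexp(logits) toward 0, so that at test
    time the unnormalized logits can be read as approximate log-probabilities
    without summing over the full vocabulary. `alpha` is an illustrative
    regularization weight, not a value from the paper.
    """
    log_z = torch.logsumexp(logits, dim=-1)   # log partition function, one per example
    nll = F.cross_entropy(logits, targets)    # standard normalized training objective
    return nll + alpha * (log_z ** 2).mean()  # penalize deviation of log Z from 0
```

Setting alpha to zero recovers ordinary softmax training; increasing it favors self-normalization quality over the normalized training objective, a tension related to the self-normalization versus perplexity correlation that the abstract reports.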
Title of host publication: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 10
State: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, United States
Duration: 20 Aug 2018 → 26 Aug 2018
Bibliographical note: Publisher Copyright: © 2018 COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings. All rights reserved.