ABI neural ensemble model for gender prediction: ADAPT Bar-Ilan submission for the CLIN29 shared task on gender prediction

Eva Vanmassenhove, Amit Moryossef, Alberto Poncelas, Andy Way, Dimitar Shterionov

Research output: Contribution to journal, Conference article, peer-review

Abstract

We present our system for the CLIN29 shared task on cross-genre gender detection for Dutch. We experimented with a multitude of neural models (CNN, RNN, LSTM, etc.), more “traditional” models (SVM, RF, LogReg, etc.), different feature sets, and different data pre-processing steps. The final results suggested that tokenized, non-lowercased data works best for most of the neural models, while a combination of word clusters, character trigrams and word lists proved most beneficial for the majority of the more “traditional” (that is, non-neural) models, beating features used in previous tasks such as n-grams, character n-grams, part-of-speech tags and combinations thereof. In contrast to the results described in previous comparable shared tasks, our neural models performed better than our best traditional approaches with our best feature set-up. Our final model was a weighted ensemble combining the top 25 models. It won both the in-domain gender prediction task and the cross-genre challenge, achieving an average accuracy of 64.93% on in-domain gender prediction and 56.26% on cross-genre gender prediction.
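The abstract does not say how the ensemble weights were chosen or how the 25 models' outputs were combined. As a rough illustration of the general idea only, the sketch below averages per-model probabilities using one weight per model and thresholds the result. The function name `weighted_ensemble_predict`, the toy probabilities, and the weights (imagined here as validation accuracies) are all hypothetical and not taken from the paper.

```python
from typing import Sequence


def weighted_ensemble_predict(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> list[int]:
    """Combine per-model class-1 probabilities into binary labels.

    probabilities[m][i] is model m's probability that sample i is class 1.
    weights[m] is a hypothetical non-negative weight for model m
    (an assumption for illustration, e.g. its validation accuracy).
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(probabilities[0])
    labels = []
    for i in range(n_samples):
        # Weighted average of the models' probabilities for sample i.
        score = sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        labels.append(1 if score >= threshold else 0)
    return labels


# Toy data: three models, three samples (made-up numbers).
model_probs = [
    [0.9, 0.4, 0.2],
    [0.6, 0.7, 0.1],
    [0.3, 0.8, 0.6],
]
model_weights = [0.65, 0.60, 0.55]  # hypothetical validation accuracies

predictions = weighted_ensemble_predict(model_probs, model_weights)
print(predictions)
```

With these toy numbers the weighted scores are roughly 0.62, 0.62 and 0.29, so the first two samples are assigned class 1 and the third class 0. A weighted majority vote over hard labels would be an equally plausible reading of "weighted ensemble".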

Original language: English
Pages (from-to): 53-61
Number of pages: 9
Journal: CEUR Workshop Proceedings
Volume: 2453
State: Published - 2019
Event: 2019 Shared Task on Cross-Genre Gender Prediction in Dutch at CLIN29, GxG-CLIN29 2019 - Groningen, Netherlands
Duration: 31 Jan 2019 → …

Bibliographical note

Publisher Copyright:
© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

Funding

This work has been supported by Dublin City University Faculty of Engineering & Computing under the Daniel O’Hare Research Scholarship scheme, by the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106), and by Theo Hoffenberg, founder & CEO of Reverso. We would also like to thank the organizers of the shared task.

Funders (funder number)
ADAPT Centre for Digital Content Technology
Science Foundation Ireland (13/RC/2106)
Dublin City University
