
Bots and gender profiling of tweets using word and character N-grams notebook for PAN at CLEF 2019

  • Yaakov HaCohen-Kerner
  • Natan Manor
  • Michael Goldmeier

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Author profiling deals with identifying various details about the author of a text (e.g., age and gender). In this paper, we describe the participation of our team (hacohenkerner19) in the PAN 2019 shared task on Bots and Gender Profiling in two languages: English and Spanish. Given a Twitter feed, the task is to determine whether its author is a bot or a human and, if the author is human, to identify her/his gender. We describe our preprocessing methods, feature sets, five applied machine learning methods, and accuracy results. The best accuracy for the English dataset (84.8%) was obtained by LinearSVC using 2,000 word unigrams. The same accuracy (84.8%) was also obtained by LR using four preprocessing methods, 2,000 word unigrams, and 1,000 word bigrams with maximal skips of 2 words. The best accuracy for the Spanish dataset (75.54%) was achieved by LinearSVC with only the HTML tag removal preprocessing method and a combination of 1,000 word unigrams, 1,000 word bigrams, and 3,000 character trigrams.
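The feature sets named in the abstract (word unigrams, word bigrams with a maximal skip, and character trigrams) can be illustrated with a small standard-library sketch. This is not the authors' code. The helper names (`skip_bigrams`, `char_ngrams`, `top_k_vocabulary`, `vectorize`), the whitespace tokenization, the reading of "maximal skips of 2 words" as "up to 2 words may lie between the two members of a bigram", and the selection of the top-k features by raw frequency are all assumptions made for illustration.

```python
from collections import Counter


def skip_bigrams(tokens, max_skip=2):
    """Word bigrams where up to `max_skip` words may be skipped between
    the two members (assumed interpretation of 'maximal skips')."""
    pairs = []
    for i, left in enumerate(tokens):
        # j ranges over the next (max_skip + 1) positions after i.
        for j in range(i + 1, min(i + max_skip + 2, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


def char_ngrams(text, n=3):
    """All contiguous character n-grams of `text` (trigrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def top_k_vocabulary(docs_features, k):
    """Keep the k most frequent features over all documents
    (frequency-based selection is an assumption, not from the source)."""
    counts = Counter()
    for features in docs_features:
        counts.update(features)
    return [feature for feature, _ in counts.most_common(k)]


def vectorize(features, vocabulary):
    """Map one document's features to a count vector over `vocabulary`."""
    counts = Counter(features)
    return [counts[feature] for feature in vocabulary]


if __name__ == "__main__":
    # Toy feeds; real input would be preprocessed tweets per author.
    feeds = ["follow me for free followers", "had a great day with my friends"]
    tokenized = [feed.split() for feed in feeds]  # whitespace tokenization (assumption)

    unigram_vocab = top_k_vocabulary(tokenized, k=5)
    bigram_vocab = top_k_vocabulary([skip_bigrams(t) for t in tokenized], k=5)
    trigram_vocab = top_k_vocabulary([char_ngrams(f) for f in feeds], k=5)

    # One combined feature vector per feed, ready for any linear classifier.
    for feed, tokens in zip(feeds, tokenized):
        vector = (
            vectorize(tokens, unigram_vocab)
            + vectorize(skip_bigrams(tokens), bigram_vocab)
            + vectorize(char_ngrams(feed), trigram_vocab)
        )
        print(vector)
```

In a setup like the one the abstract describes, `k` would be on the order of 1,000 to 3,000 per feature family, and the concatenated vectors would be passed to a classifier such as a linear SVM.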

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2380
State: Published - 2019
Externally published: Yes
Event: 20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 - Lugano, Switzerland
Duration: 9 Sep 2019 - 12 Sep 2019

Bibliographical note

Publisher Copyright:
© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland.

Funding

Acknowledgments. This work was partially funded by the Jerusalem College of Technology (Lev Academic Center) and we gratefully acknowledge its support.

Funders
Jerusalem College of Technology

    Keywords

    • Bot Profiling
    • Character N-grams
    • Gender Profiling
    • Supervised Machine Learning
    • Word N-grams
