Author profiling: Gender prediction from Tweets and images: Notebook for PAN at CLEF 2018

Yaakov HaCohen-Kerner, Yair Yigal, Elyashiv Shayovitz, Daniel Miller, Toby Breckon

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation


Author profiling deals with identifying various details about the author of a text (e.g., age, cultural background, gender, native language, personality). In this paper, we describe the participation of our teams (yigall8 and millerl8; both teams contain the same people, but in a different order) in the PAN 2018 shared task on author profiling, which involves identifying authors' gender where, for each author, 100 tweets and 10 images are provided. The authors were grouped by the language of their tweets: English, Spanish, and Arabic. We describe our pre-processing, feature sets, machine learning methods, and accuracy results. The best results using the textual features were achieved using the MLP method after applying the L normalization and using 9,000 word unigrams for English, 10,000 word unigrams and one orthographic feature for Spanish, and 7,000 word unigrams and one orthographic feature for Arabic. We also tried various additional feature sets, including style-based feature sets. In most cases, these features did not improve the results, and in a few cases they even hurt them. The best result (61.54%) for the visual features was obtained by the LR method using all the features (SIFT & Color & VGG), and the best basic feature set was VGG. The best result for the combined features was achieved using model 2 (millerl8) with a weight of 0.75 for the best textual model and a weight of 0.25 for the NN Classifier (Keras) using only the 1000 VGG features.
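The weighted combination mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract gives only the weights (0.75 for the textual model, 0.25 for the image model); it does not say what quantity is weighted. The sketch below assumes the two models output per-author class probabilities that are averaged with those weights, which may differ from what the authors actually did. The function names, label order, and probability values are hypothetical.

```python
def combine_probabilities(p_text, p_image, w_text=0.75, w_image=0.25):
    """Weighted average of two models' class probabilities, author by author.

    Assumption: both inputs are lists of rows, one row per author, with one
    probability per class in the same class order.
    """
    if len(p_text) != len(p_image):
        raise ValueError("both models must score the same number of authors")
    return [
        [w_text * t + w_image * v for t, v in zip(row_text, row_image)]
        for row_text, row_image in zip(p_text, p_image)
    ]


def predict_gender(p_text, p_image, labels=("female", "male")):
    """Pick the label with the highest combined probability for each author."""
    combined = combine_probabilities(p_text, p_image)
    return [labels[row.index(max(row))] for row in combined]


# Hypothetical probabilities for three authors (columns: female, male).
p_text = [[0.65, 0.35], [0.30, 0.70], [0.55, 0.45]]
p_image = [[0.20, 0.80], [0.40, 0.60], [0.10, 0.90]]

print(predict_gender(p_text, p_image))  # ['female', 'male', 'male']
```

For the third hypothetical author, the textual model slightly favours "female" (0.55), but a confident image model shifts the combined score to "male" (0.5625). With a weight of only 0.25, the image model changes the outcome only when the textual model is close to undecided.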

Original language: English
Journal: CEUR Workshop Proceedings
State: Published - 2018
Externally published: Yes
Event: 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France
Duration: 10 Sep 2018 to 14 Sep 2018

Bibliographical note

Funding Information:
Acknowledgments. This work was partially funded by the Jerusalem College of Technology (Lev Academic Center) and we gratefully acknowledge its support.


  • Author Profiling
  • Content-based Features
  • Gender Classification
  • Images
  • Style-based Features
  • Supervised Machine Learning
  • Tweets
  • Visual Features


