Abstract
This paper describes our submissions to the HASOC 2021 contest. We tackled subtask 1A, which addresses the identification of hate speech and offensive language in three languages: English, Hindi, and Marathi. We developed models using six classical supervised machine learning methods: support vector classifier, binary support vector classifier, random forest, AdaBoost classifier, multi-layer perceptron, and logistic regression. The models were applied to various combinations of character and/or word n-gram features, from unigrams to 8-grams. Our best submission was a random forest model for offensive language identification in Marathi. It ranked 6th out of 25 teams, with a result only 0.0059 lower than that of the team ranked 3rd.
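The character and word n-gram features mentioned in the abstract can be illustrated with a short sketch. It is not the authors' code: the function names, the whitespace tokenization, the lowercasing, the raw counts, and the `c:`/`w:` prefixes are all assumptions made for illustration, since the abstract does not specify preprocessing or feature weighting.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=8):
    """Count character n-grams of every length from n_min to n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_min=1, n_max=8):
    """Count word n-grams; whitespace tokenization is an assumption."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def combined_features(text, char_range=(1, 3), word_range=(1, 2)):
    """Merge both feature families into one sparse feature dictionary.

    The prefixes (hypothetical) keep a character n-gram and a word
    n-gram with the same surface string from colliding.
    """
    features = Counter()
    for gram, count in char_ngrams(text, *char_range).items():
        features["c:" + gram] = count
    for gram, count in word_ngrams(text, *word_range).items():
        features["w:" + gram] = count
    return features
```

A dictionary such as `combined_features("You are so rude")` could then be vectorized and passed to any of the classifiers listed above. Varying `char_range` and `word_range` corresponds to trying different combinations of n-gram lengths.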
Original language | English |
---|---|
Pages (from-to) | 501-507 |
Number of pages | 7 |
Journal | CEUR Workshop Proceedings |
Volume | 3159 |
State | Published - 2021 |
Event | Working Notes of FIRE - 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - Gandhinagar, India; Duration: 13 Dec 2021 → 17 Dec 2021 |
Bibliographical note
Publisher Copyright: © 2021 Copyright for this paper by the Forum for Information Retrieval Evaluation, December 13-17, 2021, India.
Keywords
- Hate Speech
- offensive language
- supervised machine learning
- word/char n-grams