Bilingual code-mixing in Indian social media texts for Hindi and English

Rajesh Kumar, Pardeep Singh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Code mixing (CM) is an important but challenging problem in Natural Language Processing (NLP). Although several techniques have been proposed for handling code-mixed text, relatively little work has been done on code mixing so far. In this paper, we discuss the various approaches used for code mixing and classify existing code-mixing algorithms according to their techniques. When chatting on Facebook, Gmail, Twitter, etc., most people do not always type in Unicode, that is, in only one language. For people who do not understand Hindi, it is very difficult to understand the meaning of code-mixed sentences. For correctly spelled Hindi words, we used a converter from Hindi words to English words. However, most of the words are not correct according to the dictionary, and code-mixed sentences also contain short forms, abbreviations, phonetic typing, etc. We therefore used character N-gram pruning, one of the most popular and successful techniques in NLP, together with dictionary-based approaches for language identification of social media text. This paper proposes a scheme that improves translation by removing phonetic typing, abbreviations, shortcuts, Hindi words, and emotions.
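The abstract combines dictionary lookup with character N-gram pruning for word-level language identification, but gives no implementation details. The following is a minimal sketch under stated assumptions, not the authors' method: the toy lexicons `ENGLISH_DICT` and `HINDI_DICT`, the hypothetical `SHORT_FORMS` table, the trigram size, the pruning threshold `keep`, and add-one smoothing are all illustrative choices.

```python
from collections import Counter
from math import log

# Hypothetical toy lexicons; a real system needs full romanized-Hindi and English word lists.
ENGLISH_DICT = {"i", "am", "going", "to", "the", "market", "today", "you", "are", "good", "very"}
HINDI_DICT = {"main", "ja", "raha", "hoon", "aaj", "kal", "tum", "kya", "kar", "rahe", "ho", "bahut", "accha"}
# Hypothetical table mapping chat shortcuts/abbreviations to their full forms.
SHORT_FORMS = {"u": "you", "r": "are", "gud": "good", "2day": "today", "v": "very"}


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Character n-grams of a word padded with boundary markers, e.g. '<ja', 'ja>' for 'ja'."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(words, n: int = 3, keep: int = 300) -> dict[str, int]:
    """Count character n-grams over a word list, then prune to the `keep` most frequent."""
    counts = Counter(g for w in words for g in char_ngrams(w, n))
    return dict(counts.most_common(keep))


def log_likelihood(word: str, profile: dict[str, int], n: int = 3) -> float:
    """Add-one smoothed log-probability of the word's n-grams under a pruned profile."""
    total = sum(profile.values())
    vocab = len(profile) + 1  # one extra slot shared by all unseen n-grams
    return sum(log((profile.get(g, 0) + 1) / (total + vocab)) for g in char_ngrams(word, n))


def tag_token(token: str, en_profile: dict[str, int], hi_profile: dict[str, int]) -> tuple[str, str]:
    """Normalize a token, then label it 'en' or 'hi': dictionary lookup first, n-gram fallback."""
    word = token.lower()
    word = SHORT_FORMS.get(word, word)  # expand abbreviations and short forms
    in_en, in_hi = word in ENGLISH_DICT, word in HINDI_DICT
    if in_en != in_hi:  # found in exactly one dictionary: unambiguous
        return word, "en" if in_en else "hi"
    # Out-of-vocabulary or ambiguous: compare n-gram log-likelihoods.
    en_score = log_likelihood(word, en_profile)
    hi_score = log_likelihood(word, hi_profile)
    return word, "en" if en_score >= hi_score else "hi"


en_profile = build_profile(ENGLISH_DICT)
hi_profile = build_profile(HINDI_DICT)
sentence = "main 2day market ja raha hoon u r gud rahi"
print([tag_token(t, en_profile, hi_profile) for t in sentence.split()])
```

Dictionary hits are resolved first, so shortcuts such as `2day` and `u` are expanded and tagged as English directly. Only out-of-vocabulary or ambiguous tokens, such as the unseen romanized word `rahi`, fall back to comparing smoothed n-gram scores; pruning each profile to its most frequent n-grams keeps the language models small.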

Original language: English
Title of host publication: Advanced Informatics for Computing Research - 1st International Conference, ICAICR 2017, Revised Selected Papers
Editors: Ashish Kumar Luhach, Balasubramanian Raman, Dharm Singh, Pawan Lingras
Publisher: Springer Verlag
Pages: 121-129
Number of pages: 9
ISBN (Print): 9789811057793
DOIs
State: Published - 2017
Externally published: Yes
Event: 1st International Conference on Advanced Informatics for Computing Research, ICAICR 2017 - Jalandhar, India
Duration: 17 Mar 2017 to 18 Mar 2017

Publication series

Name: Communications in Computer and Information Science
Volume: 712
ISSN (Print): 1865-0929

Conference

Conference: 1st International Conference on Advanced Informatics for Computing Research, ICAICR 2017
Country/Territory: India
City: Jalandhar
Period: 17/03/17 to 18/03/17

Bibliographical note

Publisher Copyright:
© 2017, Springer Nature Singapore Pte Ltd.

Keywords

  • Abbreviation (A)
  • Code mixing (CM)
  • Code switching (CS)
  • Contracted (C)
  • Creative typing (CT)
  • Language identification (LID)
  • Natural language processing (NLP)
