Abstract
Code mixing (CM) is an important but challenging problem in Natural Language Processing (NLP). Several techniques exist for handling code-mixed text, but relatively little work has been done on it so far. In this paper we discuss the various approaches used for code mixing and classify existing code-mixing algorithms according to their techniques. Most people do not always write in Unicode, that is, in only one language, when chatting on Facebook, Gmail, Twitter, etc. For people who do not understand Hindi, it is very difficult to understand the meaning of code-mixed sentences. For correctly spelled Hindi words, we use a converter from Hindi words to English words. However, most of the words are not correct dictionary words, and code-mixed sentences also contain short forms, abbreviations, phonetic typing, etc. We therefore use character N-gram pruning, one of the most popular and successful techniques in NLP, together with dictionary-based approaches for language identification of social media text. This paper proposes a scheme that improves translation by removing phonetic typing, abbreviations, shortcuts, Hindi words and emotions.
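The combination of dictionary lookup and character N-grams mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists, the function names (`char_ngrams`, `build_profile`, `identify`, `tag_sentence`), the `top_k` cut-off used for pruning, and the overlap-count scoring are all assumptions chosen for illustration. A word found in a dictionary is labelled directly; any other word (for example a phonetic spelling variant) is labelled by the language whose pruned N-gram profile it overlaps most.

```python
from collections import Counter

# Toy word lists, invented for illustration only (not taken from the paper).
HINDI_WORDS = {"kya", "hai", "nahi", "mujhe", "tum",
               "kaise", "accha", "bahut", "pyaar", "kahan"}
ENGLISH_WORDS = {"the", "what", "this", "with", "good",
                 "love", "where", "very", "that", "you"}


def char_ngrams(word, n_min=2, n_max=3):
    """Return character n-grams of a word, padded with '_' at both ends."""
    padded = f"_{word.lower()}_"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def build_profile(words, top_k=200):
    """Build a pruned profile: keep only the top_k most frequent n-grams.

    Assumption: 'pruning' is modelled here as a simple frequency cut-off.
    On these tiny word lists top_k is larger than the number of distinct
    n-grams, so nothing is actually dropped.
    """
    counts = Counter(g for w in words for g in char_ngrams(w))
    return {g for g, _ in counts.most_common(top_k)}


HI_PROFILE = build_profile(HINDI_WORDS)
EN_PROFILE = build_profile(ENGLISH_WORDS)


def identify(word):
    """Label a word 'hi', 'en', or 'unknown'."""
    w = word.lower()
    # Step 1: dictionary-based lookup for correctly spelled words.
    if w in HINDI_WORDS:
        return "hi"
    if w in ENGLISH_WORDS:
        return "en"
    # Step 2: n-gram fallback for out-of-dictionary spellings.
    grams = set(char_ngrams(w))
    hi_score = len(grams & HI_PROFILE)
    en_score = len(grams & EN_PROFILE)
    if hi_score == en_score:
        return "unknown"
    return "hi" if hi_score > en_score else "en"


def tag_sentence(sentence):
    """Tag each whitespace-separated token with a language label."""
    return [(tok, identify(tok)) for tok in sentence.split()]
```

For instance, `tag_sentence("kya you love nahin")` labels `kya`, `you` and `love` by dictionary lookup, while the variant spelling `nahin` is not in either word list and is resolved by the N-gram fallback. A realistic system would need much larger lexicons and profiles than these toy lists.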
Original language | English |
---|---|
Title of host publication | Advanced Informatics for Computing Research - 1st International Conference, ICAICR 2017, Revised Selected Papers |
Editors | Ashish Kumar Luhach, Balasubramanian Raman, Dharm Singh, Pawan Lingras |
Publisher | Springer Verlag |
Pages | 121-129 |
Number of pages | 9 |
ISBN (Print) | 9789811057793 |
State | Published - 2017 |
Externally published | Yes |
Event | 1st International Conference on Advanced Informatics for Computing Research, ICAICR 2017 - Jalandhar, India. Duration: 17 Mar 2017 → 18 Mar 2017 |
Publication series
Name | Communications in Computer and Information Science |
---|---|
Volume | 712 |
ISSN (Print) | 1865-0929 |
Conference
Conference | 1st International Conference on Advanced Informatics for Computing Research, ICAICR 2017 |
---|---|
Country/Territory | India |
City | Jalandhar |
Period | 17/03/17 → 18/03/17 |
Bibliographical note
Publisher Copyright: © 2017, Springer Nature Singapore Pte Ltd.
Keywords
- Abbreviation (A)
- Code mixing (CM)
- Code switching (CS)
- Contracted (C)
- Creative typing (CT)
- Language identification (LID)
- Natural language processing (NLP)