TY - JOUR
T1 - Automatic classification of complaint letters according to service provider categories
AU - HaCohen-Kerner, Yaakov
AU - Dilmon, Rakefet
AU - Hone, Maor
AU - Ben-Basan, Matanya Aharon
N1 - Publisher Copyright: © 2019 Elsevier Ltd
PY - 2019/11
Y1 - 2019/11
N2 - In the technological age, the phenomenon of complaint letters published on the Internet is increasing. Therefore, it is important to automatically classify complaint letters according to various criteria, such as company categories. In this research, we investigated the automatic text classification of complaint letters written in Hebrew that were sent to various companies from a wide variety of categories. The classification was performed according to company categories such as insurance, cellular communication, and rental cars. We conducted an extensive set of classification experiments of complaint letters to seven/six/five/four company categories. The classification experiments were performed using various sets of word unigrams, four machine learning methods, two feature filtering methods, and parameter tuning. The classification results are relatively high for all six measures: accuracy, precision, recall, F1, PRC-area, and ROC-area. The best accuracy results for seven, six, five, and four categories are 84.5%, 88.4%, 91.4%, and 93.8%, respectively. An analysis of the most frequently occurring words in the complaints about almost all categories revealed that the most significant issues were related to poor service and delayed delivery. An interesting result shows that only in the domain of hospitals was the subject of the domain itself (i.e., the patient, the medical treatment, the place of the treatment, and the medical staff) the most important issue. Another interesting finding is that the issue of “price” was of little or no importance to the complainants. These findings suggest that in their preoccupation with their bottom line of profitability, many service providers are blind to how paramount good service and timely delivery (and, in the case of hospitals, the domain itself) are to their clientele.
KW - Bag of words
KW - Complaint letters
KW - Semantic fields
KW - Service providers
KW - Supervised machine learning
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85071402993&partnerID=8YFLogxK
U2 - 10.1016/j.ipm.2019.102102
DO - 10.1016/j.ipm.2019.102102
M3 - Article
AN - SCOPUS:85071402993
SN - 0306-4573
VL - 56
JO - Information Processing and Management
JF - Information Processing and Management
IS - 6
M1 - 102102
ER -