TY - JOUR
T1 - Detection of Anorexic Girls-In Blog Posts Written in Hebrew Using a Combined Heuristic AI and NLP Method
AU - Hacohen-Kerner, Yaakov
AU - Manor, Natan
AU - Goldmeier, Michael
AU - Bachar, Eytan
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2022
Y1 - 2022
N2 - In this study, we aim to detect girls who are suspected of being anorexic from social media texts written in Hebrew. We constructed a dataset containing 100 blog posts written by females who are probably anorexic, and 100 blog posts written by females who are likely to be non-anorexic. The construction of this dataset was supervised and approved by an international expert on anorexia. We tested several text classification (TC) methods, using various feature sets (content-based and style-based), five machine learning (ML) methods, three RNN models, four BERT models, three basic preprocessing methods, three feature filtering methods, and parameter tuning. Several insights were found, as follows. A set of 50 word n-grams (mostly word unigrams) given by an expert was found to be a good basic detector. A heuristic process based on the random forest ML method overcame a combinatorial explosion and led to a significant improvement over a baseline result at a level of P = 0.01. Application of an iterative process that tests combinations of 'k out of n′', where n′ < n (n is the number of feature sets), led to a result of 90.63%, using a combination of 300 features from ten feature sets.
AB - In this study, we aim to detect girls who are suspected of being anorexic from social media texts written in Hebrew. We constructed a dataset containing 100 blog posts written by females who are probably anorexic, and 100 blog posts written by females who are likely to be non-anorexic. The construction of this dataset was supervised and approved by an international expert on anorexia. We tested several text classification (TC) methods, using various feature sets (content-based and style-based), five machine learning (ML) methods, three RNN models, four BERT models, three basic preprocessing methods, three feature filtering methods, and parameter tuning. Several insights were found, as follows. A set of 50 word n-grams (mostly word unigrams) given by an expert was found to be a good basic detector. A heuristic process based on the random forest ML method overcame a combinatorial explosion and led to a significant improvement over a baseline result at a level of P = 0.01. Application of an iterative process that tests combinations of 'k out of n′', where n′ < n (n is the number of feature sets), led to a result of 90.63%, using a combination of 300 features from ten feature sets.
KW - Mental disorders
KW - natural language processing
KW - supervised machine learning
KW - text analysis
KW - text classification
KW - text processing
UR - http://www.scopus.com/inward/record.url?scp=85127467191&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2022.3162685
DO - 10.1109/ACCESS.2022.3162685
M3 - Article
AN - SCOPUS:85127467191
SN - 2169-3536
VL - 10
SP - 34800
EP - 34814
JO - IEEE Access
JF - IEEE Access
ER -