Abstract
This paper analyzes which linguistic features differentiate true from false stories written in Hebrew. To this end, we defined four feature sets comprising 145 features: part-of-speech (POS) tags, quantitative, repetition, and special expressions. The corpus consists of stories composed by 48 native Hebrew speakers, who were asked to tell both false and true stories. We ran classification experiments on all possible combinations of these four feature sets using five supervised machine learning methods. The POS set outperformed all the others and proved to be a key component. The best accuracy (89.6%) was achieved by a combination of sixteen POS tags and one quantitative feature.
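
The phrase "all possible combinations of these four feature sets" can be made concrete with a short sketch. The snippet below only enumerates the non-empty subsets of the four sets; the identifier names are hypothetical labels chosen for illustration, and the abstract does not describe the authors' actual tooling or how the classifiers were run.

```python
from itertools import combinations

# Hypothetical labels for the four feature sets named in the abstract.
FEATURE_SETS = ["pos_tags", "quantitative", "repetition", "special_expressions"]


def nonempty_combinations(sets):
    """Return every non-empty subset of `sets`, smallest subsets first."""
    result = []
    for size in range(1, len(sets) + 1):
        result.extend(combinations(sets, size))
    return result


combos = nonempty_combinations(FEATURE_SETS)
print(len(combos))  # 15, i.e. 2**4 - 1 non-empty subsets
for combo in combos:
    print(" + ".join(combo))
```

Each tuple in `combos` would correspond to one feature configuration to evaluate; how each configuration was paired with the five learning methods is not specified here.
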
Original language | English |
---|---|
Pages | 176-186 |
Number of pages | 11 |
State | Published - 2015 |
Event | 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015 - Shanghai, China. Duration: 30 Oct 2015 → 1 Nov 2015 |
Conference
Conference | 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015 |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 30/10/15 → 1/11/15 |