Abstract
This paper analyzes which linguistic features differentiate true and false stories written in Hebrew. To do so, we defined four feature sets containing a total of 145 features: part-of-speech (POS) tags, quantitative features, repetition features, and special expressions. The examined corpus contains stories composed by 48 native Hebrew speakers, who were asked to tell both false and true stories. We ran classification experiments on all possible combinations of these four feature sets using five supervised machine learning methods. The POS set outperformed all other sets and was found to be a key component. The best accuracy (89.6%) was achieved by a combination of sixteen POS tags and one quantitative feature.
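To make the size of the experimental design concrete, the sketch below enumerates the feature-set combinations and pairs each one with each classifier. It assumes that "all possible combinations" means every non-empty subset of the four feature sets, which gives 4 + 6 + 4 + 1 = 15 subsets and 15 × 5 = 75 classifier runs. The five method names are hypothetical placeholders, because the abstract does not name the methods. This is an illustrative grid only and does not reproduce the authors' pipeline.

```python
from itertools import combinations

# The four feature sets named in the abstract.
FEATURE_SETS = ["POS-Tags", "quantitative", "repetition", "special expressions"]

# Hypothetical placeholders: the abstract says five supervised methods were
# used but does not name them.
METHODS = [f"method_{i}" for i in range(1, 6)]


def nonempty_subsets(items):
    """Return every non-empty combination of items, smallest subsets first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def experiment_grid(feature_sets, methods):
    """Pair every feature-set combination with every classifier."""
    return [
        (combo, method)
        for combo in nonempty_subsets(feature_sets)
        for method in methods
    ]


if __name__ == "__main__":
    subsets = nonempty_subsets(FEATURE_SETS)
    grid = experiment_grid(FEATURE_SETS, METHODS)
    print(len(subsets))  # 15 feature-set combinations
    print(len(grid))     # 75 (combination, method) runs
```

Under this assumption, the best reported configuration (sixteen POS tags plus one quantitative feature) would come from feature selection within the "POS-Tags + quantitative" subset rather than from the subset grid alone.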
Field | Value |
---|---|
Original language | English |
Title of host publication | Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation |
Subtitle of host publication | Posters |
Editors | Hai Zhao |
Place of Publication | Shanghai |
Publisher | Pacific Asia Conference on Language, Information and Computation |
Pages | 176-186 |
Number of pages | 11 |
State | Published - 2015 |