The Truth, The Whole Truth, and Nothing but the Truth: A New Benchmark Dataset for Hebrew Text Credibility Assessment

Ben Hagag, Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In the age of information overload, it is more important than ever to discern fact from fiction. From the internet to traditional media, we are constantly confronted with a deluge of information, much of which comes from politicians and other public figures who wield significant influence. In this paper, we introduce HeTrue: a new, publicly available dataset for evaluating the credibility of statements made by Israeli public figures and politicians. The dataset consists of 1021 statements, each manually annotated for credibility by professional Israeli journalists. Using this corpus, we set out to assess whether the credibility of statements can be predicted from the text alone. To establish a baseline, we compare text-only methods with methods that use additional data such as metadata, context, and evidence. Furthermore, we develop several credibility assessment models, including a feature-based model that uses linguistic features and state-of-the-art transformer-based models with contextualized embeddings from a pre-trained encoder. Empirical results show that models integrating the statement with its context outperform those relying on the statement text alone. Our best model, which also integrates evidence, achieves an F1 score of 48.3, suggesting that HeTrue is a challenging benchmark that calls for further work on this task.
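To make the reported figure concrete, the sketch below shows one common way an F1 score is computed for a multi-class labeling task: macro-averaging, i.e. the unweighted mean of per-class F1. This is an illustration under stated assumptions, not the paper's evaluation code: the abstract does not say which averaging was used, and the credibility labels in the example ("true", "half", "false") are hypothetical placeholders rather than HeTrue's actual label set.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Assumption: macro averaging is used here for illustration only; the
    abstract does not specify the averaging scheme behind its F1 score.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true class g
    scores = []
    for label in labels:
        denom_p = tp[label] + fp[label]
        denom_r = tp[label] + fn[label]
        precision = tp[label] / denom_p if denom_p else 0.0
        recall = tp[label] / denom_r if denom_r else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical credibility labels, for illustration only.
gold = ["true", "false", "false", "half"]
pred = ["true", "false", "half", "half"]
print(round(100 * macro_f1(gold, pred), 1))  # reported on a 0-100 scale
```

Here "true" is predicted perfectly (F1 = 1), while "false" and "half" each get F1 = 2/3, so the macro average is 7/9, printed as 77.8 on the 0 to 100 scale that papers typically use when reporting F1.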

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 3850-3865
Number of pages: 16
ISBN (Electronic): 9798891760615
State: Published - 2023
Event: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Singapore
Duration: 6 Dec 2023 to 10 Dec 2023

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 to 10/12/23

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Funding

We would like to thank "The Whistle" at "Globes" for their essential support in creating the HeTrue dataset, which serves as a foundational element of our paper and facilitates future research in this domain. This research was funded by the Israeli Ministry of Science and Technology (MOST) grant No. 3-17992 and an Israel Innovation Authority (IIA) KAMIN grant, for which we are grateful. In addition, this project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 677352.

Funders (with funder number where given):
Horizon 2020 Framework Programme
Institute of Internal Auditors
European Research Council
Ministry of Science, Technology and Space: 3-17992
Ministry of science and technology, Israel
Horizon 2020: 677352
Israel Innovation Authority
