OtoBERT: Identifying Suffixed Verbal Forms in Modern Hebrew Literature

Avi Shmidman, Shaltiel Shmidman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We provide a solution for a specific morphological obstacle which often makes Hebrew literature difficult to parse for the younger generation. The morphologically-rich nature of the Hebrew language allows pronominal direct objects to be realized as bound morphemes, suffixed to the verb. Although such suffixes are often utilized in Biblical Hebrew, their use has all but disappeared in modern Hebrew. Nevertheless, authors of modern Hebrew literature, in their search for literary flair, do make use of such forms. These unusual forms are notorious for alienating young readers from Hebrew literature, especially because these rare suffixed forms are often orthographically identical to common Hebrew words with different meanings. Upon encountering such words, readers naturally select the usual analysis of the word; yet, upon completing the sentence, they find themselves confounded. Young readers end up feeling "tricked", and this in turn contributes to their alienation from the text. In order to address this challenge, we pretrained a new BERT model specifically geared to identify such forms, so that they may be automatically simplified and/or flagged. We release this new BERT model to the public for unrestricted use.

Original language: English
Title of host publication: TSAR 2024 - 3rd Workshop on Text Simplification, Accessibility and Readability, Proceedings of the Workshop
Editors: Matthew Shardlow, Horacio Saggion, Fernando Alva-Manchego, Marcos Zampieri, Kai North, Sanja Stajner, Regina Stodden
Publisher: Association for Computational Linguistics (ACL)
Pages: 12-19
Number of pages: 8
ISBN (Electronic): 9798891761766
State: Published - 2024
Event: 3rd Workshop on Text Simplification, Accessibility and Readability, TSAR 2024 - Miami, United States
Duration: 15 Nov 2024 → …

Publication series

Name: TSAR 2024 - 3rd Workshop on Text Simplification, Accessibility and Readability, Proceedings of the Workshop

Conference

Conference: 3rd Workshop on Text Simplification, Accessibility and Readability, TSAR 2024
Country/Territory: United States
City: Miami
Period: 15/11/24 → …

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.
