Abstract
Verbal Multi-Word Expressions (VMWEs) are common in many languages. Their types include, among others, Verb-Particle Constructions (VPC) (e.g. get around), Light-Verb Constructions (LVC) (e.g. make a decision), and idioms (ID) (e.g. break a leg). In this paper, we present a new dataset for supervised learning of VMWEs written in Yiddish. The dataset was manually collected and annotated from a web resource. It contains a set of positive examples of VMWEs and a set of non-VMWE examples. While the full dataset can be used for training supervised algorithms, the positive examples alone can be used as seeds in unsupervised bootstrapping algorithms. Moreover, we analyze the lexical properties of VMWEs written in Yiddish by classifying them into six categories: VPC, LVC, ID, Inherently Pronominal Verb (IPronV), Inherently Prepositional Verb (IPrepV), and other (OTH). The analysis suggests some interesting features of VMWEs for further exploration. This dataset is a first step towards automatic identification of VMWEs written in Yiddish, which is important for natural language understanding, generation and translation systems.
Original language | English |
---|---|
Title of host publication | Natural Language Processing and Information Systems - 23rd International Conference on Applications of Natural Language to Information Systems, NLDB 2018, Proceedings |
Editors | Farid Meziane, Max Silberztein, Faten Atigui, Elena Kornyshova, Elisabeth Metais |
Publisher | Springer Verlag |
Pages | 205-216 |
Number of pages | 12 |
ISBN (Print) | 9783319919461 |
State | Published - 2018 |
Externally published | Yes |
Event | 23rd International Conference on Natural Language and Information Systems, NLDB 2018 - Paris, France; Duration: 13 Jun 2018 → 15 Jun 2018 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10859 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 23rd International Conference on Natural Language and Information Systems, NLDB 2018 |
---|---|
Country/Territory | France |
City | Paris |
Period | 13/06/18 → 15/06/18 |
Bibliographical note
Publisher Copyright: © 2018, Springer International Publishing AG, part of Springer Nature.
Keywords
- Multi-Word Expression (MWE)
- Verbal Multi-Word Expression (VMWE)
- Yiddish