Universal dependencies v1: A multilingual treebank collection

Joakim Nivre, Marie Catherine De Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

863 Scopus citations

Abstract

Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments. It is also useful for multilingual system development and comparative linguistic studies. Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. In this paper, we describe v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages.

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Pages: 1659-1666
Number of pages: 8
ISBN (Electronic): 9782951740891
State: Published - 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016

Publication series

Name: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

Conference

Conference: 10th International Conference on Language Resources and Evaluation, LREC 2016
Country/Territory: Slovenia
City: Portoroz
Period: 23/05/16 to 28/05/16

Bibliographical note

Funding Information:
We thank all contributors working on annotated corpora under the UD guidelines, to whom we owe a substantial part of the success and momentum achieved within the UD project so far: Željko Agić, Riyaz Ahmad, Maria Jesus Aranzabe, Masayuki Asahara, Aitziber Atutxa, Miguel Ballesteros, Cristina Bosco, Giuseppe G. A. Celano, Jinho Choi, Çağrı Çöltekin, Kaja Dobrovoljc, Timothy Dozat, Binyam Ephrem, Tomaž Erjavec, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola, Bruno Guillaume, Nizar Habash, Dag Haug, Hiroshi Kanayama, Jenna Kanerva, Simon Krek, Juha Kuokkala, Veronika Laippala, Alessandro Lenci, Krister Lindén, Nikola Ljubešić, Olga Lyashevskaya, Teresa Lynn, Aibek Makazhanov, Catalina Mărănduc, Héctor Martínez Alonso, Anna Missilä, Simonetta Montemagni, Verginica Mititelu, Yusuke Miyao, Shinsuke Mori, Hanna Nurmi, Petya Osenova, Lilja Øvrelid, Petr Pajas, Elena Pascual, Marco Passarotti, Jussi Piitulainen, Barbara Plank, Prokopis Prokopidis, Loganathan Ramasamy, Sebastian Schuster, Wolfgang Seeker, Mojgan Seraji, Maria Simi, Kiril Simov, Aaron Smith, Jan Štěpánek, Alane Suhr, Takaaki Tanaka, Anders Trærup Johannsen, Francis Tyers, Sumire Uematsu, Veronika Vincze, Rob Voigt, and Jonathan Washington. The work has been partially funded by the Czech Science Foundation grant GA15-10472S, Czech MEYS grant LM2015071, and SWE-CLARIN.

Keywords

  • Annotation
  • Cross-linguistic
  • Dependency
  • Multilingual
  • Treebanks
  • Universal
