Abstract
We created a dataset of syntactic-ngrams (counted dependency-tree fragments) based on a corpus of 3.5 million English books. The dataset includes over 10 billion distinct items covering a wide range of syntactic configurations. It also includes temporal information, facilitating new kinds of research into lexical semantics over time. This paper describes the dataset, the syntactic representation, and the kinds of information provided.
Original language | English |
---|---|
Title of host publication | *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 241-247 |
Number of pages | 7 |
ISBN (Electronic) | 9781937284480 |
State | Published - 2013 |
Event | 2nd Joint Conference on Lexical and Computational Semantics, *SEM 2013 - Atlanta, United States; Duration: 13 Jun 2013 → 14 Jun 2013 |
Publication series
Name | *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics |
---|---|
Volume | 1 |
Conference
Conference | 2nd Joint Conference on Lexical and Computational Semantics, *SEM 2013 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 13/06/13 → 14/06/13 |
Bibliographical note
Publisher Copyright: © 2013 Association for Computational Linguistics