HeGeL: A Novel Dataset for Geo-Location from Hebrew Text

Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning. Even though there are quite a few datasets in English used for geolocation, they are currently based on open-source data (Wikipedia and Twitter), where the location of the described place is mostly implicit, such that the location retrieval resolution is limited. Furthermore, there are no datasets available for addressing the problem of textual geolocation in morphologically rich and resource-poor languages, such as Hebrew. In this paper, we present the Hebrew Geo-Location (HeGeL) corpus, designed to collect literal place descriptions and analyze lingual geospatial reasoning. We crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel. Qualitative and empirical analysis show that the data exhibits abundant use of geospatial reasoning and requires a novel environmental representation.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics, ACL 2023
PublisherAssociation for Computational Linguistics (ACL)
Pages7311-7321
Number of pages11
ISBN (Electronic)9781959429623
StatePublished - 2023
Event61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 202314 Jul 2023

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

Conference61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/TerritoryCanada
CityToronto
Period9/07/2314/07/23

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Funding

This research is funded by a grant from the European Research Council, ERC-StG grant number 677352, and a grant by the Israeli Ministry of Science and Technology (MOST), grant number 3-17992, for which we are grateful.

FundersFunder number
ERC-STG677352
European Research Council
Ministry of Science, Technology and Space3-17992
Ministry of science and technology, Israel

    Fingerprint

    Dive into the research topics of 'HeGeL: A Novel Dataset for Geo-Location from Hebrew Text'. Together they form a unique fingerprint.

    Cite this