Interactive extractive search over biomedical corpora

Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as using patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts at dependency-based search, we introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations, and instead lets them query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of user queries. We demonstrate the system using example workflows over two corpora: the PubMed corpus including 14,446,243 PubMed abstracts and the CORD-19 dataset, a collection of over 45,000 research papers focused on COVID-19 research. The system is publicly available at https://allenai.github.io/spike

Original language: English
Title of host publication: BioNLP 2020 - 19th SIGBioMed Workshop on Biomedical Language Processing, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 28-37
Number of pages: 10
ISBN (Electronic): 9781952148095
State: Published - 2020
Event: 19th SIGBioMed Workshop on Biomedical Language Processing, BioNLP 2020 at the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
Duration: 9 Jul 2020 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 19th SIGBioMed Workshop on Biomedical Language Processing, BioNLP 2020 at the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Country/Territory: United States
City: Virtual, Online
Period: 9/07/20 → …

Bibliographical note

Publisher Copyright:
© Association for Computational Linguistics.

Funding

The work performed at BIU is supported by funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 802774 (iEXTRACT).

Funders (with funder number where given)
European Union's Horizon 2020 research and innovation programme
Horizon 2020 Framework Programme: 802774
European Commission
