Abstract
The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work, we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
Original language | English |
---|---|
Title of host publication | EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1491-1503 |
Number of pages | 13 |
ISBN (Electronic) | 9781954085022 |
State | Published - 2021 |
Event | 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021 - Virtual, Online. Duration: 19 Apr 2021 → 23 Apr 2021 |
Publication series
Name | EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference |
---|
Conference
Conference | 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021 |
---|---|
City | Virtual, Online |
Period | 19/04/21 → 23/04/21 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant agreement No. 802774 (iEXTRACT).
Funders | Funder number |
---|---|
Horizon 2020 Framework Programme | |
European Commission | |
Horizon 2020 | 802774 |