Abstract
Multilingual models aim for language-invariant representations but still prominently encode language identity. This, along with the scarcity of high-quality parallel retrieval data, limits their performance in retrieval. We introduce LANCER, a multi-task learning framework that improves language-invariant dense retrieval by reducing language-specific signals in the embedding space. Leveraging the notion of linear concept erasure, we design a loss function that penalizes cross-correlation between representations and their language labels. LANCER leverages only English retrieval data and general multilingual corpora, training models to focus on language-invariant retrieval by semantic similarity without necessitating a vast parallel corpus. Experimental results on various datasets show our method consistently improves over baselines, with extensive analyses demonstrating greater language agnosticism.
| Original language | English |
|---|---|
| Title of host publication | EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference |
| Editors | Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 13261-13273 |
| Number of pages | 13 |
| ISBN (Electronic) | 9798891761643 |
| DOIs | |
| State | Published - 2024 |
| Event | 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States Duration: 12 Nov 2024 → 16 Nov 2024 |
Publication series
| Name | EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference |
|---|
Conference
| Conference | 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 |
|---|---|
| Country/Territory | United States |
| City | Hybrid, Miami |
| Period | 12/11/24 → 16/11/24 |
Bibliographical note
Publisher Copyright:© 2024 Association for Computational Linguistics.
Fingerprint
Dive into the research topics of 'Language Concept Erasure for Language-invariant Dense Retrieval'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver