TY - CONF
T1 - Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
AU - Ravfogel, Shauli
AU - Elazar, Yanai
AU - Goldberger, Jacob
AU - Goldberg, Yoav
PY - 2020
AB - Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors that discards the lexical semantics but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use a metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.
UR - https://aclanthology.org/2020.blackboxnlp-1.9/
DO - 10.18653/v1/2020.blackboxnlp-1.9
M3 - Conference contribution
SP - 91
EP - 106
BT - Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
PB - Association for Computational Linguistics
ER -