TY - JOUR
T1 - Unsupervised and supervised exploitation of semantic domains in lexical disambiguation
AU - Gliozzo, Alfio
AU - Strapparava, Carlo
AU - Dagan, Ido
PY - 2004/7
Y1 - 2004/7
N2 - Domains are common areas of human discussion, such as economics, politics, law, science, etc., which are at the basis of lexical coherence. This paper explores the dual role of domains in word sense disambiguation (WSD). On the one hand, domain information provides generalized features at the paradigmatic level that are useful to discriminate among word senses. On the other hand, domain distinctions constitute a useful level of coarse-grained sense distinctions, which lends itself to more accurate disambiguation with lower amounts of knowledge. In this paper we extend and ground the modeling of domains and the exploitation of WORDNET DOMAINS, an extension of WORDNET in which each synset is labeled with domain information. We propose a novel unsupervised probabilistic method for the critical step of estimating domain relevance for contexts, and suggest utilizing it within unsupervised domain-driven disambiguation for word senses, as well as within a traditional supervised approach. The paper presents empirical assessments of the potential utilization of domains in WSD across a wide range of comparative settings, supervised and unsupervised. Following the dual role of domains, we report experiments that evaluate both the extent to which domain information provides effective features for WSD and the accuracy obtained by WSD at domain-level sense granularity. Furthermore, we demonstrate the potential for either avoiding or minimizing manual annotation thanks to the generalized level of information provided by domains.
UR - http://www.scopus.com/inward/record.url?scp=3142674451&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2004.05.006
DO - 10.1016/j.csl.2004.05.006
M3 - Article
AN - SCOPUS:3142674451
SN - 0885-2308
VL - 18
SP - 275
EP - 299
JO - Computer Speech and Language
JF - Computer Speech and Language
IS - 3 SPEC. ISS.
ER -