Abstract
Estimating the similarity of a pair of nodes in an information network is a problem of broad interest in many fields, e.g., social networks and recommender systems. In this work we revisit SimRank, a popular and well-studied similarity measure for information networks that quantifies the similarity of two nodes based on the similarity of their neighbors. SimRank’s popularity stems from its simple, declarative definition and its efficient, scalable computation. However, despite its wide adoption, it has been observed that for many applications SimRank may yield inaccurate similarity estimates, because it focuses on the network structure and ignores the semantics conveyed in the node/edge labels. The question we therefore ask is: can SimRank be enriched with semantics while preserving its advantages? We answer the question positively and present SemSim, a modular variant of SimRank that allows any semantic similarity measure satisfying three natural conditions to be injected into the computation. The probabilistic framework that we develop for SemSim is anchored in a careful modification of SimRank’s underlying random surfer model. It employs Importance Sampling along with a novel pruning technique based on unique properties of SemSim. Our framework yields execution times essentially on par with the (semantic-less) SimRank while maintaining a negligible error rate, and it facilitates direct adaptation of existing SimRank optimizations. Our experiments demonstrate the robustness of SemSim, even compared to task-dedicated measures.
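For readers unfamiliar with the baseline, the following is a minimal sketch of the standard, purely structural SimRank recurrence that the abstract refers to, in which two distinct nodes are similar to the extent that their in-neighbors are similar. It is not SemSim and does not reproduce the paper's sampling framework. The function name `simrank`, the decay factor of 0.8, the iteration count, and the toy graph are all illustrative assumptions.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank (illustrative sketch, not SemSim).

    in_neighbors maps each node to the list of its in-neighbors;
    every in-neighbor must itself appear as a key.
    """
    nodes = list(in_neighbors)
    # Base case: a node is maximally similar to itself, and to nothing else yet.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: u -> a, u -> b, a -> c, b -> d.
graph = {"u": [], "a": ["u"], "b": ["u"], "c": ["a"], "d": ["b"]}
scores = simrank(graph)
```

In this toy graph, `a` and `b` share their only in-neighbor `u`, so their score equals the decay factor, and `c` and `d` inherit a decayed share of that similarity. No node or edge label enters the computation, which is the limitation the abstract describes.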
Original language | English |
---|---|
Title of host publication | Advances in Database Technology - EDBT 2019 |
Subtitle of host publication | 22nd International Conference on Extending Database Technology, Proceedings |
Editors | Melanie Herschel, Carsten Binnig, Berthold Reinwald, Zoi Kaoudi, Helena Galhardas, Irini Fundulaki |
Publisher | OpenProceedings.org |
Pages | 37-48 |
Number of pages | 12 |
ISBN (Electronic) | 9783893180813 |
State | Published - 2019 |
Externally published | Yes |
Event | 22nd International Conference on Extending Database Technology, EDBT 2019 - Lisbon, Portugal. Duration: 26 Mar 2019 → 29 Mar 2019 |
Publication series
Name | Advances in Database Technology - EDBT |
---|---|
Volume | 2019-March |
ISSN (Electronic) | 2367-2005 |
Conference
Conference | 22nd International Conference on Extending Database Technology, EDBT 2019 |
---|---|
Country/Territory | Portugal |
City | Lisbon |
Period | 26/03/19 → 29/03/19 |
Bibliographical note
Publisher Copyright: © 2019 Copyright held by the owner/author(s).