Boosting SimRank with Semantics

Tova Milo, Amit Somech, Brit Youngmann

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

9 Scopus citations

Abstract

The problem of estimating the similarity of a pair of nodes in an information network draws extensive interest in numerous fields, e.g., social networks and recommender systems. In this work we revisit SimRank, a popular and well-studied similarity measure for information networks that quantifies the similarity of two nodes based on the similarity of their neighbors. SimRank's popularity stems from its simple, declarative definition and its efficient, scalable computation. However, despite its wide adoption, it has been observed that for many applications SimRank may yield inaccurate similarity estimations, because it focuses on the network structure and ignores the semantics conveyed in the node/edge labels. We therefore ask: can SimRank be enriched with semantics while preserving its advantages? We answer the question positively and present SemSim, a modular variant of SimRank that allows any semantic similarity measure satisfying three natural conditions to be injected into the computation. The probabilistic framework that we develop for SemSim is anchored in a careful modification of SimRank's underlying random surfer model. It employs Importance Sampling along with a novel pruning technique based on unique properties of SemSim. Our framework yields execution times essentially on par with the (semantic-less) SimRank while maintaining a negligible error rate, and facilitates direct adaptation of existing SimRank optimizations. Our experiments demonstrate the robustness of SemSim, even compared to task-dedicated measures.
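
For readers unfamiliar with the baseline, the neighbor-based recursion that the abstract attributes to SimRank can be sketched as below. This is the standard, purely structural SimRank iteration, not SemSim: the abstract does not state SemSim's formula or its three conditions, so neither is reproduced here. The function name `simrank`, the decay factor 0.8, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank on a directed graph (illustrative sketch).

    in_neighbors maps every node to the list of its in-neighbors.
    decay and iterations are assumed values, not taken from the paper.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar to nothing else.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, decayed.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: u -> a, u -> b, a -> c, b -> d.
toy_graph = {"u": [], "a": ["u"], "b": ["u"], "c": ["a"], "d": ["b"]}
scores = simrank(toy_graph)
print(round(scores[("a", "b")], 4), round(scores[("c", "d")], 4))
```

In this sketch only the link structure matters: `a` and `b` score as similar solely because they share the in-neighbor `u`, whatever their labels say. That is the limitation the abstract describes.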

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2019
Subtitle of host publication: 22nd International Conference on Extending Database Technology, Proceedings
Editors: Melanie Herschel, Carsten Binnig, Berthold Reinwald, Zoi Kaoudi, Helena Galhardas, Irini Fundulaki
Publisher: OpenProceedings.org
Pages: 37-48
Number of pages: 12
ISBN (Electronic): 9783893180813
State: Published - 2019
Externally published: Yes
Event: 22nd International Conference on Extending Database Technology, EDBT 2019 - Lisbon, Portugal
Duration: 26 Mar 2019 – 29 Mar 2019

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2019-March
ISSN (Electronic): 2367-2005

Conference

Conference: 22nd International Conference on Extending Database Technology, EDBT 2019
Country/Territory: Portugal
City: Lisbon
Period: 26/03/19 – 29/03/19

Bibliographical note

Publisher Copyright:
© 2019 Copyright held by the owner/author(s).
