REEF: Resolving Length Bias in Frequent Sequence Mining

Ariella Richardson, Gal A. Kaminka, S. Kraus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Classic support-based approaches efficiently address frequent sequence mining. However, support-based mining has been shown to suffer from a bias towards short sequences. In this paper, we propose a method to resolve this bias when mining the most frequent sequences. To resolve the length bias, we define norm-frequency, based on the statistical z-score of support, and use it to replace support-based frequency. Our approach mines the subsequences that are frequent relative to other subsequences of the same length. Unfortunately, naive use of norm-frequency hinders mining scalability: it breaks the anti-monotonic property of support, which is essential for pruning large sets of candidate sequences. We describe a bound that enables pruning and thus restores scalability. Experimental results on textual and computer user input data establish that we successfully overcome the short-sequence bias, and illustrate that our mining algorithm produces meaningful sequences.
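The idea of scoring a subsequence relative to others of the same length can be sketched as below. This is a minimal illustration, not the REEF algorithm itself. It assumes contiguous subsequences, support counted as the number of input sequences containing a subsequence, and a population z-score per length; the function names `support_counts` and `norm_frequency` are hypothetical, and the pruning bound mentioned in the abstract is not reproduced here.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def support_counts(sequences, max_len):
    """Count how many input sequences contain each contiguous
    subsequence of length 1..max_len (assumed definition of support)."""
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each subsequence at most once per sequence
        for n in range(1, max_len + 1):
            for i in range(len(seq) - n + 1):
                seen.add(tuple(seq[i:i + n]))
        counts.update(seen)
    return counts


def norm_frequency(counts):
    """Z-score each support value against the supports of all
    subsequences of the same length (assumed form of norm-frequency)."""
    by_len = defaultdict(list)
    for sub, c in counts.items():
        by_len[len(sub)].append(c)
    stats = {n: (mean(v), pstdev(v)) for n, v in by_len.items()}

    scores = {}
    for sub, c in counts.items():
        mu, sigma = stats[len(sub)]
        # If every subsequence of this length has equal support,
        # none stands out, so the score is defined as 0 here.
        scores[sub] = 0.0 if sigma == 0 else (c - mu) / sigma
    return scores


if __name__ == "__main__":
    data = [list("abab"), list("abc"), list("bcd")]
    counts = support_counts(data, max_len=2)
    scores = norm_frequency(counts)
    # Rank by norm-frequency instead of raw support.
    for sub, z in sorted(scores.items(), key=lambda kv: -kv[1]):
        print("".join(sub), counts[sub], round(z, 3))
```

Because each score is computed within its own length group, a longer subsequence with modest raw support can still rank highly if it is unusually frequent for its length. Note that this exhaustive enumeration does no pruning, which is exactly the scalability problem the abstract says a bound is needed to address.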
Original language: English
Title of host publication: IMMM 2013, The Third International Conference on Advances in Information Mining and Management
Pages: 91-96
Number of pages: 6
State: Published - 2013

Bibliographical note

Place of conference: Portugal
