CUBS: Multivariate sequence classification using bounded Z-Score with sampling

Ariella Richardson, Gal Kaminka, Sarit Kraus

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Scopus citation

Abstract

Multivariate temporal sequence classification is an important and challenging task. Several attempts to address this problem exist, but none provides a full solution. In this paper we present CUBS: Classification Using Bounded Z-Score with Sampling. CUBS uses itemset mining to produce frequent subsequences, and then selects the statistically significant subsequences among them to compose a classification model. We introduce an improved itemset mining algorithm that solves the short sequence bias present in many itemset mining algorithms. Unfortunately, the z-score normalization this algorithm relies on hinders pruning. We provide a bound on the z-score to address this issue. Calculating the z-score normalization requires certain statistical values of the data, which are gathered from a small sample of the database. The sampling distorts these values. We analyze this distortion and correct it. We evaluate CUBS for accuracy and scalability on a synthetic dataset and on two real-world datasets. The results demonstrate how the short subsequence bias is solved in the mining, and show how our bound and sampling technique enable speedup.
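The short sequence bias mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below makes an assumption: each subsequence's support is normalized against the mean and standard deviation of the supports of all subsequences of the same length. The function name `zscore_by_length` and the toy support counts are hypothetical and are not taken from the paper; the bound and the sampling correction are not modeled here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_length(support):
    """Return the z-score of each subsequence's support, computed
    relative to subsequences of the same length (an assumed scheme)."""
    # Group raw support counts by subsequence length.
    by_length = defaultdict(list)
    for seq, count in support.items():
        by_length[len(seq)].append(count)

    # Mean and population standard deviation per length.
    stats = {n: (mean(c), pstdev(c)) for n, c in by_length.items()}

    scores = {}
    for seq, count in support.items():
        mu, sigma = stats[len(seq)]
        # Guard against a zero standard deviation (all supports equal).
        scores[seq] = (count - mu) / sigma if sigma > 0 else 0.0
    return scores


# Hypothetical support counts: short subsequences are far more frequent.
support = {
    ("a",): 90, ("b",): 80, ("c",): 70,
    ("a", "b"): 30, ("a", "c"): 20, ("b", "c"): 10,
}
scores = zscore_by_length(support)
```

With raw support, every length-1 subsequence outranks every length-2 subsequence. After normalization, `("a", "b")` scores the same as `("a",)`, because each stands equally far above the mean for its own length.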

Original language: English
Title of host publication: Proceedings - 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Pages: 72-79
Number of pages: 8
State: Published - 2010
Event: 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 - Sydney, NSW, Australia
Duration: 14 Dec 2010 to 17 Dec 2010

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010
Country/Territory: Australia
City: Sydney, NSW
Period: 14/12/10 to 17/12/10

Keywords

  • Classification
  • Mining multiple information sources
  • Multivariate sequence mining
  • Sampling
