Efficient construction of decision trees by the dual information distance method

Irad Ben-Gal, Alexandra Dana, Niv Shkolnik, Gonen Singer

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

The construction of efficient decision and classification trees is a fundamental task in Big Data analytics and is known to be NP-hard. Accordingly, many greedy heuristics have been proposed for constructing decision trees, but they have been found to yield locally optimal solutions. In this work we present the dual information distance (DID) method for the efficient construction of decision trees, which is computationally attractive yet relatively robust to noise. The DID heuristic selects features by considering both their immediate contribution to the classification and their future potential effects. It represents the construction of classification trees as finding the shortest paths over a graph of partitions that are defined by the selected features. The DID method takes into account both the orthogonality between the selected partitions and the reduction of uncertainty on the class partition given the selected attributes. We show that the DID method often outperforms popular classifiers in terms of average depth and classification accuracy.
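
The abstract does not give the DID formulas. The sketch below only illustrates the two information-theoretic ingredients it names, and it rests on two assumptions. First, "orthogonality" between partitions is measured with the Rokhlin information distance H(A|B) + H(B|A). Second, the remaining uncertainty on the class partition is measured with the conditional entropy H(Y|S). The helper `greedy_order` is hypothetical. It is a plain one-step greedy rule (least remaining class entropy, ties broken by the largest distance to the already-selected partition), not the DID criterion, and it does not perform the look-ahead or the shortest-path search over the partition graph.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of the partition induced by a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition_of(columns, n):
    """Joint partition induced by several attributes: one block per distinct value tuple.
    With no attributes, every record falls into the same (trivial) block."""
    return list(zip(*columns)) if columns else [()] * n


def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


def rokhlin_distance(a, b):
    """Information (Rokhlin) distance between two partitions: H(a|b) + H(b|a)."""
    return conditional_entropy(a, b) + conditional_entropy(b, a)


def greedy_order(features, y):
    """Hypothetical illustration only (not the DID rule): repeatedly pick the feature that
    leaves the least class uncertainty H(y | selected + f), breaking ties in favor of the
    feature whose partition is farthest (least redundant) from the selected partition."""
    n = len(y)
    remaining = dict(features)
    selected, order = [], []
    while remaining:
        current = partition_of(selected, n)

        def score(name):
            col = remaining[name]
            left = conditional_entropy(y, partition_of(selected + [col], n))
            # Round to avoid spurious tie-breaking caused by floating-point noise.
            return (round(left, 12), round(-rokhlin_distance(col, current), 12))

        best = min(remaining, key=score)
        order.append(best)
        selected.append(remaining.pop(best))
    return order
```

For example, with `x1 = [0, 0, 1, 1]`, `x2 = [0, 1, 0, 1]`, a constant `x3 = [0, 0, 0, 0]` and class `y = x1`, the rule picks `x1` first because it removes all class uncertainty. It then prefers `x2` over `x3` because `x2` is farther from `x1`: its Rokhlin distance is 2 bits versus 1 bit.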

Original language: English
Pages (from-to): 133-147
Number of pages: 15
Journal: Quality Technology and Quantitative Management
Volume: 11
Issue number: 1
DOIs
State: Published - Mar 2014
Externally published: Yes

Keywords

  • Average path length
  • Big-data
  • C4.5
  • Decision trees
  • Online classifiers
