Designing an A* algorithm for calculating edit distance between rooted-unordered trees

Yair Horesh, Ramit Mehr, Ron Unger

Research output: Contribution to journalArticlepeer-review

21 Scopus citations

Abstract

Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significant, can be compared in polynomial time, Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be a difficult computational problem that belongs to the set of NP-CompIete problems. Comparing two trees can be viewed as a search problem in graphs. A" is a search algorithm that explores the search space in an efficient order. Using a good lower bound estimation of the degree of difference between the two trees, A" can reduce search time dramatically. We have designed and implemented a variant of the A" search algorithm suitable for calculating tree edit distance. We show here that A" is able to perform an edit distance measurement in reasonable time for trees with dozens of nodes.

Original languageEnglish
Pages (from-to)1165-1176
Number of pages12
JournalJournal of Computational Biology
Volume13
Issue number6
DOIs
StatePublished - Jul 2006

Keywords

  • A*
  • Lineage trees
  • Rooted-unordered trees
  • Tree edit distance

Fingerprint

Dive into the research topics of 'Designing an A* algorithm for calculating edit distance between rooted-unordered trees'. Together they form a unique fingerprint.

Cite this