TY - JOUR

T1 - Designing an A* algorithm for calculating edit distance between rooted-unordered trees

AU - Horesh, Yair

AU - Mehr, Ramit

AU - Unger, Ron

PY - 2006/7

Y1 - 2006/7

N2 - Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significant, can be compared in polynomial time. Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important. For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be a difficult computational problem that belongs to the set of NP-complete problems. Comparing two trees can be viewed as a search problem in graphs. A* is a search algorithm that explores the search space in an efficient order. Using a good lower bound estimation of the degree of difference between the two trees, A* can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suitable for calculating tree edit distance. We show here that A* is able to perform an edit distance measurement in reasonable time for trees with dozens of nodes.

AB - Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significant, can be compared in polynomial time. Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important. For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be a difficult computational problem that belongs to the set of NP-complete problems. Comparing two trees can be viewed as a search problem in graphs. A* is a search algorithm that explores the search space in an efficient order. Using a good lower bound estimation of the degree of difference between the two trees, A* can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suitable for calculating tree edit distance. We show here that A* is able to perform an edit distance measurement in reasonable time for trees with dozens of nodes.

KW - A*

KW - Lineage trees

KW - Rooted-unordered trees

KW - Tree edit distance

UR - http://www.scopus.com/inward/record.url?scp=33749246866&partnerID=8YFLogxK

U2 - 10.1089/cmb.2006.13.1165

DO - 10.1089/cmb.2006.13.1165

M3 - Article

C2 - 16901235

AN - SCOPUS:33749246866

SN - 1066-5277

VL - 13

SP - 1165

EP - 1176

JO - Journal of Computational Biology

JF - Journal of Computational Biology

IS - 6

ER -