TY - JOUR

T1 - A vectorial tree distance measure

AU - Priel, Avner

AU - Tamir, Boaz

N1 - Publisher Copyright:
© 2022, The Author(s).

PY - 2022/3/28

Y1 - 2022/3/28

N2 - A vectorial distance measure for trees is presented. Given two trees, we define a Tree-Alignment (T-Alignment). We T-align the trees from their centers outwards, starting from the root-branches, to make the next level as similar as possible. The algorithm is recursive; condition on the T-alignment of the root-branches we T-align the sub-branches, thereafter each T-alignment is conditioned on the previous one. We define a minimal T-alignment under a lexicographic order which follows the intuition that the differences between the two trees constitutes a vector. Given such a minimal T-alignment, the difference in the number of branches calculated at any level defines the entry of the distance vector at that level. We compare our algorithm to other well-known tree distance measures in the task of clustering sets of phylogenetic trees. We use the TreeSimGM simulator for generating stochastic phylogenetic trees. The vectorial tree distance (VTD) can successfully separate symmetric from asymmetric trees, and hierarchical from non-hierarchical trees. We also test the algorithm as a classifier of phylogenetic trees extracted from two members of the fungi kingdom, mushrooms and mildews, thus showimg that the algorithm can separate real world phylogenetic trees. The Matlab code can be accessed via: https://gitlab.com/avner.priel/vectorial-tree-distance.

AB - A vectorial distance measure for trees is presented. Given two trees, we define a Tree-Alignment (T-Alignment). We T-align the trees from their centers outwards, starting from the root-branches, to make the next level as similar as possible. The algorithm is recursive; condition on the T-alignment of the root-branches we T-align the sub-branches, thereafter each T-alignment is conditioned on the previous one. We define a minimal T-alignment under a lexicographic order which follows the intuition that the differences between the two trees constitutes a vector. Given such a minimal T-alignment, the difference in the number of branches calculated at any level defines the entry of the distance vector at that level. We compare our algorithm to other well-known tree distance measures in the task of clustering sets of phylogenetic trees. We use the TreeSimGM simulator for generating stochastic phylogenetic trees. The vectorial tree distance (VTD) can successfully separate symmetric from asymmetric trees, and hierarchical from non-hierarchical trees. We also test the algorithm as a classifier of phylogenetic trees extracted from two members of the fungi kingdom, mushrooms and mildews, thus showimg that the algorithm can separate real world phylogenetic trees. The Matlab code can be accessed via: https://gitlab.com/avner.priel/vectorial-tree-distance.

UR - http://www.scopus.com/inward/record.url?scp=85127228678&partnerID=8YFLogxK

U2 - 10.1038/s41598-022-08360-4

DO - 10.1038/s41598-022-08360-4

M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???

C2 - 35347186

AN - SCOPUS:85127228678

SN - 2045-2322

VL - 12

JO - Scientific Reports

JF - Scientific Reports

IS - 1

M1 - 5256

ER -