Computing Phylogenetic Roots with Bounded Degrees and Errors

Jiang, Tao; Lin, Guohui; Chen, Zhi-Zhong

doi:doi:10.7939/R3W17X

This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.

View

Download

Communities and Collections

Computing Science, Department of / Technical Reports (Computing Science)

Usage

299 views
299 downloads

Computing Phylogenetic Roots with Bounded Degrees and Errors

Author(s) / Creator(s)
Technical report TR03-06. Given a set of species and their similarity data, an important problem in evolutionary biology is how to reconstruct a phylogeny (also called evolutionary tree) so that species are close in the phylogeny if and only if they have high similarity. Assume that the similarity data are represented as a graph G = (V, E) where each vertex represents a species and two vertices are adjacent if they represent species of high similarity. The phylogeny reconstruction problem can then be abstracted as the problem of finding a (phylogenetic) tree T from the given graph G such that (1) T has no degree-2 internal nodes, (2) the external nodes (i.e. leaves) of T are exactly the elements of V, and (3) (u, v) ε E if and only if dT(u, v) ≤ k for some fixed threshold k, where dT(u,v) denotes the distance between u and v in tree T. This is called the Phylogenetic kth Root Problem (PRk), and such a tree T, if exists, is called a phylogenetic kth root of graph G. The computational complexity of PRk is open, except for k ≤ 4. In this paper, we investigate PRk under a natural restriction that the maximum degree of the phylogenetic root is bounded from above by a constant. Our main contribution is a linear-time algorithm that determines if G has such a phylogenetic kth root, and if so, demonstrates one. On the other hand, as in practice the collected similarity data are usually not perfect and may contain errors, we propose to study a generalized version of PRk where the output phylogeny is only required to be an approximate root of the input graph. We show that this and other related problems are computationally intractable. | TRID-ID TR03-06
Date created

2003
Subjects / Keywords
Type of Item

Report
DOI

https://doi.org/10.7939/R3W17X
License

Attribution 3.0 International

Language
- English