Session: Heuristics, Metaheuristics and Hyper-Heuristics II (06/08, 11:15-13:15, Room 7)

Variable Neighborhood Search for the Large Phylogeny Problem using Gene Order Data



Computing evolutionary distances from gene order data is a complex combinatorial problem; nevertheless, exact polynomial-time algorithms have been proposed for specific metrics, in many cases using nontrivial approaches. The problem becomes harder when we want to reconstruct phylogenies from gene order data: first, the search space of possible tree structures, which is well known to be exponential, must be explored; second, a method is needed for evaluating the cost of these trees, i.e., for finding a labeling of the internal nodes that yields the most parsimonious cost of a tree under a given evolutionary distance. The latter problem was shown to be NP-hard even for three genomes (the median problem) under many evolutionary distances. In this paper we propose a variable neighborhood search approach for solving the large phylogeny problem on gene order data. We also propose a greedy approach for the small phylogeny problem, aiming to reduce the running time of the dynamic programming approach of Kovac et al. The proposed algorithms were implemented in software called HELPHY. Experiments showed improved running time for finding trees with good scores (reversal distance) on the Campanulaceae dataset, and a new tree structure with the best known score (double cut and join distance) was found for the Hemiascomycetes dataset.
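
To illustrate the general idea of variable neighborhood search, the following is a minimal sketch and not the HELPHY algorithm. It assumes a toy setting: the solution is a single unsigned gene order (a permutation) rather than a tree, the cost is the breakpoint count relative to the identity order, and the two neighborhoods (`swap_move`, `reversal_move`) are hypothetical stand-ins for tree rearrangement moves. The variant shown is reduced VNS, which samples one neighbor per neighborhood and omits the local search step.

```python
import random


def breakpoints(perm):
    """Count adjacencies of the extended order that differ from the identity."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def swap_move(perm, rng):
    """Neighborhood 1 (illustrative): exchange two randomly chosen genes."""
    i, j = rng.sample(range(len(perm)), 2)
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out


def reversal_move(perm, rng):
    """Neighborhood 2 (illustrative): reverse a randomly chosen segment."""
    i, j = sorted(rng.sample(range(len(perm) + 1), 2))
    return list(perm[:i]) + list(perm[i:j])[::-1] + list(perm[j:])


def reduced_vns(initial, cost, neighborhoods, max_rounds=500, seed=0):
    """Reduced VNS: sample a neighbor, restart at the first neighborhood on improvement."""
    rng = random.Random(seed)
    best, best_cost = list(initial), cost(initial)
    for _ in range(max_rounds):
        k = 0
        while k < len(neighborhoods):
            candidate = neighborhoods[k](best, rng)
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                # Improvement found: accept it and return to the first neighborhood.
                best, best_cost = candidate, candidate_cost
                k = 0
            else:
                # No improvement: move on to the next neighborhood.
                k += 1
    return best, best_cost


start = [3, 1, 4, 2, 6, 5]
best, score = reduced_vns(start, breakpoints, [swap_move, reversal_move])
print(best, score)
```

In a phylogenetic setting the same loop would operate on tree topologies, with the cost supplied by a small phylogeny solver; those components are outside the scope of this sketch.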