|
|
|
Leo van Iersel,
Steven Kelk,
Giorgios Stamoulis,
Leen Stougie and
Olivier Boes. On unrooted and root-uncertain variants of several well-known phylogenetic network problems. In ALG, Vol. 80(11):2993-3022, 2018. Keywords: explicit network, FPT, from network, from unrooted trees, NP complete, phylogenetic network, phylogeny, reconstruction, tree containment. Note: https://hal.inria.fr/hal-01599716.
|
|
|
Philippe Gambette,
Andreas Gunawan,
Anthony Labarre,
Stéphane Vialette and
Louxin Zhang. Solving the Tree Containment Problem in Linear Time for Nearly Stable Phylogenetic Networks. In DAM, Vol. 246:62-79, 2018. Keywords: explicit network, from network, from rooted trees, nearly-stable network, phylogenetic network, phylogeny, polynomial, tree containment. Note: https://hal-upec-upem.archives-ouvertes.fr/hal-01575001/en/.
|
|
|
|
|
Philippe Gambette,
Katharina Huber and
Guillaume Scholz. Uprooted Phylogenetic Networks. In BMB, Vol. 79(9):2022-2048, 2017. Keywords: circular split system, explicit network, from splits, galled tree, phylogenetic network, phylogeny, polynomial, reconstruction, split network, uniqueness. Note: http://arxiv.org/abs/1511.08387.
|
|
|
Sha Zhu and
James H. Degnan. Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent. In SB, Vol. 66(2):283-298, 2017. Keywords: branch length, coalescent, explicit network, from network, likelihood, phylogenetic network, phylogeny, Program Hybrid-coal, Program Hybrid-Lambda, Program PhyloNet, software, uniqueness. Note: presentation available at https://www.youtube.com/watch?v=JLYGTfEZG7g.
|
|
|
Philippe Gambette,
Leo van Iersel,
Mark Jones,
Manuel Lafond,
Fabio Pardi and
Celine Scornavacca. Rearrangement Moves on Rooted Phylogenetic Networks. In PLoS Computational Biology, Vol. 13(8):e1005611.1-21, 2017. Keywords: distance between networks, explicit network, from network, NNI distance, NNI moves, phylogenetic network, phylogeny, SPR distance. Note: https://hal-upec-upem.archives-ouvertes.fr/hal-01572624/en/.
|
|
|
Philippe Gambette,
Andreas Gunawan,
Anthony Labarre,
Stéphane Vialette and
Louxin Zhang. Solving the Tree Containment Problem for Genetically Stable Networks in Quadratic Time. In IWOCA15, Vol. 9538:197-208 of LNCS, springer, 2016. Keywords: explicit network, from network, from rooted trees, genetically stable network, phylogenetic network, phylogeny, polynomial, tree containment. Note: https://hal-upec-upem.archives-ouvertes.fr/hal-01226035 .
|
|
|
Philippe Gambette,
Leo van Iersel,
Steven Kelk,
Fabio Pardi and
Celine Scornavacca. Do branch lengths help to locate a tree in a phylogenetic network? In BMB, Vol. 78(9):1773-1795, 2016. Keywords: branch length, explicit network, FPT, from network, from rooted trees, NP complete, phylogenetic network, phylogeny, pseudo-polynomial, time consistent network, tree containment, tree sibling network. Note: http://arxiv.org/abs/1607.06285.
|
|
|
Philippe Gambette,
Andreas Gunawan,
Anthony Labarre,
Stéphane Vialette and
Louxin Zhang. Locating a Tree in A Phylogenetic Network in Quadratic Time. In RECOMB15, Vol. 9029:96-107 of LNCS, Springer, 2015. Keywords: evaluation, explicit network, from network, from rooted trees, genetically stable network, nearly-stable network, phylogenetic network, phylogeny, polynomial, tree containment. Note: https://hal.archives-ouvertes.fr/hal-01116231/en.
|
|
|
Sha Zhu,
James H. Degnan,
Sharyn J. Goldstein and
Bjarki Eldon. Hybrid-Lambda: simulation of multiple merger and Kingman gene genealogies in species networks and species trees. In BMCB, Vol. 16(292):1-7, 2015. Keywords: explicit network, from network, phylogenetic network, phylogeny, Program Hybrid-Lambda, simulation, software. Note: http://dx.doi.org/10.1186/s12859-015-0721-y.
|
|
|
Leo van Iersel,
Steven Kelk,
Nela Lekic and
Leen Stougie. Approximation algorithms for nonbinary agreement forests. In SIDMA, Vol. 28(1):49-66, 2014. Keywords: agreement forest, approximation, from rooted trees, hybridization, minimum number, phylogenetic network, phylogeny, reconstruction. Note: http://arxiv.org/abs/1210.3211.
Toggle abstract
"Given two rooted phylogenetic trees on the same set of taxa X, the Maximum Agreement Forest (maf) problem asks to find a forest that is, in a certain sense, common to both trees and has a minimum number of components. The Maximum Acyclic Agreement Forest (maaf) problem has the additional restriction that the components of the forest cannot have conflicting ancestral relations in the input trees. There has been considerable interest in the special cases of these problems in which the input trees are required to be binary. However, in practice, phylogenetic trees are rarely binary, due to uncertainty about the precise order of speciation events. Here, we show that the general, nonbinary version of maf has a polynomial-time 4-approximation and a fixedparameter tractable (exact) algorithm that runs in O(4opoly(n)) time, where n = |X| and k is the number of components of the agreement forest minus one. Moreover, we show that a c-approximation algorithm for nonbinary maf and a d-approximation algorithm for the classical problem Directed Feedback Vertex Set (dfvs) can be combined to yield a d(c+3)-approximation for nonbinary maaf. The algorithms for maf have been implemented and made publicly available. © 2014 Society for Industrial and Applied Mathematics."
|
|
|
|
|
Eric Bapteste,
Leo van Iersel,
Axel Janke,
Scott Kelchner,
Steven Kelk,
James O. McInerney,
David A. Morrison,
Luay Nakhleh,
Mike Steel,
Leen Stougie and
James B. Whitfield. Networks: expanding evolutionary thinking. In Trends in Genetics, Vol. 29(8):439-441, 2013. Keywords: abstract network, explicit network, phylogenetic network, phylogeny, reconstruction. Note: http://bioinf.nuim.ie/wp-content/uploads/2013/06/Bapteste-TiG-2013.pdf.
Toggle abstract
"Networks allow the investigation of evolutionary relationships that do not fit a tree model. They are becoming a leading tool for describing the evolutionary relationships between organisms, given the comparative complexities among genomes. © 2013 Elsevier Ltd."
|
|
|
|
|
Philippe Gambette and
Katharina Huber. On Encodings of Phylogenetic Networks of Bounded Level. In JOMB, Vol. 65(1):157-180, 2012. Keywords: characterization, explicit network, from clusters, from rooted trees, from triplets, galled tree, identifiability, level k phylogenetic network, phylogenetic network, uniqueness, weak hierarchy. Note: http://hal.archives-ouvertes.fr/hal-00609130/en/.
Toggle abstract
"Phylogenetic networks have now joined phylogenetic trees in the center of phylogenetics research. Like phylogenetic trees, such networks canonically induce collections of phylogenetic trees, clusters, and triplets, respectively. Thus it is not surprising that many network approaches aim to reconstruct a phylogenetic network from such collections. Related to the well-studied perfect phylogeny problem, the following question is of fundamental importance in this context: When does one of the above collections encode (i. e. uniquely describe) the network that induces it? For the large class of level-1 (phylogenetic) networks we characterize those level-1 networks for which an encoding in terms of one (or equivalently all) of the above collections exists. In addition, we show that three known distance measures for comparing phylogenetic networks are in fact metrics on the resulting subclass and give the diameter for two of them. Finally, we investigate the related concept of indistinguishability and also show that many properties enjoyed by level-1 networks are not satisfied by networks of higher level. © 2011 Springer-Verlag."
|
|
|
|
|
Steven Kelk,
Leo van Iersel,
Nela Lekic,
Simone Linz,
Celine Scornavacca and
Leen Stougie. Cycle killer... qu'est-ce que c'est? On the comparative approximability of hybridization number and directed feedback vertex set. In SIDMA, Vol. 26(4):1635-1656, 2012. Keywords: agreement forest, approximation, explicit network, from rooted trees, minimum number, phylogenetic network, phylogeny, Program CycleKiller, reconstruction. Note: http://arxiv.org/abs/1112.5359, about the title.
Toggle abstract
"We show that the problem of computing the hybridization number of two rooted binary phylogenetic trees on the same set of taxa X has a constant factor polynomial-time approximation if and only if the problem of computing a minimum-size feedback vertex set in a directed graph (DFVS) has a constant factor polynomial-time approximation. The latter problem, which asks for a minimum number of vertices to be removed from a directed graph to transform it into a directed acyclic graph, is one of the problems in Karp's seminal 1972 list of 21 NP-complete problems. Despite considerable attention from the combinatorial optimization community, it remains to this day unknown whether a constant factor polynomial-time approximation exists for DFVS. Our result thus places the (in)approximability of hybridization number in a much broader complexity context, and as a consequence we obtain that it inherits inapproximability results from the problem Vertex Cover. On the positive side, we use results from the DFVS literature to give an O(log r log log r) approximation for the hybridization number where r is the correct value. Copyright © by SIAM."
|
|
|
Philippe Gambette,
Vincent Berry and
Christophe Paul. Quartets and Unrooted Phylogenetic Networks. In JBCB, Vol. 10(4):1250004, 2012. Keywords: abstract network, circular split system, explicit network, from quartets, level k phylogenetic network, orientation, phylogenetic network, phylogeny, polynomial, reconstruction, split, split network. Note: http://hal.archives-ouvertes.fr/hal-00678046/en/.
Toggle abstract
"Phylogenetic networks were introduced to describe evolution in the presence of exchanges of genetic material between coexisting species or individuals. Split networks in particular were introduced as a special kind of abstract network to visualize conflicts between phylogenetic trees which may correspond to such exchanges. More recently, methods were designed to reconstruct explicit phylogenetic networks (whose vertices can be interpreted as biological events) from triplet data. In this article, we link abstract and explicit networks through their combinatorial properties, by introducing the unrooted analog of level-k networks. In particular, we give an equivalence theorem between circular split systems and unrooted level-1 networks. We also show how to adapt to quartets some existing results on triplets, in order to reconstruct unrooted level-k phylogenetic networks. These results give an interesting perspective on the combinatorics of phylogenetic networks and also raise algorithmic and combinatorial questions. © 2012 Imperial College Press."
|
|
|
|
|
Josh Voorkamp né Collins,
Simone Linz and
Charles Semple. Quantifying hybridization in realistic time. In JCB, Vol. 18(10):1305-1318, 2011. Keywords: explicit network, FPT, from rooted trees, hybridization, minimum number, phylogenetic network, phylogeny, Program HybridInterleave, reconstruction, software. Note: http://wwwcsif.cs.ucdavis.edu/~linzs/CLS10_interleave.pdf, software available at http://www.math.canterbury.ac.nz/~c.semple/software.shtml.
Toggle abstract
"Recently, numerous practical and theoretical studies in evolutionary biology aim at calculating the extent to which reticulation-for example, horizontal gene transfer, hybridization, or recombination-has influenced the evolution for a set of present-day species. It has been shown that inferring the minimum number of hybridization events that is needed to simultaneously explain the evolutionary history for a set of trees is an NP-hard and also fixed-parameter tractable problem. In this article, we give a new fixed-parameter algorithm for computing the minimum number of hybridization events for when two rooted binary phylogenetic trees are given. This newly developed algorithm is based on interleaving-a technique using repeated kernelization steps that are applied throughout the exhaustive search part of a fixed-parameter algorithm. To show that our algorithm runs efficiently to be applicable to a wide range of practical problem instances, we apply it to a grass data set and highlight the significant improvements in terms of running times in comparison to an algorithm that has previously been implemented. © 2011, Mary Ann Liebert, Inc."
|
|
|
Philippe Gambette. Méthodes combinatoires de reconstruction de réseaux phylogénétiques. PhD thesis, Université Montpellier 2, France, 2010. Keywords: abstract network, characterization, circular split system, explicit network, FPT, from clusters, from triplets, integer linear programming, level k phylogenetic network, NP complete, phylogenetic network, phylogeny, Program Dendroscope, pyramid, reconstruction, split network, weak hierarchy. Note: http://tel.archives-ouvertes.fr/tel-00608342/en/.
|
|
|
|
|
Philippe Gambette,
Vincent Berry and
Christophe Paul. The structure of level-k phylogenetic networks. In CPM09, Vol. 5577:289-300 of LNCS, springer, 2009. Keywords: coalescent, explicit network, galled tree, level k phylogenetic network, phylogenetic network, Program Recodon. Note: http://hal-lirmm.ccsd.cnrs.fr/lirmm-00371485/en/.
Toggle abstract
"Evolution is usually described as a phylogenetic tree, but due to some exchange of genetic material, it can be represented as a phylogenetic network which has an underlying tree structure. The notion of level was recently introduced as a parameter on realistic kinds of phylogenetic networks to express their complexity and tree-likeness. We study the structure of level-k networks, and how they can be decomposed into level-k generators. We also provide a polynomial time algorithm which takes as input the set of level-k generators and builds the set of level-(k + 1) generators. Finally, with a simulation study, we evaluate the proportion of level-k phylogenetic networks among networks generated according to the coalescent model with recombination. © 2009 Springer Berlin Heidelberg."
|
|
|
Daniel H. Huson,
Regula Rupp,
Vincent Berry,
Philippe Gambette and
Christophe Paul. Computing Galled Networks from Real Data. In ISMBECCB09, Vol. 25(12):i85-i93 of BIO, 2009. Keywords: abstract network, cluster containment, explicit network, FPT, from clusters, from rooted trees, galled network, NP complete, phylogenetic network, phylogeny, polynomial, Program Dendroscope, reconstruction. Note: http://hal-lirmm.ccsd.cnrs.fr/lirmm-00368545/en/.
Toggle abstract
"Motivation: Developing methods for computing phylogenetic networks from biological data is an important problem posed by molecular evolution and much work is currently being undertaken in this area. Although promising approaches exist, there are no tools available that biologists could easily and routinely use to compute rooted phylogenetic networks on real datasets containing tens or hundreds of taxa. Biologists are interested in clades, i.e. groups of monophyletic taxa, and these are usually represented by clusters in a rooted phylogenetic tree. The problem of computing an optimal rooted phylogenetic network from a set of clusters, is hard, in general. Indeed, even the problem of just determining whether a given network contains a given cluster is hard. Hence, some researchers have focused on topologically restricted classes of networks, such as galled trees and level-k networks, that are more tractable, but have the practical draw-back that a given set of clusters will usually not possess such a representation. Results: In this article, we argue that galled networks (a generalization of galled trees) provide a good trade-off between level of generality and tractability. Any set of clusters can be represented by some galled network and the question whether a cluster is contained in such a network is easy to solve. Although the computation of an optimal galled network involves successively solving instances of two different NP-complete problems, in practice our algorithm solves this problem exactly on large datasets containing hundreds of taxa and many reticulations in seconds, as illustrated by a dataset containing 279 prokaryotes. © 2009 The Author(s)."
|
|
|
|
|
Josh Voorkamp né Collins. Rekernelisation Algorithms in Hybrid Phylogenies. Master's thesis, University of Canterbury, New Zealand, 2009. Keywords: agreement forest, explicit network, FPT, from rooted trees, from unrooted trees, hybridization, minimum number, phylogenetic network, phylogeny, Program HybridInterleave, reconstruction, software. Note: http://hdl.handle.net/10092/2852.
|
|
|
Mark A. Ragan. Trees and networks before and after Darwin. In Biology Direct, Vol. 4(43), 2009. Keywords: abstract network, explicit network, phylogenetic network, phylogeny, survey, visualization. Note: http://dx.doi.org/10.1186/1745-6150-4-43.
Toggle abstract
"It is well-known that Charles Darwin sketched abstract trees of relationship in his 1837 notebook, and depicted a tree in the Origin of Species (1859). Here I attempt to place Darwin's trees in historical context. By the mid-Eighteenth century the Great Chain of Being was increasingly seen to be an inadequate description of order in nature, and by about 1780 it had been largely abandoned without a satisfactory alternative having been agreed upon. In 1750 Donati described aquatic and terrestrial organisms as forming a network, and a few years later Buffon depicted a network of genealogical relationships among breeds of dogs. In 1764 Bonnet asked whether the Chain might actually branch at certain points, and in 1766 Pallas proposed that the gradations among organisms resemble a tree with a compound trunk, perhaps not unlike the tree of animal life later depicted by Eichwald. Other trees were presented by Augier in 1801 and by Lamarck in 1809 and 1815, the latter two assuming a transmutation of species over time. Elaborate networks of affinities among plants and among animals were depicted in the late Eighteenth and very early Nineteenth centuries. In the two decades immediately prior to 1837, so-called affinities and/or analogies among organisms were represented by diverse geometric figures. Series of plant and animal fossils in successive geological strata were represented as trees in a popular textbook from 1840, while in 1858 Bronn presented a system of animals, as evidenced by the fossil record, in a form of a tree. Darwin's 1859 tree and its subsequent elaborations by Haeckel came to be accepted in many but not all areas of biological sciences, while network diagrams were used in others. Beginning in the early 1960s trees were inferred from protein and nucleic acid sequences, but networks were re-introduced in the mid-1990s to represent lateral genetic transfer, increasingly regarded as a fundamental mode of evolution at least for bacteria and archaea. In historical context, then, the Network of Life preceded the Tree of Life and might again supersede it. Reviewers: This article was reviewed by Eric Bapteste, Patrick Forterre and Dan Graur. © 2009 Ragan; licensee BioMed Central Ltd."
|
|
|
Ali Tofigh. Using Trees to Capture Reticulate Evolution, Lateral Gene Transfers and Cancer Progression. PhD thesis, KTH Royal Institute of Technology, Sweden, 2009. Keywords: duplication, dynamic programming, from multilabeled tree, from rooted trees, from species tree, lateral gene transfer, loss, NP complete, phylogenetic network, phylogeny, reconstruction. Note: http://kth.diva-portal.org/smash/record.jsf?pid=diva2:220830&searchId=1.
|
|
|
|
|
Philippe Gambette and
Daniel H. Huson. Improved Layout of Phylogenetic Networks. In TCBB, Vol. 5(3):472-479, 2008. Keywords: abstract network, heuristic, phylogenetic network, phylogeny, Program SplitsTree, software, split network, visualization. Note: http://hal-lirmm.ccsd.cnrs.fr/lirmm-00309694/en/.
Toggle abstract
"Split networks are increasingly being used in phylogenetic analysis. Usually, a simple equal-angle algorithm is used to draw such networks, producing layouts that leave much room for improvement. Addressing the problem of producing better layouts of split networks, this paper presents an algorithm for maximizing the area covered by the network, describes an extension of the equal-daylight algorithm to networks, looks into using a spring embedder, and discusses how to construct rooted split networks. © 2008 IEEE."
|
|
|
Leo van Iersel,
Judith Keijsper,
Steven Kelk,
Leen Stougie,
Ferry Hagen and
Teun Boekhout. Constructing level-2 phylogenetic networks from triplets. In RECOMB08, Vol. 4955:450-462 of LNCS, springer, 2008. Keywords: explicit network, from triplets, level k phylogenetic network, NP complete, phylogenetic network, phylogeny, polynomial, Program Level2, reconstruction. Note: http://homepages.cwi.nl/~iersel/level2full.pdf. An appendix with proofs can be found here http://arxiv.org/abs/0707.2890.
Toggle abstract
"Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of taxa), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so, to construct such a network [24]. Here, we extend this work by showing that this problem is even polynomial time solvable for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct plausible evolutionary histories from input triplets even when such histories are heavily nontree-like. This further strengthens the case for the use of triplet-based methods in the construction of phylogenetic networks. We also implemented the algorithm and applied it to yeast data. © 2009 IEEE."
|
|
|
David Bryant,
Vincent Moulton and
Andreas Spillner. Consistency of the Neighbor-Net Algorithm. In AMB, Vol. 2(8), 2007. Keywords: abstract network, consistency, from distances, NeighborNet. Note: http://dx.doi.org/10.1186/1748-7188-2-8.
Toggle abstract
"Background: Neighbor-Net is a novel method for phylogenetic analysis that is currently being widely used in areas such as virology, bacteriology, and plant evolution. Given an input distance matrix, Neighbor-Net produces a phylogenetic network, a generalization of an evolutionary or phylogenetic tree which allows the graphical representation of conflicting phylogenetic signals. Results: In general, any network construction method should not depict more conflict than is found in the data, and, when the data is fitted well by a tree, the method should return a network that is close to this tree. In this paper we provide a formal proof that Neighbor-Net satisfies both of these requirements so that, in particular, Neighbor-Net is statistically consistent on circular distances. © 2007 Bryant et al; licensee BioMed Central Ltd."
|
|
|
Hans-Jürgen Bandelt and
Arne Dür. Translating DNA data tables into quasi-median networks for parsimony analysis and error detection. In MPE, Vol. 42(1):256-271, 2007. Keywords: abstract network, from sequences, parsimony, phylogenetic network, phylogeny, quasi-median network, reconstruction. Note: http://dx.doi.org/10.1016/j.ympev.2006.07.013.
Toggle abstract
"Every DNA data table can be turned into a quasi-median network that faithfully represents the data. We show that for (weighted) condensed data tables the associated network harbors all most parsimonious reconstructions for any tree that connects the sampled haplotypes. Structural features of this network can be computed directly from the data table. The key principle repeatedly used is that the quasi-median network is uniquely determined by the sub-tables for pairs of characters. The translation of a table into a network enhances the understanding of the properties of the data in regard to homoplasy and potential artifacts. The total number of nodes of such a network measures the complexity of the data. In particular, networks that display the results of filter analyses by which hotspot mutations are removed help to detect data idiosyncrasies and thus pinpoint sequencing problems. A pertinent example drawn from human mtDNA illustrates these points. © 2006 Elsevier Inc. All rights reserved."
|
|
|
|
|
David Bryant. Extending tree models to splits networks. In
Lior Pachter and
Bernd Sturmfels editors, Algebraic Statistics for Computational Biology, Pages 322-334, Cambridge University Press, 2005. Keywords: abstract network, from splits, likelihood, phylogenetic network, phylogeny, split, split network, statistical model. Note: http://www.math.auckland.ac.nz/~bryant/Papers/05ascbChapter.pdf.
|
|
|
Hans-Jürgen Bandelt,
Vincent Macaulay and
Martin Richards. Median networks: speedy construction and greedy reduction, one simulation, and two case studies from human mtDNA. In MPE, Vol. 16:8-28, 2000. Keywords: from sequences, from splits, median network, phylogenetic network, phylogeny, reconstruction. Note: http://www.stats.gla.ac.uk/~vincent/papers/speedy.pdf.
Toggle abstract
"Molecular data sets characterized by few phylogenetically informative characters with a broad spectrum of mutation rates, such as intraspecific control-region sequence variation of human mitochondrial DNA (mtDNA), can be usefully visualized in the form of median networks. Here we provide a step-by-step guide to the construction of such networks by hand. We improve upon a previously implemented algorithm by outlining an efficient parametrized strategy amenable to large data sets, greedy reduction, which makes it possible to reconstruct some of the confounding recurrent mutations. This entails some postprocessing as well, which assists in capturing more parsimonious solutions. To simplify the creation of the resulting network by hand, we describe a speedy approach to network construction, based on a careful planning of the processing order. A coalescent simulation tailored to human mtDNA variation in Eurasia testifies to the usefulness of reduced median networks, while highlighting notorious problems faced by all phylogenetic methods in this context. Finally, we discuss two case studies involving the comparison of characters in the two hypervariable segments of the human mtDNA control region in the light of the worldwide control-region sequence database, as well as additional restriction fragment length polymorphism information. We conclude that only a minority of the mutations that hit the second segment occur at sites that would have a mutation rate comparable to those at most sites in the first segment. Discarding the known 'noisy' sites of the second segment enhances the analysis. (C) 2000 Academic Press."
|
|
|
|
|
|
|
|
|
|
|
|
|
Hans-Jürgen Bandelt. Phylogenetic Networks. In Verhandlungen des Naturwissenschaftlichen Vereins Hamburg, Vol. 34:51-71, 1994.
|
|
|
Hans-Jürgen Bandelt and
Andreas W. M. Dress. An order theoretic framework for overlapping clustering. In DM, Vol. 136(1-3):21-37, 1994.
Toggle abstract
"Cluster analysis deals with procedures which - given a finite collection X of objects together with some kind of local dissimilarity information - identify those subcollections C of objects from X, called clusters, which exhibit a comparatively low degree of internal dissimilarity. In this note we study arbitrary mappings φ which assign to each subcollection A ⊆ X of objects its internal degree of dissimilarity φ (A), subject only to the natural condition that A ⊆ B ⊆ X implies φ (A) ̌ φ (B), and we analyse on a rather abstract, purely order theoretic level how assumptions concerning the way such a mapping φ might be constructed from local data (that is, data involving only a few objects at a time) influence the degree of overlapping observed within the resulting family of clusters, - and vice versa. Hence, unlike previous order theoretic approaches to cluster analysis, we do not restrict our attention to nonoverlapping, hierarchical clustering. Instead, we regard a dissimilarity function φ as an arbitrary isotone mapping from a finite partially ordered set I - e.g. the set P(X) of all subsets A of a finite set X - into a (partially) ordered set R - e.g. the nonnegative real numbers - and we study the correspondence between the two subsets C(φ) and D(φ) of I, formed by the elements whose images are inaccessible from above and from below, respectively. While D(φ) constitutes the local data structure from which φ can be built up, C(φ) embodies the family of clusters associated with φ. Our results imply that in case I: = P(X) and R: = R≥0 one has # D ̌ n for all Dε{lunate}D(φ) and some fixed nε{lunate}N if and only if{A figure is presented} for all C0,..., Cnε{lunate}C(φ) if and only if this holds for all subsets C0,..., Cn ⊆ X, generalizing a well-known criterion for n-conformity of hypergraphs as well as corresponding results due to Batbedat, dealing with the case n = 2. © 1994."
|
|
|
Hans-Jürgen Bandelt and
Andreas W. M. Dress. A relational approach to split decomposition. In
H.-H. Bock,
W. Lenski and
M. M. Richter editors, Information Systems and Data Analysis, Proceedings of the 17th Annual Conference of the Gesellschaft Für Klassifikation (GFKL93), Vol. 42:123-131 of Studies in Classification, Data Analysis, and Knowledge Organization, springer, 1994. Keywords: characterization, from quartets, phylogenetic network, weakly compatible.
|
|
|
Hans-Jürgen Bandelt and
Andreas W. M. Dress. A canonical decomposition theory for metrics on a finite set. In Advances in Mathematics, Vol. 92(1):47-105, 1992. Keywords: abstract network, circular split system, from distances, split, split decomposition, split network, weak hierarchy, weakly compatible.
Toggle abstract
"We consider specific additive decompositions d = d1 + ... + dn of metrics, defined on a finite set X (where a metric may give distance zero to pairs of distinct points). The simplest building stones are the slit metrics, associated to splits (i.e., bipartitions) of the given set X. While an additive decomposition of a Hamming metric into split metrics is in no way unique, we achieve uniqueness by restricting ourselves to coherent decompositions, that is, decompositions d = d1 + ... + dn such that for every map f:X → R with f(x) + f(y) ≥ d(x, y) for all x, y ε{lunate} X there exist maps f1, ..., fn: X → R with f = f1 + ... + fn and fi(x) + fi(y) ≥ di(x, y) for all i = 1,..., n and all x, y ε{lunate} X. These coherent decompositions are closely related to a geometric decomposition of the injective hull of the given metric. A metric with a coherent decomposition into a (weighted) sum of split metrics will be called totally split-decomposable. Tree metrics (and more generally, the sum of two tree metrics) are particular instances of totally split-decomposable metrics. Our main result confirms that every metric admits a coherent decomposition into a totally split-decomposable metric and a split-prime residue, where all the split summands and hence the decomposition can be determined in polynomial time, and that a family of splits can occur this way if and only if it does not induce on any four-point subset all three splits with block size two. © 1992."
|
|
|
|
|
|
|
Hans-Jürgen Bandelt and
Andreas W. M. Dress. Weak hierarchies associated with similarity measures: an additive clustering technique. In BMB, Vol. 51:113-166, 1989. Keywords: abstract network, clustering, from distances, from trees, phylogenetic network, phylogeny, Program WeakHierarchies, reconstruction, weak hierarchy. Note: http://dx.doi.org/10.1007/BF02458841.
Toggle abstract
"A new and apparently rather useful and natural concept in cluster analysis is studied: given a similarity measure on a set of objects, a sub-set is regarded as a cluster if any two objects a, b inside this sub-set have greater similarity than any third object outside has to at least one of a, b. These clusters then form a closure system which can be described as a hypergraph without triangles. Conversely, given such a system, one may attach some weight to each cluster and then compose a similarity measure additively, by letting the similarity of a pair be the sum of weights of the clusters containing that particular pair. The original clusters can be reconstructed from the obtained similarity measure. This clustering model is thus located between the general additive clustering model of Shepard and Arabie (1979) and the standard hierarchical model. Potential applications include fitting dendrograms with few additional nonnested clusters and simultaneous representation of some families of multiple dendrograms (in particular, two-dendrogram solutions), as well as assisting the search for phylogenetic relationships by proposing a somewhat larger system of possibly relevant "family groups", from which an appropriate choice (based on additional insight or individual preferences) remains to be made. © 1989 Society for Mathematical Biology."
|
|
|