T-REX (Tree and Reticulogram Reconstruction)[1][2] is a freely available web server, developed at the Department of Computer Science of the Université du Québec à Montréal, dedicated to the inference, validation and visualization of phylogenetic trees and phylogenetic networks. It lets users run several popular methods of phylogenetic analysis, as well as some newer applications for inferring, drawing and validating phylogenetic trees and networks.

Phylogenetic inference

The following distance-based methods for inferring and validating phylogenetic trees are available: Neighbor joining (NJ), NINJA large-scale Neighbor Joining, BioNJ, UNJ, ADDTREE, MW, FITCH and Circular order reconstruction. For maximum parsimony, DNAPARS, PROTPARS, PARS and DOLLOP are available, all from the PHYLIP package. For maximum likelihood, PhyML,[3] RAxML,[4] DNAML, DNAMLK, PROML and PROMLK are available; the last four are from the PHYLIP package.

Tree drawing

Hierarchical vertical, horizontal, radial and axial types of tree drawing are available.

Input data can be supplied in three formats: Newick, PHYLIP and FASTA. All graphical results provided by the T-REX server can be saved in the SVG (Scalable Vector Graphics) format and then opened and modified (e.g. prepared for a publication or presentation) in the user’s preferred graphics editor.

Tree building

This application lets the user draw a phylogenetic tree and save it in the Newick format.

Tree inference from incomplete matrices

The following methods for reconstructing phylogenetic trees from a distance matrix containing missing values, i.e. an incomplete matrix, are available: the Triangles method by Guénoche and Leclerc (2001), the Ultrametric procedure for the estimation of missing values by Landry, Lapointe and Kirsch (1996) followed by NJ, the Additive procedure for the estimation of missing values by Landry, Lapointe and Kirsch (1996) followed by NJ, and the Modified Weighted least-squares method (MW*) by Makarenkov and Lapointe (2004). The MW* method assigns a weight of 1 to the existing entries, a weight of 0.5 to the estimated entries and a weight of 0 to the entries that could not be estimated. The simulations described by Makarenkov and Lapointe (2004) showed that the MW* method clearly outperforms the Triangles, Ultrametric and Additive procedures.
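The MW* weighting rule described above can be illustrated with a short Python sketch. It is only an illustration of the 1 / 0.5 / 0 rule, not the T-REX implementation: the function name `mw_star_weights`, the use of `None` for missing entries, and the separate `estimated` matrix are assumptions made for this example, and how the estimates themselves are obtained is not shown.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[Optional[float]]]


def mw_star_weights(observed: Matrix, estimated: Matrix) -> List[List[float]]:
    """Build a weight matrix following the 1 / 0.5 / 0 rule.

    observed[i][j]  is None when the distance is missing from the input.
    estimated[i][j] is None when no estimate could be produced
    (hypothetical input; the estimation step is outside this sketch).
    """
    n = len(observed)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if observed[i][j] is not None:
                weights[i][j] = 1.0   # entry present in the input matrix
            elif estimated[i][j] is not None:
                weights[i][j] = 0.5   # entry missing but estimated
            # otherwise the weight stays 0.0: estimation was impossible
    return weights


# Toy 3-taxon matrix: d(0,2) is missing but estimated, d(1,2) could not be estimated.
observed = [[0.0, 3.0, None],
            [3.0, 0.0, None],
            [None, None, 0.0]]
estimated = [[None, None, 4.0],
             [None, None, None],
             [4.0, None, None]]

weights = mw_star_weights(observed, estimated)
```

A weight matrix of this kind would then be passed to a weighted least-squares tree-fitting step, so that entries with weight 0 do not influence the fit.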

Horizontal gene transfer detection

Methods for detecting and validating complete and partial horizontal gene transfer (HGT) are included in the T-REX server. The HGT-Detection program[5] aims to determine an optimal, i.e. minimum-cost, scenario of horizontal gene transfers by gradually reconciling the given species and gene trees.

Reticulogram inference

The reticulogram (i.e. reticulated network) reconstruction program first builds a supporting phylogenetic tree using one of the existing tree inference methods. Then, at each step of the algorithm, a reticulation branch that minimizes the least-squares or the weighted least-squares objective function is added to the tree (or, from Step 2 onward, to the network).[6] Two statistical criteria, Q1 and Q2, have been proposed to measure the gain in fit provided by each reticulation branch.

The web server version of T-REX also makes it possible to infer the supporting tree from one distance matrix and then add reticulation branches using another distance matrix. Such an algorithm can be useful for depicting morphological or genetic similarities among given species, or for identifying HGT events by using the first distance matrix to infer the species tree and the second matrix (containing the gene-related distances) to infer the reticulation branches representing putative horizontal gene transfers.[6][7]
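The least-squares criterion mentioned above can be made concrete with a small Python sketch. It assumes that the distance between two taxa in a network is the length of the shortest path between them, and it evaluates the (optionally weighted) sum of squared differences between observed and fitted distances before and after one reticulation branch is added. The function names, the toy tree and the branch lengths are invented for this example; the Q1 and Q2 criteria and the actual search over candidate branches are not reproduced.

```python
import itertools
import math


def shortest_paths(nodes, edges):
    """All-pairs shortest path lengths (Floyd-Warshall) in an undirected graph."""
    dist = {(x, y): (0.0 if x == y else math.inf) for x in nodes for y in nodes}
    for (x, y), length in edges.items():
        dist[x, y] = dist[y, x] = min(dist[x, y], length)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def least_squares(observed, fitted, leaves, weights=None):
    """Sum over leaf pairs of w * (observed - fitted)^2; w = 1 if no weights given."""
    total = 0.0
    for x, y in itertools.combinations(leaves, 2):
        w = 1.0 if weights is None else weights[x, y]
        total += w * (observed[x, y] - fitted[x, y]) ** 2
    return total


leaves = ["a", "b", "c", "d"]
nodes = leaves + ["u", "v"]  # u and v are internal nodes of the toy tree

# Supporting tree ((a,b),(c,d)) with hypothetical branch lengths.
tree_edges = {("a", "u"): 1.0, ("b", "u"): 1.0,
              ("c", "v"): 1.0, ("d", "v"): 1.0, ("u", "v"): 2.0}

# Observed distances: b and c are closer than the tree alone allows.
observed = {("a", "b"): 2.0, ("a", "c"): 4.0, ("a", "d"): 4.0,
            ("b", "c"): 2.5, ("b", "d"): 4.0, ("c", "d"): 2.0}

q_tree = least_squares(observed, shortest_paths(nodes, tree_edges), leaves)

# Add one candidate reticulation branch between b and c.
network_edges = dict(tree_edges)
network_edges[("b", "c")] = 2.5
q_network = least_squares(observed, shortest_paths(nodes, network_edges), leaves)
```

In this toy case the objective drops from `q_tree` to `q_network` once the branch between b and c is added, which is the kind of gain in fit that a reticulation branch is meant to capture.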

Sequence alignment

MAFFT, MUSCLE and ClustalW, which are among the most widely used multiple sequence alignment tools, are available with slow and fast pairwise alignment options.

Substitution models (sequence to distance transformation)

The following popular substitution models of DNA and amino acid evolution, which allow evolutionary distances to be estimated from sequence data, are included in T-REX: Uncorrected distance, Jukes-Cantor (Jukes and Cantor 1969), K80 – 2 parameters (Kimura 1980), T92 (Tamura 1992), Tajima-Nei (Tajima and Nei 1984), Jin-Nei gamma (Jin and Nei 1990), Kimura protein (Kimura 1983), LogDet (Lockhart et al. 1994), F84 (Felsenstein 1981), WAG (Whelan and Goldman 2001), JTT (Jones et al. 1992) and LG (Le and Gascuel 2008).
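As an illustration of a sequence-to-distance transformation, the following Python sketch computes three of the listed distances for a pair of aligned DNA sequences using their standard closed-form expressions: the uncorrected distance p, the Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3), and the K80 distance d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function names and the decision to skip sites containing anything other than A, C, G or T are assumptions of this sketch, not a description of how T-REX handles gaps or ambiguity codes.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_differences(seq1, seq2):
    """Return (P, Q): proportions of transitions and transversions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumption of this sketch)
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def uncorrected(seq1, seq2):
    """Proportion of differing sites (p-distance)."""
    p_ts, p_tv = site_differences(seq1, seq2)
    return p_ts + p_tv


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance; math.log raises ValueError when p >= 0.75."""
    p = uncorrected(seq1, seq2)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k80(seq1, seq2):
    """Kimura two-parameter distance; undefined for saturated sequences."""
    p_ts, p_tv = site_differences(seq1, seq2)
    return -0.5 * math.log((1.0 - 2.0 * p_ts - p_tv) * math.sqrt(1.0 - 2.0 * p_tv))


s1 = "ACGTACGTAC"
s2 = "ACGTACGTAT"  # one C -> T transition in ten sites
p_distance = uncorrected(s1, s2)
jc_distance = jukes_cantor(s1, s2)
k80_distance = k80(s1, s2)
```

Both corrected distances exceed the uncorrected one, because they account for substitutions hidden by multiple changes at the same site.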

Robinson and Foulds topological distance

This program computes the Robinson–Foulds (RF) topological distance (Robinson and Foulds 1981), a popular measure of tree similarity, between the first tree and each of the following trees specified by the user. The trees can be supplied in Newick or distance matrix format. An optimal algorithm described by Makarenkov and Leclerc (2000) is used to compute the RF metric.
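The RF distance counts the bipartitions (splits of the leaf set induced by internal branches) that are present in one tree but not in the other. The Python sketch below computes it by direct set comparison. It is a simple illustration rather than the optimal algorithm of Makarenkov and Leclerc (2000), and it represents trees as nested tuples of leaf names instead of parsing Newick strings; both choices, and all function names, are assumptions of this example.

```python
def leaf_set(tree):
    """Return the set of leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial splits of a tree, treated as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if reference in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def robinson_foulds(tree1, tree2):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(tree1) ^ bipartitions(tree2))


# Distances between the first tree and all following trees.
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "B"), "C"), ("D", "E")),
]
rf_to_first = [robinson_foulds(trees[0], other) for other in trees[1:]]
```

Here the second tree differs from the first by one split on each side, and the third tree is identical to the first.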

Newick to Matrix conversion

An in-house application allows the user to convert a phylogenetic tree from the Newick format to the distance matrix format and vice versa.
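The Newick-to-matrix direction can be sketched in Python as follows: the tree is parsed into nested nodes, and the distance between two leaves is the sum of the branch lengths on the path connecting them. This is a minimal sketch, not the in-house T-REX application. The function names are invented, the parser handles only plain labels and branch lengths (no quoted names or comments), and missing branch lengths are assumed to be 0. The reverse direction requires a tree reconstruction method and is not shown.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and final ';'
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = 0.0  # assumption: a missing branch length counts as 0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def distance_matrix(tree):
    """Return (sorted leaf names, matrix of leaf-to-leaf path lengths)."""
    dist = {}

    def visit(node):
        name, length, children = node
        if not children:
            return {name: length}  # distance from the parent to this leaf
        below = {}  # leaf -> distance from the current node
        for child in children:
            child_depths = visit(child)
            # Leaves in different child subtrees are joined through this node.
            for a, da in below.items():
                for b, db in child_depths.items():
                    dist[a, b] = dist[b, a] = da + db
            below.update(child_depths)
        return {leaf: d + length for leaf, d in below.items()}

    leaves = sorted(visit(tree))
    matrix = [[0.0 if a == b else dist[a, b] for b in leaves] for a in leaves]
    return leaves, matrix


leaves, matrix = distance_matrix(parse_newick("((A:1,B:2):1,(C:3,D:4):2);"))
```

For this example the distance between A and C is 1 + 1 + 2 + 3 = 7, the sum of the four branches on the path between them.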

Random tree generator

This application generates k random phylogenetic trees with n leaves, i.e. species or taxa, and an average branch length l using the random tree generation procedure described by Kuhner and Felsenstein (1994),[8] where the variables k, n and l are defined by the user. The branch lengths of the trees follow an exponential distribution. The branch lengths are then multiplied by 1 + ax, where the variable x is drawn from an exponential distribution (P(x>k) = exp(-k)) and the constant a is a tuning factor accounting for the deviation intensity (as described in Guindon and Gascuel (2002),[9] the value of a was set to 0.8). The random trees generated by this procedure have a depth of O(log n).
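The branch-length scheme described above can be sketched in Python as follows. Only the length model is taken from the text: an exponential base length with mean l, multiplied by 1 + ax with x drawn from a unit exponential and a = 0.8. The topology step (repeatedly joining two randomly chosen subtrees) is a stand-in chosen for this example and is not claimed to match the Kuhner and Felsenstein procedure; drawing a separate x for every branch is likewise an assumption, as are the function names and leaf labels.

```python
import random


def random_tree(n, mean_length, a=0.8, rng=random):
    """Return a Newick string for one random binary tree with n leaves."""
    if n < 2:
        raise ValueError("need at least two leaves")

    def branch():
        base = rng.expovariate(1.0 / mean_length)  # exponential with mean l
        x = rng.expovariate(1.0)                   # P(x > k) = exp(-k)
        return base * (1.0 + a * x)                # deviation factor 1 + ax

    # Assumed topology step: join two randomly chosen subtrees until one remains.
    subtrees = [f"t{i + 1}" for i in range(n)]
    while len(subtrees) > 1:
        left = subtrees.pop(rng.randrange(len(subtrees)))
        right = subtrees.pop(rng.randrange(len(subtrees)))
        subtrees.append(f"({left}:{branch():.4f},{right}:{branch():.4f})")
    return subtrees[0] + ";"


def random_trees(k, n, mean_length, a=0.8, seed=None):
    """Generate k random trees; a seed makes the output reproducible."""
    rng = random.Random(seed)
    return [random_tree(n, mean_length, a, rng) for _ in range(k)]


generated = random_trees(k=3, n=5, mean_length=0.1, seed=1)
```

Note that in this sketch the factor 1 + ax has an expected value of 1 + a, so the mean of the final branch lengths is larger than the base mean l.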

References

  1. ^ a b Boc A, Diallo Alpha B, Makarenkov V (June 2012). "T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks". Nucleic Acids Res. 40 (Web Server issue): W573–W579. doi:10.1093/nar/gks485. PMC 3394261. PMID 22675075.
  2. ^ a b Makarenkov V (July 2001). "T-REX: Reconstructing and visualizing phylogenetic trees and reticulation networks". Bioinformatics. 17 (7): 664–668. doi:10.1093/bioinformatics/17.7.664. PMID 11448889.
  3. ^ Guindon S, Delsuc F, Dufayard JF, Gascuel O (2009). Estimating maximum likelihood phylogenies with PhyML. Methods in Molecular Biology. Vol. 537. Humana Press. pp. 113–137. CiteSeerX 10.1.1.464.7907. doi:10.1007/978-1-59745-251-9_6. ISBN 978-1-58829-910-9. PMID 19378142. S2CID 8438167.
  4. ^ Stamatakis A. (August 2006). "RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models". Bioinformatics. 22 (21): 2688–2690. doi:10.1093/bioinformatics/btl446. PMID 16928733.
  5. ^ Boc A, Philippe H, Makarenkov V (January 2010). "Inferring and validating horizontal gene transfer events using bipartition dissimilarity". Syst. Biol. 59 (2): 195–211. doi:10.1093/sysbio/syp103. PMID 20525630.
  6. ^ a b Legendre P, Makarenkov V (April 2002). "Reconstruction of biogeographic and evolutionary networks using reticulograms". Syst. Biol. 51 (2): 199–216. doi:10.1080/10635150252899725. PMID 12028728.
  7. ^ Makarenkov V, Legendre P (2004). "From a phylogenetic tree to a reticulated network". J. Comput. Biol. 11 (1): 195–212. doi:10.1089/106652704773416966. PMID 15072696.
  8. ^ Kuhner MK, Felsenstein J (May 1994). "A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates". Mol Biol Evol. 11 (3): 459–468. doi:10.1093/oxfordjournals.molbev.a040126. PMID 8015439.
  9. ^ Guindon S, Gascuel O (April 2002). "Efficient biased estimation of evolutionary distances when substitution rates vary across sites". Mol Biol Evol. 19 (4): 534–43. doi:10.1093/oxfordjournals.molbev.a004109. PMID 11919295.