PHYLIP

Overview

PHYLIP is a free package of programs for inferring phylogenies.

Setup

To use PHYLIP, you must set up the PHYLIP environment once per login session by sourcing the setup script for your shell, as shown below. To avoid doing this by hand at every login, you may instead place the command in your .cshrc (csh and tcsh users), .bash_profile (bash users), or .profile (sh users).

For csh and tcsh:

source /usr/local/setup/phylip.setup.csh

For sh and bash:

. /usr/local/setup/phylip.setup.sh
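For example, a bash user could add the following lines to ~/.bash_profile. The -r test is an added precaution (an assumption, not part of the setup instructions above) so that logging in on a machine where the setup script is not installed does not produce an error. csh and tcsh users would put the source line in ~/.cshrc instead.

```sh
# PHYLIP environment, set up at every login (add to ~/.bash_profile).
# The -r test skips the setup on hosts where the script is not present.
if [ -r /usr/local/setup/phylip.setup.sh ]; then
    . /usr/local/setup/phylip.setup.sh
fi
```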

Usage

PHYLIP contains the following commands:

    clique
    • finds the largest clique of mutually compatible characters, and the phylogeny which they recommend, for discrete character data with two states.
    consense
    • computes consensus trees by the majority-rule consensus tree method, which also allows one to easily find the strict consensus tree.
    contml
    • estimates phylogenies from gene frequency data by maximum likelihood under a model in which all divergence is due to genetic drift in the absence of new mutations.
    dnacomp
    • estimates phylogenies from nucleic acid sequence data using the compatibility criterion, which searches for the largest number of sites which could have all states (nucleotides) uniquely evolved on the same tree.
    dnadist
    • computes four different distances between species from nucleic acid sequences. The distances can then be used in the distance matrix programs.
    dnainvar
    • for nucleic acid sequence data on four species, computes Lake's and Cavender's phylogenetic invariants, which test alternative tree topologies. The program also tabulates the frequencies of occurrence of the different nucleotide patterns.
    dnaml
    • estimates phylogenies from nucleotide sequences by maximum likelihood. The model employed allows for unequal expected frequencies of the four nucleotides, for unequal rates of transitions and transversions, and for different (prespecified) rates of change in different categories of sites, with the program inferring which sites have which rates.
    dnamlk
    • same as DNAML but assumes a molecular clock. The use of the two programs together permits a likelihood ratio test of the molecular clock hypothesis to be made.
    dnamove
    • interactive construction of phylogenies from nucleic acid sequences, with their evaluation by parsimony and compatibility and the display of reconstructed ancestral bases. This can be used to find parsimony or compatibility estimates by hand.
    dnapars
    • estimates phylogenies by the parsimony method using nucleic acid sequences. Allows use of the full IUB ambiguity codes, and estimates ancestral nucleotide states.
    dnapenny
    • finds all most parsimonious phylogenies for nucleic acid sequences by branch-and-bound search.
    dollop
    • estimates phylogenies by the Dollo or polymorphism parsimony criteria for discrete character data with two states (0 and 1). Also reconstructs ancestral states and allows weighting of characters.
    dolmove
    • interactive construction of phylogenies from discrete character data with two states (0 and 1) using the Dollo or polymorphism parsimony criteria. Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.
    dolpenny
    • finds all most parsimonious phylogenies for discrete-character data with two states, for the Dollo or polymorphism parsimony criteria using the branch-and-bound method of exact search.
    factor
    • takes discrete multistate data with character state trees and produces the corresponding data set with two states (0 and 1).
    fitch
    • estimates phylogenies from distance matrix data under the "additive tree model" according to which the distances are expected to equal the sums of branch lengths between the species. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.
    gendist
    • computes one of three genetic distances from gene frequency data: Nei's genetic distance, the Cavalli-Sforza chord measure, and the genetic distance of Reynolds et al. The first is appropriate for data in which new mutations occur under an infinite isoalleles neutral mutation model; the latter two are appropriate for a model without mutation and with pure genetic drift. The distances are written to a file in a format suitable for input to the distance matrix programs.
    kitsch
    • estimates phylogenies from distance matrix data under the "ultrametric" model, which is the same as the additive tree model except that an evolutionary clock is assumed. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.
    mix
    • estimates phylogenies by some parsimony methods for discrete character data with two states (0 and 1). Also reconstructs ancestral states and allows weighting of characters.
    move
    • interactive construction of phylogenies from discrete character data with two states (0 and 1). Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.
    neighbor
    • an implementation by Mary Kuhner and John Yamato of Saitou and Nei's "Neighbor Joining Method," and of the UPGMA (Average Linkage clustering) method.
    penny
    • finds all most parsimonious phylogenies for discrete-character data with two states, for the Wagner, Camin-Sokal, and mixed parsimony criteria using the branch-and-bound method of exact search.
    protpars
    • estimates phylogenies from protein sequences (input using the standard one-letter code for amino acids) using the parsimony method, in a variant which counts only those nucleotide changes that change the amino acid, on the assumption that silent changes are more easily accomplished.
    protdist
    • computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983 approximation to it, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. The distances can then be used in the distance matrix programs.
    restml
    • estimation of phylogenies by maximum likelihood using restriction sites data (not restriction fragments but presence/absence of individual sites).
    seqboot
    • reads in a data set, and produces multiple data sets from it by bootstrap resampling. Since most programs in the current version of the package allow processing of multiple data sets, this can be used together with the consensus tree program CONSENSE to do bootstrap (or delete-half-jackknife) analyses with most of the methods in this package. This program also allows the Archie/Faith technique of permutation of species within characters.
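The likelihood ratio test of the molecular clock mentioned under dnamlk can be written out as follows. This is a sketch assuming both programs are run on the same data and evaluated on the same tree topology (for example, supplied to both as a user tree), with n the number of species and ln L the log likelihood each program reports. The degrees of freedom are the difference in free branch-length parameters: an unrooted tree without a clock has 2n - 3 branch lengths, while a clock tree has n - 1 node times.

```latex
\Lambda = 2\left(\ln L_{\mathrm{dnaml}} - \ln L_{\mathrm{dnamlk}}\right),
\qquad
\mathrm{df} = (2n - 3) - (n - 1) = n - 2
```

A value of \Lambda that is large relative to a chi-square distribution with n - 2 degrees of freedom is evidence against the molecular clock hypothesis; the chi-square reference is an asymptotic approximation.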

Examples

You can run any of these programs by typing its name. Each program reads its data from a file named infile in the current directory and writes its results to outfile (programs that produce trees also write the tree to outtree), so name your data file infile. An example PBS script follows; it runs the program dnainvar with a walltime limit of 10 hours on the input file infile in the directory /home/foo/phylip/.

#PBS -S /bin/sh
#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00:00
#PBS -j oe

# set the phylip environment
. /usr/local/setup/phylip.setup.sh

# change to the working directory
cd /home/foo/phylip

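# run dnainvar; answer "Y" to its settings menu to accept the defaults,
# since a batch job has no terminal for interactive input
echo Y | dnainvar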
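Because each program reads infile and writes outfile, several programs can be chained in one job by renaming files between steps, for example passing the distance matrix from dnadist to one of the distance matrix programs such as neighbor. The following is a minimal sketch under stated assumptions: run_phylip and chain_step are hypothetical helper names (not part of PHYLIP), menu answers are supplied on standard input, and the check for existing output files assumes a PHYLIP version that stops to ask before overwriting outfile or outtree.

```sh
#!/bin/sh
# Hypothetical helpers for running PHYLIP programs non-interactively.
# Assumes the PHYLIP environment is already set up and the current
# directory holds the data file, named "infile".

# run_phylip PROGRAM ANSWERS
#   Feeds ANSWERS (menu responses, one per line) to PROGRAM on stdin.
#   "Y" alone accepts the program's default settings.
run_phylip() {
    prog=$1
    answers=$2
    # Without infile, PHYLIP would prompt for a file name and stall.
    if [ ! -e infile ]; then
        echo "run_phylip: no infile in $(pwd)" >&2
        return 1
    fi
    # An existing outfile or outtree makes PHYLIP ask what to do,
    # which would also stall a batch job, so refuse to start instead.
    for f in outfile outtree; do
        if [ -e "$f" ]; then
            echo "run_phylip: $f already exists; move it aside first" >&2
            return 1
        fi
    done
    printf '%s\n' "$answers" | "$prog"
}

# chain_step: make the previous program's outfile the next one's infile.
chain_step() {
    mv outfile infile
}
```

With these helpers, a dnadist-to-neighbor run on default settings would be: run_phylip dnadist Y, then chain_step, then run_phylip neighbor Y, leaving the tree in outtree. Refusing to overwrite output files is a deliberate choice here; a job that silently waits on a prompt would otherwise consume its whole walltime.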

Further information on PBS scripts and submitting jobs on the LION-XE and LION-XL clusters can be found in the User Guides section of the HPC website.

Documentation

Information regarding PHYLIP can be found on LION-XE and LION-XL in the directory /usr/global/phylip/doc.


Please send questions or suggestions about this web page to beatnic@aset.psu.edu

ASET | ITS | Penn State