PHYLIP
Overview
PHYLIP is a free package of programs for inferring phylogenies.
Setup
To use PHYLIP, first set up the PHYLIP environment by running the
appropriate setup command below once per login session. To avoid running
it by hand at every login, you may add the command to your .cshrc
(csh and tcsh users) or .bash_profile (bash users).
For csh and tcsh:
source /usr/local/setup/phylip.setup.csh
For sh and bash:
. /usr/local/setup/phylip.setup.sh
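For bash users who add the setup command to .bash_profile, a guarded form avoids an error at login on machines where the setup script is not installed. This is a minimal sketch: the path is the one given above, and the variable names PHYLIP_SETUP and phylip_env are illustrative assumptions, not part of PHYLIP.

```shell
# Lines that could be placed in ~/.bash_profile.
# PHYLIP_SETUP and phylip_env are hypothetical names chosen for this sketch.
PHYLIP_SETUP=/usr/local/setup/phylip.setup.sh

if [ -r "$PHYLIP_SETUP" ]; then
    # The setup script is readable, so load the PHYLIP environment.
    . "$PHYLIP_SETUP"
    phylip_env=loaded
else
    # Not on a machine with PHYLIP installed; skip quietly.
    phylip_env=missing
fi
```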
Usage
PHYLIP contains the following commands:
clique
- finds the largest clique of mutually compatible characters, and
the phylogeny which they recommend, for discrete character data with two
states.
consense
- computes consensus trees by the majority-rule consensus
tree method, which also allows one to easily find the strict consensus
tree.
contml
- estimates phylogenies from gene frequency data by maximum
likelihood under a model in which all divergence is due to genetic drift
in the absence of new mutations.
dnacomp
- estimates phylogenies from nucleic acid sequence data
using the compatibility criterion, which searches for the largest number
of sites which could have all states (nucleotides) uniquely evolved on
the same tree.
dnadist
- computes four different distances between species from
nucleic acid sequences. The distances can then be used in the distance
matrix programs.
dnainvar
- for nucleic acid sequence data on four species,
computes Lake's and Cavender's phylogenetic invariants, which test
alternative tree topologies. The program also tabulates the frequencies
of occurrence of the different nucleotide patterns.
dnaml
- estimates phylogenies from nucleotide sequences by maximum
likelihood. The model employed allows for unequal expected frequencies
of the four nucleotides, for unequal rates of transitions and
transversions, and for different (prespecified) rates of change in
different categories of sites, with the program inferring which sites
have which rates.
dnamlk
- same as DNAML but assumes a molecular clock. The use of the two
programs together permits a likelihood ratio test of the molecular clock
hypothesis to be made.
dnamove
- interactive construction of phylogenies from nucleic
acid sequences, with their evaluation by parsimony and compatibility and
the display of reconstructed ancestral bases. This can be used to find
parsimony or compatibility estimates by hand.
dnapars
- estimates phylogenies by the parsimony method using nucleic
acid sequences. Allows use of the full IUB ambiguity codes, and estimates
ancestral nucleotide states.
dnapenny
- finds all most parsimonious phylogenies for nucleic
acid sequences by branch-and-bound search.
dollop
- estimates phylogenies by the Dollo or polymorphism
parsimony criteria for discrete character data with two states (0 and
1). Also reconstructs ancestral states and allows weighting of
characters.
dolmove
- interactive construction of phylogenies from discrete
character data with two states (0 and 1) using the Dollo or polymorphism
parsimony criteria. Evaluates parsimony and compatibility criteria for
those phylogenies and displays reconstructed states throughout the tree.
This can be used to find parsimony or compatibility estimates by hand.
dolpenny
- finds all most parsimonious phylogenies for
discrete-character data with two states, for the Dollo or polymorphism
parsimony criteria using the branch-and-bound method of exact search.
factor
- takes discrete multistate data with character state trees
and produces the corresponding data set with two states (0 and 1).
fitch
- estimates phylogenies from distance matrix data under the "additive
tree model" according to which the distances are expected to equal
the sums of branch lengths between the species. This program will be
useful with distances computed from DNA sequences, with DNA hybridization
measurements, and with genetic distances computed from gene frequencies.
gendist
- computes one of three different genetic distance
formulas from gene frequency data. The formulas are Nei's genetic
distance, the Cavalli-Sforza chord measure, and the genetic distance of
Reynolds et al. The first is appropriate for data in which new
mutations occur in an infinite isoalleles neutral mutation model, the
latter two for a model without mutation and with pure genetic drift. The
distances are written to a file in a format appropriate for input to the
distance matrix programs.
kitsch
- estimates phylogenies from distance matrix data under the
"ultrametric" model, which is the same as the additive tree model except
that an evolutionary clock is assumed. This program will be useful
with distances computed from DNA sequences, with DNA hybridization
measurements, and with genetic distances computed from gene frequencies.
mix
- estimates phylogenies by some parsimony methods for discrete
character data with two states (0 and 1). Also reconstructs ancestral
states and allows weighting of characters.
move
- interactive construction of phylogenies from discrete
character data with two states (0 and 1). Evaluates parsimony and
compatibility criteria for those phylogenies and displays reconstructed
states throughout the tree. This can be used to find parsimony or
compatibility estimates by hand.
neighbor
- an implementation by Mary Kuhner and John Yamato of
Saitou and Nei's "Neighbor Joining Method," and of the UPGMA (Average
Linkage clustering) method.
penny
- finds all most parsimonious phylogenies for discrete-character
data with two states, for the Wagner, Camin-Sokal, and mixed parsimony
criteria using the branch-and-bound method of exact search.
protpars
- estimates phylogenies from protein sequences (input using the
standard one-letter code for amino acids) using the parsimony method, in
a variant which counts only those nucleotide changes that change the
amino acid, on the assumption that silent changes are more easily
accomplished.
protdist
- computes a distance measure for protein sequences,
using maximum likelihood estimates based on the Dayhoff PAM matrix,
Kimura's 1983 approximation to it, or a model based on the genetic code
plus a constraint on changing to a different category of amino acid. The
distances can then be used in the distance matrix programs.
restml
- estimation of phylogenies by maximum likelihood using
restriction sites data (not restriction fragments but presence/absence
of individual sites).
seqboot
- reads in a data set, and produces multiple data sets
from it by bootstrap resampling. Since most programs in the current
version of the package allow processing of multiple data sets, this can
be used together with the consensus tree program CONSENSE to do
bootstrap (or delete-half-jackknife) analyses with most of the methods
in this package. This program also allows the Archie/Faith technique of
permutation of species within characters.
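The likelihood ratio test of the molecular clock mentioned under dnamlk can be sketched as follows. The statistic is twice the difference between the log-likelihood reported by dnaml (no clock) and the one reported by dnamlk (clock), compared against a chi-square distribution with n - 2 degrees of freedom for n species. The function name lrt_clock is hypothetical, and the log-likelihood values and species count are made-up placeholders; real values would be read from the output of the two programs.

```shell
# lrt_clock is a hypothetical helper, not a PHYLIP program.
# Arguments: lnL without clock (dnaml), lnL with clock (dnamlk), number of species.
# Prints the test statistic and the degrees of freedom (n - 2).
lrt_clock() {
    awk -v a="$1" -v b="$2" -v n="$3" \
        'BEGIN { printf "%.2f %d\n", 2 * (a - b), n - 2 }'
}

# Placeholder values for illustration only.
result=$(lrt_clock -1234.56 -1240.12 10)
echo "LRT statistic and df: $result"
```

The clock hypothesis is rejected when the statistic exceeds the chi-square critical value for the printed degrees of freedom at the chosen significance level.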
Examples
You can run any of these programs by typing its name. The input data
file must be named infile. The example PBS script below runs the
program dnainvar and requests 10 hours of wall time. The input file,
infile, is in the directory /home/foo/phylip/.
#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00:00
#PBS -j oe
# set the phylip environment
. /usr/local/setup/phylip.setup.sh
# change to the working directory
cd /home/foo/phylip
# run dnainvar
dnainvar
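PHYLIP programs are normally menu-driven, so in a batch job a program may wait for keyboard input that never arrives. One common workaround is to supply the expected keystrokes from a file on standard input. The sketch below assumes that the installed dnainvar accepts its default settings when it reads Y; the scratch directory, the toy alignment, and the file names answers.txt and dnainvar.log are hypothetical stand-ins for illustration. Check the menu of the installed version interactively before relying on this.

```shell
# Sketch only: a scratch directory stands in for /home/foo/phylip.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A made-up four-species alignment in the usual PHYLIP layout: counts on the
# first line, then each name padded to 10 characters followed by its sequence.
cat > infile <<'EOF'
    4    8
Alpha     AACGTGGC
Beta      AAGGTCGC
Gamma     CAGGTCGA
Delta     CACGTGGA
EOF

# Assumption: the program shows a settings menu and waits for Y to accept it.
printf 'Y\n' > answers.txt

# Run only where dnainvar is on the PATH (that is, after the setup script).
if command -v dnainvar >/dev/null 2>&1; then
    dnainvar < answers.txt > dnainvar.log
fi
```

In a real PBS script, only the answers.txt line and the redirected dnainvar command would be added after the cd step.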
Further information on PBS scripts and submitting jobs on the LION-XE
and LION-XL clusters can be found in the User Guides section of the HPC website.
Documentation
Information regarding PHYLIP can be found on LION-XE and LION-XL in the
directory /usr/global/phylip/doc.
Please send questions or suggestions about this web page to beatnic@aset.psu.edu