FASTA
Overview
FASTA compares a protein sequence to another protein
sequence or to a protein database, or a DNA sequence to another DNA
sequence or a DNA library.
Setup
Before using FASTA, set up the FASTA environment by running the
appropriate setup command below once per login session. Optionally,
place the command in your .cshrc (csh and tcsh users) or .bash_profile (bash users) so that it runs automatically at login.
For csh and tcsh:
source /usr/local/setup/fasta.setup.csh
For sh and bash:
. /usr/local/setup/fasta.setup.sh
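For bash users who want the setup to run at every login, the following is a minimal sketch of lines that could be added to .bash_profile. The variable name FASTA_SETUP is an illustrative assumption; the path is the one given above. The readability test is a precaution, so that logging in on a machine without the setup file does not produce an error.

```shell
# Path taken from the setup instructions above.
# FASTA_SETUP is a hypothetical variable name used only in this sketch.
FASTA_SETUP=/usr/local/setup/fasta.setup.sh

# Source the setup file only if it exists and is readable.
if [ -r "$FASTA_SETUP" ]; then
    . "$FASTA_SETUP"
fi
```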
Usage
The FASTA software package contains the following commands:
fasta34
- Compares a protein sequence to another protein sequence or to
a protein database, or a DNA sequence to another DNA sequence
or a DNA library.
fastf34
- Compares an ordered peptide mixture, as would be obtained by
Edman degradation of a CNBr cleavage of a protein, against a
protein (fastf) or DNA (tfastf) database.
fasts34
- Compares a set of short peptide fragments, as would be obtained
from mass spectrometry analysis of a protein, against a protein (fasts)
or DNA (tfasts) database.
fastx34
- Compares a DNA sequence to a protein sequence database,
translating the DNA sequence in three forward (or reverse)
frames and allowing frameshifts.
ssearch34
- Performs a rigorous Smith-Waterman alignment between a
protein sequence and another protein sequence or a protein
database, or between a DNA sequence and another DNA sequence or
a DNA library (very slow).
prss34
- Evaluates the significance of pairwise similarity scores using
a Monte Carlo analysis. Similarity scores for the two sequences
are calculated, and then the second sequence is shuffled 200
to 1000 times and compared with the first sequence.
tfastx34
- Compares a protein sequence to a DNA sequence or DNA sequence
library. The DNA sequence is translated in three forward
and three reverse frames, and the protein query sequence is
compared to each of the six derived protein sequences. The DNA
sequence is translated from one end to the other; no attempt
is made to edit out intervening sequences. Termination codons
are translated into unknown ('X') amino acids.
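The choice among the three general-purpose comparison programs follows from the type of the query and the type of the library, as described above. The sketch below restates that mapping as a small shell function. The function name pick_fasta_program and the type labels "protein" and "dna" are illustrative assumptions and are not part of the FASTA package. The peptide-mixture, Smith-Waterman, and significance programs are deliberately left out.

```shell
# Hypothetical helper: print the program suited to a given pairing of
# query type and library type. Both arguments are "protein" or "dna".
pick_fasta_program() {
    case "$1:$2" in
        protein:protein|dna:dna) echo fasta34 ;;   # like against like
        dna:protein)             echo fastx34 ;;   # translated DNA query
        protein:dna)             echo tfastx34 ;;  # translated DNA library
        *)
            echo "unknown combination: $1 vs $2" >&2
            return 1
            ;;
    esac
}

# Example: a DNA query against a protein database.
pick_fasta_program dna protein
```

Called with "dna protein", the function prints fastx34; any unrecognized pairing returns a non-zero status.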
Examples
The following is an example PBS script that runs a FASTA job on LION-XE
with a maximum wall time of 10 hours. The query sequence file is called
test.aa and can be in one of three formats: a plain sequence file, a
FASTA-format file, or a distributed sequence library. The library file is
called testlib.lib and must be in FASTA format. Both files are in the
directory /home/foo/fasta. The example uses a ktup value of 1.
#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00:00
#PBS -j oe
# set up the FASTA environment
. /usr/local/setup/fasta.setup.sh
# change the current working directory to the directory
# that contains the input files.
cd /home/foo/fasta
# run FASTA; ktup (here 1) is given as the optional third positional argument
fasta34 test.aa testlib.lib 1
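Because the library file must be in FASTA format, it can be worth checking the input before submitting a long job. The following is a minimal sketch of such a check, assuming that a FASTA-format file begins with a header line starting with '>'. The function name looks_like_fasta is hypothetical, the demonstration sequence is made up, and the test only inspects the first non-blank line, so it does not validate the whole file.

```shell
# Hypothetical helper: succeed if the first non-blank line of the file
# starts with '>', the header marker of a FASTA-format record.
looks_like_fasta() {
    [ -r "$1" ] || return 1
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case "$first" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Demonstration on a throwaway file (the record below is invented).
tmp=$(mktemp)
printf '>demo example record\nMKTAYIAKQR\n' > "$tmp"
if looks_like_fasta "$tmp"; then
    echo "FASTA format"
else
    echo "not FASTA format"
fi
rm -f "$tmp"
```

In a job script, a call such as looks_like_fasta testlib.lib || exit 1 placed before the fasta34 line would stop the job early when the library file is missing or not in the expected format.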
Further information on PBS scripts and submitting jobs on the LION-XE
and LION-XL clusters can be found in the User Guides section of the
HPC website.
Documentation
Information on FASTA is available in the directory /usr/global/fasta.
The main information file is called fasta3x.doc.
Please send questions or suggestions about this web page to beatnic@aset.psu.edu