FASTA

Overview

FASTA compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.

Setup

Before using FASTA, set up the FASTA environment by running the appropriate command below once per login session. To avoid running it manually at each login, you may place the command in your .cshrc (csh and tcsh users) or .bash_profile (bash users).

For csh and tcsh:

source /usr/local/setup/fasta.setup.csh

For sh and bash:

. /usr/local/setup/fasta.setup.sh
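
If you place the setup command in a startup file, a guard such as the following avoids an error message at login on hosts where the setup script is not installed. This is only a minimal sketch for sh and bash: the variable name FASTA_SETUP is an illustrative choice, and only the path itself comes from this page.

```shell
# Path to the FASTA setup script given above.
FASTA_SETUP=/usr/local/setup/fasta.setup.sh

# Source the script only if it exists and is readable, so that logins on
# hosts without FASTA do not print an error.
if [ -r "$FASTA_SETUP" ]; then
    . "$FASTA_SETUP"
fi
```
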

Usage

The FASTA software package contains the following commands:

    fasta34
    • Compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.
    fastf34
    • Compares an ordered peptide mixture, as would be obtained by Edman degradation of a CNBr cleavage of a protein, against a protein (fastf) or DNA (tfastf) database.
    fasts34
    • Compares a set of short peptide fragments, as would be obtained from mass spectrometry analysis of a protein, against a protein (fasts) or DNA (tfasts) database.
    fastx34
    • Compares a DNA sequence to a protein sequence database, translating the DNA sequence in three forward (or reverse) frames and allowing frameshifts.
    fasty34
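    • Performs the same comparison as fastx34, but with a slower algorithm that also allows frameshifts within codons (fastx34 allows them only between codons).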
    ssearch34
    • Performs a rigorous Smith-Waterman alignment between a protein sequence and another protein sequence or a protein database, or between a DNA sequence and another DNA sequence or a DNA library (very slow).
    prss34
    • Evaluates the significance of pairwise similarity scores using a Monte Carlo analysis. Similarity scores for the two sequences are calculated, and then the second sequence is shuffled 200 to 1000 times and compared with the first sequence.
    tfasta34
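    • The translated-search counterpart of fasta34: compares a protein sequence to a DNA sequence or DNA library by translating the DNA.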
    tfastf34
    • Same as fastf34, but searches a DNA database instead of a protein database.
    tfasts34
    • Same as fasts34, but searches a DNA database instead of a protein database.
    tfastx34
    • Compares a protein sequence to a DNA sequence or DNA sequence library. The DNA sequence is translated in three forward and three reverse frames, and the protein query sequence is compared to each of the six derived protein sequences. The DNA sequence is translated from one end to the other; no attempt is made to edit out intervening sequences. Termination codons are translated into unknown ('X') amino acids.
    tfasty34
    • Performs the same comparison as tfastx34, but with a slower algorithm that also allows frameshifts within codons.
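
As a rough guide to the list above, the choice of program depends mainly on whether the query and the library are protein or DNA. The sketch below encodes that mapping for the most common cases. The function choose_fasta_program is a hypothetical helper written for this page, not part of the FASTA package, and it leaves out the peptide-mixture programs (fastf34, fasts34) as well as ssearch34 and prss34.

```shell
# Hypothetical helper (not part of FASTA): print the program from the list
# above that matches a query type and a library type ("protein" or "dna").
choose_fasta_program() {
    case "$1:$2" in
        protein:protein|dna:dna) echo fasta34 ;;   # same type on both sides
        dna:protein)             echo fastx34 ;;   # translate the DNA query
        protein:dna)             echo tfastx34 ;;  # translate the DNA library
        *) echo "unknown combination: $1 vs $2" >&2; return 1 ;;
    esac
}

# Example: a DNA query searched against a protein database.
choose_fasta_program dna protein   # prints fastx34
```
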

Examples

The following example PBS script runs a FASTA job on LION-XE with a maximum wall time of 10 hours. The query sequence file, test.aa, may be in one of three formats: a plain sequence file, a FASTA format file, or a distributed sequence library. The library file, testlib.lib, must be in FASTA format. Both files are in the directory /home/foo/fasta. The search uses a ktup value of 1.

#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00:00
#PBS -j oe

# set up the FASTA environment
. /usr/local/setup/fasta.setup.sh

# change the current working directory to the directory
# that contains the input files.
cd /home/foo/fasta

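# run FASTA with a ktup of 1; ktup is given as a bare number after the
# library file, and -q suppresses interactive prompts in the batch job
fasta34 -q test.aa testlib.lib 1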

Further information on PBS scripts and submitting jobs on the LION-XE and LION-XL clusters can be found in the User Guides section of the HPC website.
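
Before submitting a job, it can help to confirm that the library file really is in FASTA format, in which each entry starts with a header line beginning with '>' followed by one or more lines of sequence. The following sketch writes a small library to a temporary file and checks it with standard tools. The two entries are made-up placeholders rather than real sequences, and the variable names are illustrative; in practice you would run the same checks on your own file, such as testlib.lib.

```shell
# Create a tiny FASTA-format file in a temporary location. The entries are
# made-up placeholders for illustration, not real sequences.
lib=$(mktemp)
cat > "$lib" <<'EOF'
>seq1 hypothetical example entry
MKTAYIAKQRQISFVKSHFSRQ
>seq2 hypothetical example entry
GSHMLEDPVAGQELLKK
EOF

# Each entry begins with a '>' header line, so counting those lines
# gives the number of sequences in the file.
nseq=$(grep -c '^>' "$lib")
echo "entries: $nseq"

# The first line of a FASTA format file should be a header line.
first=$(head -n 1 "$lib")
case "$first" in
    '>'*) echo "looks like FASTA format" ;;
    *)    echo "first line is not a FASTA header" ;;
esac

# Remove the temporary file.
rm -f "$lib"
```
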

Documentation

Information on FASTA is available in the directory /usr/global/fasta. The main information file is called fasta3x.doc.

