Sputnik
Overview
Sputnik is a C program that searches DNA sequence files in FASTA
format for microsatellite repeats. A sequence file is specified on the
command line and the resulting hits are written to stdout along with their
position in the sequence, length, and a score determined by the length of
the repeat and the number of errors.
Sputnik uses a recursive algorithm to search for repeated patterns of
nucleotides of length between 2 and 5. Insertions, mismatches and deletions
are tolerated but affect the overall score. It does not search against a
"library" of known microsatellites. Instead it reads through the entire
sequence, assumes the existence of a repeat at every position, compares
subsequent nucleotides and applies a simple scoring rule. If the resulting
score rises above a preset threshold, the region along with its position
and score is written out. If the score falls below a cutoff threshold, the
search is abandoned and begun again at the next nucleotide. Each nucleotide
that matches the value predicted (by assuming a repeat) adds to the score.
Each "error" subtracts from the score. When an error is encountered, each
of the three possible kinds of error (mismatch, insertion and deletion) is
assumed in turn and a recursive call to the comparison routine is made for
each. Any call whose resulting score stays above the cutoff threshold
returns that score, and the best of the three is pursued.
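The search described above can be illustrated with a minimal Python sketch. It is not Sputnik's source code: the point values, both thresholds, the cap on errors per repeat, the rule that skips single-nucleotide units, and the names extend and find_repeats are all assumptions chosen for illustration. Positions are 0-based and the end position is exclusive, which may differ from what Sputnik reports.

```python
# Illustrative sketch only; every constant below is an assumed value,
# not one taken from Sputnik itself.
MATCH_POINTS = 1       # added for each nucleotide that matches the prediction
ERROR_PENALTY = 6      # subtracted for each mismatch, insertion or deletion
REPORT_THRESHOLD = 8   # a region scoring at least this is reported
FAIL_THRESHOLD = -1    # a search whose score drops below this is abandoned
MAX_ERRORS = 3         # bounds the recursion depth in this sketch


def extend(seq, unit, pos, phase, score, errors):
    """Follow the assumed repeat from pos; return (best score, end position)."""
    best_score, best_end = score, pos
    n = len(unit)
    while pos < len(seq):
        if seq[pos] == unit[phase]:
            score += MATCH_POINTS
            pos += 1
            phase = (phase + 1) % n
            if score > best_score:
                best_score, best_end = score, pos
            continue
        # The prediction failed: charge for the error, then try all three kinds.
        score -= ERROR_PENALTY
        if score < FAIL_THRESHOLD or errors >= MAX_ERRORS:
            break
        candidates = [
            # mismatch: one base substituted, advance sequence and pattern
            extend(seq, unit, pos + 1, (phase + 1) % n, score, errors + 1),
            # insertion: extra base in the sequence, advance sequence only
            extend(seq, unit, pos + 1, phase, score, errors + 1),
            # deletion: base missing from the sequence, advance pattern only
            extend(seq, unit, pos, (phase + 1) % n, score, errors + 1),
        ]
        cand_score, cand_end = max(candidates)
        if cand_score > best_score:
            best_score, best_end = cand_score, cand_end
        break
    return best_score, best_end


def find_repeats(seq, min_unit=2, max_unit=5):
    """Return a list of (start, end, unit_length, score) hits."""
    seq = seq.upper()
    hits = []
    start = 0
    while start < len(seq):
        best = None
        for n in range(min_unit, max_unit + 1):
            unit = seq[start:start + n]
            if len(unit) < n or len(set(unit)) == 1:
                continue  # too close to the end, or a single-base unit
            score, end = extend(seq, unit, start + n, 0, 0, 0)
            if score >= REPORT_THRESHOLD and (best is None or score > best[0]):
                best = (score, end, n)
        if best:
            score, end, n = best
            hits.append((start, end, n, score))
            start = end        # resume after the reported region
        else:
            start += 1         # no repeat here; begin again at the next base
    return hits


if __name__ == "__main__":
    print(find_repeats("TTG" + "AC" * 6 + "AG" + "AC" * 6 + "GTT"))
```

With these assumed values, the AC repeat in the example is reported as one region despite the single substituted base, at the cost of one error penalty. Raising ERROR_PENALTY relative to MATCH_POINTS makes the sketch less tolerant of imperfect repeats.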
Setup
To use Sputnik, you must set up your Sputnik environment by running
a setup command once per login session. You may optionally place this
command in your .cshrc (C Shell users) or .profile (Bourne Shell users)
so that you do not have to run it manually at each login.
For csh and tcsh:
source /usr/local/setup/sputnik.setup.csh
For sh and bash:
. /usr/local/setup/sputnik.setup.sh
Usage
Sputnik is invoked with the command sputnik. It takes as its argument
the name of a file of sequences in FASTA format.
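For a quick trial run, a small FASTA input file can be generated and passed to the command. The following Python sketch is only an illustration: write_fasta and run_sputnik are hypothetical helper names, the 60-column line width is an assumed convention, and run_sputnik assumes the sputnik command is already on your PATH (that is, the setup script has been sourced in the calling shell).

```python
import subprocess
from pathlib import Path


def write_fasta(path, records, width=60):
    """Write (name, sequence) pairs to path in FASTA format."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")  # header line for this sequence
        # wrap the sequence into lines of at most `width` characters
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    Path(path).write_text("\n".join(lines) + "\n")


def run_sputnik(fasta_path):
    """Run sputnik on one FASTA file and return what it writes to stdout."""
    result = subprocess.run(
        ["sputnik", str(fasta_path)],  # the file name is the only argument
        capture_output=True,
        text=True,
        check=True,                    # raise if sputnik exits with an error
    )
    return result.stdout
```

Calling write_fasta("input.fa", [("seq1", "ACACACACACACACACACAC")]) followed by run_sputnik("input.fa") would return any hits as text, since Sputnik writes its results to stdout.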
Examples
The following is an example PBS script to run a Sputnik job on LION-XE for a
maximum of 2 hours. The input file input.fa is in FASTA format and, for the
purposes of this example, is located in /home/foo/sputnik. Since Sputnik
sends its output to STDOUT (standard output), the job output will appear in
the normal PBS output file.
#PBS -l nodes=1:ppn=1
#PBS -l walltime=2:00:00
#PBS -j oe
#PBS -q lionxe-serial
# set up the sputnik environment
. /usr/local/setup/sputnik.setup.sh
# change the current working directory to the directory where
# the input file can be found
cd /home/foo/sputnik
# run the sputnik command
sputnik input.fa
Further information on PBS scripts and submitting jobs on the LION-XE cluster
can be found in the User Guides section of the HPC website.
Documentation
Information on Sputnik can be found on LION-XE in the file
/usr/global/sputnik/README.
Please send questions or suggestions about this web page to beatnic@aset.psu.edu
ASET | ITS | Penn State