GENSCAN
Overview
GENSCAN is a general-purpose gene identification program which
analyzes genomic DNA sequences from a variety of organisms including
human, other vertebrates, invertebrates and plants. For each
sequence, the program determines the most likely "parse" (gene
structure) under a probabilistic model of the gene structural
and compositional properties of the genomic DNA for the
given organism. This set of exons/genes is then printed to an
output file (the text output) together with the corresponding
predicted peptide sequences. A graphical (PostScript) output
may also be created which displays the location and DNA strand
of each predicted exon. Unlike most other currently available gene
prediction programs, the model treats the most general case, in which
the sequence may contain no genes, one gene, or multiple genes on
either or both DNA strands, and in which partial genes as well as
complete genes are considered. The most important
restrictions are that only protein coding genes are considered (and
not tRNA or rRNA genes, for example), and that transcription units
are assumed to be non-overlapping.
Setup
To use GENSCAN, you must set up your GENSCAN environment by running
a setup command once per login session. You may optionally
place this command in your .cshrc (C shell users) or .profile (Bourne
shell users) to avoid having to run it manually at each login.
For csh and tcsh:
source /usr/local/setup/genscan.setup.csh
For sh and bash:
. /usr/local/setup/genscan.setup.sh
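For sh and bash users who want the environment set automatically, the following is a minimal sketch of lines that could be added to .profile. The setup path is the one given above; the GENSCAN_SETUP variable name and the readability check are illustrative additions, not part of the LION-XE setup itself.

```shell
# Hypothetical addition to ~/.profile (sh and bash users).
# GENSCAN_SETUP is an illustrative variable name, not one defined by GENSCAN.
GENSCAN_SETUP=/usr/local/setup/genscan.setup.sh

# Source the setup file only if it is readable, so that logging in on a
# machine without GENSCAN installed does not produce an error.
if [ -r "$GENSCAN_SETUP" ]; then
    . "$GENSCAN_SETUP"
fi
```

The guard is a precaution for accounts shared across several systems; on LION-XE itself the plain command shown above should be sufficient.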
Usage
GENSCAN is invoked with the command genscan. It requires two arguments: the
full path to an organism parameter file and an input sequence file in FASTA
or minimal GenBank format. The organism parameter files on LION-XE are in
/usr/global/genscan and include:
Arabidopsis.smat - parameter file for Arabidopsis
HumanIso.smat - parameter file for human/vertebrates
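As an illustration of the invocation described above, the following sketch wraps genscan in a small shell function that checks both input files before running. The function name run_genscan and the output file name are hypothetical, and the sketch assumes that genscan writes its text output to standard output so that it can be redirected to a file.

```shell
# Hypothetical helper; run_genscan is not part of GENSCAN.
run_genscan() {
    param_file=$1   # full path to an organism parameter file
    seq_file=$2     # sequence file in FASTA or minimal GenBank format
    out_file=$3     # file in which to save the text output

    if [ ! -r "$param_file" ]; then
        echo "cannot read parameter file: $param_file" >&2
        return 1
    fi
    if [ ! -r "$seq_file" ]; then
        echo "cannot read sequence file: $seq_file" >&2
        return 1
    fi

    # Assumption: genscan prints its text output to standard output.
    genscan "$param_file" "$seq_file" > "$out_file"
}
```

It might be called as run_genscan /usr/global/genscan/HumanIso.smat input.fa input.genscan.txt after the GENSCAN environment has been set up.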
Examples
The following is an example PBS script to run a GENSCAN job on LION-XE for
a maximum of 2 hours. The input file is a FASTA format file called input.fa
and is in the directory /home/foo/genscan. The parameter file used is
Arabidopsis.smat. Output will appear in the normal PBS output file.
#PBS -l nodes=1:ppn=1
#PBS -l walltime=2:00:00
#PBS -j oe
#PBS -q lionxe-serial
# set up the GENSCAN environment
. /usr/local/setup/genscan.setup.sh
# change the current working directory to the directory where the
# input file input.fa can be found
cd /home/foo/genscan
# run the GENSCAN command
genscan /usr/global/genscan/Arabidopsis.smat input.fa
Further information on PBS scripts and submitting jobs on the LION-XE cluster
can be found in the User Guides section of the HPC website.
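The example script above processes a single input file. A job that needs to process several sequences could loop over them instead; the following is a rough sketch under stated assumptions. The function name genscan_batch, the .fa extension, and the .genscan.txt output naming are all hypothetical choices, and the sketch assumes genscan writes its text output to standard output.

```shell
# Hypothetical helper: run genscan on every *.fa file in a directory.
genscan_batch() {
    seq_dir=$1      # directory containing FASTA files
    param_file=$2   # full path to an organism parameter file

    for seq_file in "$seq_dir"/*.fa; do
        # Skip the unexpanded pattern when the directory has no .fa files.
        [ -e "$seq_file" ] || continue

        # Write each result next to its input, e.g. input.fa -> input.genscan.txt
        out_file="${seq_file%.fa}.genscan.txt"
        genscan "$param_file" "$seq_file" > "$out_file"
    done
}
```

Within a PBS script this could replace the single genscan line, for example genscan_batch /home/foo/genscan /usr/global/genscan/Arabidopsis.smat, with the walltime request adjusted to cover all of the input files.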
Documentation
Information on GENSCAN can be found on LION-XE in the README file in the
directory /usr/global/genscan.
Please send questions or suggestions about this web page to beatnic@aset.psu.edu