GENSCAN

Overview

GENSCAN is a general-purpose gene identification program that analyzes genomic DNA sequences from a variety of organisms, including human, other vertebrates, invertebrates, and plants. For each sequence, the program determines the most likely "parse" (gene structure) under a probabilistic model of the gene structural and compositional properties of the genomic DNA for the given organism. This set of exons/genes is then printed to an output file (the text output) together with the corresponding predicted peptide sequences. A graphical (PostScript) output may also be created that displays the location and DNA strand of each predicted exon. Unlike most other currently available gene prediction programs, the model treats the most general case: the sequence may contain no genes, one gene, or multiple genes on either or both DNA strands, and partial genes are considered as well as complete genes. The most important restrictions are that only protein-coding genes are considered (not tRNA or rRNA genes, for example) and that transcription units are assumed to be non-overlapping.

Setup

To use GENSCAN, you must set up your GENSCAN environment by running a setup command once per login session. You may optionally place the command in your .cshrc (C shell users) or .profile (Bourne shell users) to avoid having to run it manually at each login.

For csh and tcsh:

source /usr/local/setup/genscan.setup.csh

For sh and bash:

. /usr/local/setup/genscan.setup.sh
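
If the setup command is placed in .profile, it may be worth guarding it so that a login on a machine without GENSCAN does not fail silently. The following is a minimal sketch for sh and bash users only; the function name load_genscan_env is hypothetical and not part of the GENSCAN installation, and the default path is the one given above.

```shell
# Hypothetical helper (not provided by the site): source the GENSCAN
# setup file only if it is readable, and report a problem otherwise.
load_genscan_env() {
    # Default to the setup path documented above; an alternative
    # path may be passed as the first argument.
    setup_file=${1:-/usr/local/setup/genscan.setup.sh}
    if [ -r "$setup_file" ]; then
        . "$setup_file"
    else
        echo "GENSCAN setup file not found: $setup_file" >&2
        return 1
    fi
}
```

With this definition in .profile, a following line containing only load_genscan_env would set up the environment at login.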

Usage

GENSCAN is invoked with the command genscan. It requires two arguments: the full path to an organism parameter file, and an input sequence file in FASTA or minimal GenBank format. The organism parameter files on LION-XE are in /usr/global/genscan and include:

    Arabidopsis.smat
    • parameter file for Arabidopsis
    HumanIso.smat
    • parameter file for human/vertebrates
    Maize.smat
    • parameter file for maize

Examples

The following is an example PBS script to run a GENSCAN job on LION-XE for a maximum of 2 hours. The input file is a FASTA format file called input.fa and is in the directory /home/foo/genscan. The parameter file used is Arabidopsis.smat. Output will appear in the normal PBS output file.

  #PBS -l nodes=1:ppn=1
  #PBS -l walltime=2:00:00
  #PBS -j oe
  #PBS -q lionxe-serial

  # set up the GENSCAN environment
  . /usr/local/setup/genscan.setup.sh

  # change the current working directory to the directory where the
  # input file input.fa can be found
  cd /home/foo/genscan

  # run the GENSCAN command
  genscan /usr/global/genscan/Arabidopsis.smat input.fa

Further information on PBS scripts and submitting jobs on the LION-XE cluster can be found in the User Guides section of the HPC website.
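
As an alternative to reading results from the PBS output file, the genscan step in a script like the one above could redirect its text output to one file per input sequence. The following is a minimal sketch, not a site-provided tool: the function name run_genscan_dir, the .fa suffix, and the .genscan.out naming are assumptions chosen for illustration.

```shell
# Hypothetical helper: run genscan on every *.fa file in a directory
# and save each result next to its input as <name>.genscan.out.
run_genscan_dir() {
    param_file=$1   # full path to an organism parameter file
    seq_dir=$2      # directory containing FASTA input files
    for seq in "$seq_dir"/*.fa; do
        # Skip the unexpanded pattern when no .fa files are present
        [ -e "$seq" ] || continue
        # Redirect the text output to a file instead of the PBS output
        genscan "$param_file" "$seq" > "${seq%.fa}.genscan.out"
    done
}
```

Inside a PBS script, after the setup line, it might be called as run_genscan_dir /usr/global/genscan/Arabidopsis.smat /home/foo/genscan.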

Documentation

Information on GENSCAN can be found on LION-XE in the README file in the directory /usr/global/genscan.

