Gaussian 03

Overview

Gaussian 03 is designed to model a broad range of molecular systems under a variety of conditions, performing its computations starting from the basic laws of quantum mechanics. Theoretical chemists can use Gaussian 03 to perform basic research in established and emerging areas of chemical interest. Experimental chemists can use it to study molecules and reactions of definite or potential interest, including both stable species and those compounds which are difficult or impossible to observe experimentally (short-lived intermediates, transition structures and so on).

Gaussian 03 can also predict energies, molecular structures, vibrational frequencies and numerous molecular properties for systems in the gas phase and in solution, and it can model both their ground state and excited states. Chemists can apply these fundamental results to their own investigations, using Gaussian 03 to explore chemical phenomena like substituent effects, reaction mechanisms and electronic transitions.

Release Notes

Gaussian 03 release notes are available from the Gaussian web site at URL http://www.gaussian.com/g_tech/g03_rel.htm. The release notes contain very important information on the new features available in our Gaussian 03 version, functional differences from earlier releases you may have used, and bug fixes. It is highly recommended that you review these notes.

Citing Gaussian

Our license agreement with Gaussian, Inc. requires that all scholarly works created using the Gaussian 03 package cite the use of Gaussian. The required and proper citation information may be found on the Gaussian website at URL http://www.gaussian.com/citation.htm

Setup

Gaussian 03 makes extensive use of temporary disk files. By default these scratch files are written to local /tmp on each node. While this is sufficient for most uses, local /tmp may be too small for some jobs. The environment variable GAUSS_SCRDIR can be used to override this default setting. The proper setting of GAUSS_SCRDIR is critical to good performance. Use of the default setting, local /tmp, is strongly recommended when using Gaussian; use /scratch only if the Gaussian 03 temporary files exceed several GB of storage.

Please see each system's individual system information web page for information on the size of local /tmp on each machine.
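On the node you are logged into, you can also check the free space in local /tmp directly. The sketch below wraps the standard `df -P` command; the function name `free_space_gb` is hypothetical, and the result applies only to the node where it runs, so the system information pages remain the reference for each cluster.

```shell
# Hypothetical helper: print the free space (whole GB, rounded down) on the
# file system that holds the given directory (default: /tmp).
free_space_gb() {
    local dir="${1:-/tmp}"
    # df -Pk gives POSIX one-line-per-filesystem output in 1024-byte blocks;
    # on the data line (NR == 2), column 4 is "Available".
    df -Pk "$dir" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

echo "Local /tmp on $(uname -n): $(free_space_gb /tmp) GB free"
```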

An example of changing the default setting for scratch files:

export GAUSS_SCRDIR=/scratch/xyz123

All Gaussian scratch files will now be written in directory

/scratch/xyz123

When using GAUSS_SCRDIR you must make sure the GAUSS_SCRDIR directory is available to all nodes that will be running Gaussian jobs.
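One way to catch a missing or unwritable scratch directory before Gaussian starts is a small check in the job script. This is a sketch, not a site-provided tool: `check_gauss_scrdir` is a hypothetical name, and it tests only the node it runs on, so for multi-node jobs it confirms the first node only.

```shell
# Hypothetical pre-flight check: confirm that the scratch directory Gaussian
# will use exists and is writable on this node. When GAUSS_SCRDIR is unset,
# Gaussian 03 uses its default, local /tmp.
check_gauss_scrdir() {
    local dir="${GAUSS_SCRDIR:-/tmp}"
    if [ ! -d "$dir" ]; then
        echo "ERROR: scratch directory $dir does not exist on $(uname -n)" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "ERROR: scratch directory $dir is not writable on $(uname -n)" >&2
        return 1
    fi
    echo "Gaussian scratch directory: $dir"
}
```

In a PBS script it could be placed just before the g03 command as `check_gauss_scrdir || exit 1`, so the job stops early instead of failing inside Gaussian.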

Additional scratch space for Gaussian jobs may be available. Please contact the system administrator for additional information.

Usage

To use Gaussian 03 you must first be added to the g03 access group. This is a condition imposed by our license agreement. Users who request accounts specifically to use Gaussian 03 normally are added to this group by default. If you wish to use Gaussian 03 you may request that you be added to the g03 access group by contacting the HPC group via e-mail at beatnic@cac.psu.edu. There is no charge to use Gaussian 03 on HPC clusters.
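To see whether your account is already in the g03 access group, you can list your groups with the standard `id -nG` command. The helper below is a minimal sketch; `in_group` is a hypothetical name, and because `id -nG` reports the groups of your current login session, a newly granted membership may only appear after you log in again.

```shell
# Hypothetical helper: succeed if the current user belongs to the named
# group (default: g03, the Gaussian access group).
in_group() {
    local group="${1:-g03}" g
    # id -nG prints all of the session's group names separated by spaces.
    for g in $(id -nG 2>/dev/null); do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}

if in_group g03; then
    echo "You are in the g03 group."
else
    echo "You are not in the g03 group yet; contact the HPC group to be added."
fi
```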

The master Gaussian 03 startup script, g03, is located in directory /usr/global/bin. This directory is normally in your search path by default.

To start Gaussian 03 you simply need to type the command g03, although it is recommended that you run the startup script by its full path, /usr/global/bin/g03.
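If a script should use the g03 found on your search path when there is one, and the documented full path otherwise, a fallback like the following is one option. This is a sketch that assumes /usr/global/bin is the location of the startup script, as stated above; `gaussian_cmd` is a hypothetical name, and the script name can be passed as an argument.

```shell
# Hypothetical helper: print the command to run for a Gaussian startup
# script (g03 by default). Prefer the copy found on the search path and
# fall back to the documented location in /usr/global/bin.
gaussian_cmd() {
    local name="${1:-g03}" found
    if found=$(command -v "$name"); then
        echo "$found"
    else
        echo "/usr/global/bin/$name"
    fi
}

# Example: "$(gaussian_cmd g03)" foo.com
```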

Examples

To start Gaussian 03 from the command line on an interactive-use system such as hammer.aset.psu.edu using an input file named foo.com you would use a command such as the following:

/usr/global/bin/g03 foo.com (recommended)

or

g03 foo.com

To start Gaussian 03 with an input deck foo.com on the batch clusters, such as LION-XC or LION-XO, using the PBS queueing system, a PBS script such as the following can be used. The following example will use a PBS-assigned local temporary directory for the scratch files.

#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:10:00
#PBS -j oe

# change the current working directory to the directory where
# the input deck foo.com can be found
cd $PBS_O_WORKDIR

echo " "
echo "Starting job on `hostname` at `date`"
echo " "

# start g03 with input deck foo.com
/usr/global/bin/g03 foo.com

echo " "
echo "Completing job on `hostname` at `date`"
echo " "

A setting for GAUSS_SCRDIR can optionally be specified in the above script by placing it before the command that executes Gaussian.
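For instance, the export can go right after the `cd $PBS_O_WORKDIR` line. The sketch below writes the single-CPU script shown above to standard output with that line in place; `write_g03_job` is a hypothetical helper, and /scratch/xyz123 is the example directory from the Setup section, not a real path on any cluster.

```shell
# Hypothetical generator: write a single-CPU PBS script for Gaussian 03 to
# standard output. If a second argument is given, an export of GAUSS_SCRDIR
# is placed before the g03 command so it is in effect when Gaussian starts.
write_g03_job() {
    local input="$1" scrdir="$2"
    echo '#PBS -l nodes=1:ppn=1'
    echo '#PBS -l walltime=0:10:00'
    echo '#PBS -j oe'
    echo
    echo 'cd $PBS_O_WORKDIR'
    if [ -n "$scrdir" ]; then
        echo "export GAUSS_SCRDIR=$scrdir"
    fi
    echo "/usr/global/bin/g03 $input"
}

# Example: write_g03_job foo.com /scratch/xyz123 > foo.pbs
# then submit foo.pbs with qsub as described in your system's User Guide.
```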

Additional information on PBS scripts and submitting jobs to PBS can be found in the appropriate system's User Guide in the User Guides section of this website.

Documentation

Documentation on Gaussian 03 can be accessed via the command

ghelp

From the command line simply issue the command:

/usr/global/bin/ghelp

or

ghelp

Additionally, an online reference manual for Gaussian 03 is available from the Gaussian web site at URL http://www.gaussian.com/g_ur/g03mantop.htm

Parallel Gaussian

There are two ways to run Gaussian 03 in parallel. The first uses shared memory and is the easiest to implement; the second uses distributed memory via Network Linda. Additionally, you may run Gaussian 03 using a combination of both shared and distributed memory techniques.

Parallel Gaussian - Shared Memory

To run Gaussian 03 using shared memory add the keyword

%NProcShared=N

to the Link 0 section of your input file, where N is the number of shared CPUs within a node to use. Gaussian 03 is then started in the normal way. Note that for PBS jobs you must adjust the ppn setting to match the number of CPUs requested. You cannot request more CPUs for shared memory than are available on a node. For more information on how many processors there are per node on each system, please refer to each cluster's individual system information page.

Example:

To run a job on 2 CPUs, the Link 0 settings might look like:

%NProcShared=2
%mem=256mb
%chk=mychk.chk
etc. etc.

If you are submitting this job via PBS, you must now set ppn in the PBS job script to the same value as %NProcShared:

#PBS -l nodes=1:ppn=2
#PBS -l walltime=0:10:00
#PBS -j oe
 etc. etc.
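Because a mismatch between %NProcShared and ppn is easy to miss, a job script can compare the two before starting Gaussian. This sketch relies on PBS listing a node once per assigned processor in $PBS_NODEFILE; `check_nprocshared` is a hypothetical name, and it assumes Link 0 keywords may appear in any letter case.

```shell
# Hypothetical pre-flight check for a one-node shared-memory job: compare
# %NProcShared in the Gaussian input with the processor slots PBS assigned.
# $1: input file, $2: node file (default: $PBS_NODEFILE).
check_nprocshared() {
    local input="$1" nodefile="${2:-$PBS_NODEFILE}"
    local nproc host slots
    # Read the first %NProcShared line, any case; default to 1 if absent.
    nproc=$(grep -i '^%nprocshared=' "$input" | head -n 1 | cut -d= -f2 | tr -d '[:space:]')
    nproc=${nproc:-1}
    # PBS repeats a host once per processor; count the first host's lines.
    host=$(head -n 1 "$nodefile")
    slots=$(grep -cxF "$host" "$nodefile")
    if [ "$nproc" -ne "$slots" ]; then
        echo "Mismatch: %NProcShared=$nproc but PBS gave $slots CPU(s) on $host" >&2
        return 1
    fi
    echo "OK: %NProcShared=$nproc matches ppn=$slots"
}

# Example, in a PBS script: check_nprocshared foo.com || exit 1
```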
Parallel Gaussian - Distributed Memory using Network Linda

HF, CIS=Direct, and DFT calculations on molecules are Linda parallel, including energies, optimizations and frequencies. TDDFT energies and MP2 energies and gradients are also Linda parallel. PBC calculations are not Linda parallel.

The default for molecules larger than 65 atoms is to use the linear scaling algorithms (the fast multipole method, FMM), which are not Linda parallel. For jobs using Linda (e.g., on a cluster), this threshold should be increased to about 300 atoms to obtain the full benefit of parallelization. This is done with the Int=FMMNAtoms=n keyword, where n is the number of atoms.
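To tell whether a molecule is over the 65-atom FMM threshold, you can count the atoms in the input file. The sketch below assumes a simple input layout (one atom per line in the molecule specification, sections separated by single blank lines); `count_atoms` and `check_fmm` are hypothetical names, and the suggested value of 300 is the one given above.

```shell
# Hypothetical helper: count the atoms in a Gaussian input file. Assumes
# Link 0 and route lines, a blank line, the title, a blank line, then the
# charge/multiplicity line and one line per atom, ended by a blank line.
count_atoms() {
    awk '
        /^[[:space:]]*$/ { if (!blank) section++; blank = 1; next }
        { blank = 0 }
        section == 2 { n++ }
        END { print (n > 0 ? n - 1 : 0) }   # drop the charge/multiplicity line
    ' "$1"
}

# Warn when a molecule is above the 65-atom FMM threshold and the input
# does not already mention FMMNAtoms.
check_fmm() {
    local atoms
    atoms=$(count_atoms "$1")
    if [ "$atoms" -gt 65 ] && ! grep -qi 'fmmnatoms' "$1"; then
        echo "$1: $atoms atoms; consider Int=FMMNAtoms=300 for Linda runs"
    fi
}
```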

Several things need to be done to use Linda. A different g03 calling sequence is required, your .bashrc file may require minor modification, you must specify the number of Linda processors, and the PBS script for batch submission becomes much more complex than you may be currently used to.

To use the Parallel Linda version on the 64-bit clusters (e.g. lion-xb, lion-xc, and lion-xo) you would include the following in your .bashrc file:

if [ -f /usr/global/setup/g03.sh ]; then
 . /usr/global/setup/g03.sh
fi

If you are unsure about adding these modifications to your .bashrc file, or would like assistance, please contact beatnic@cac.psu.edu for current Network Linda / Gaussian modification instructions.

To invoke the application, rather than starting Gaussian 03 with the command g03, Gaussian/Linda is now started with the command g03l (as in g03 letter el, not g03 number 1).

The number of Linda processors (remote nodes) requested for your application is specified via the keyword

%NProcLinda=N

in the Link 0 section of your input file, where N is the number of remote Linda nodes you wish to use.

To run a job on 4 remote nodes your Link 0 settings might look like:

%NProcLinda=4
%mem=256mb
%chk=mychk.chk
etc. etc.

In the above example the keyword %NProcLinda is required. The nodes setting in your PBS script must be adjusted to match the setting used for %NProcLinda.

Use of Network Linda together with shared-memory parallelism is highly recommended. The keywords %NProcLinda and %NProcShared can be combined, as the following example PBS script shows. This example uses 2 processors on each of 4 separate nodes: in Link 0, %NProcLinda would be set to 4 and %NProcShared to 2. Please consult the system information page for each individual cluster for the maximum settings.

The PBS script would look like the following. Just about everything you see here is significant and should not be left out of the script.

#PBS -l nodes=4:ppn=2
#PBS -l walltime=00:30:00
#PBS -j oe

cd $PBS_O_WORKDIR

echo " "
echo "Job started on `hostname` at `date`"

# Build host file for parallel execution: $PBS_NODEFILE lists each node
# once per processor, so reduce it to one line per node
sort -r $PBS_NODEFILE | uniq > $TMPDIR/pbs_hostfile.$$

# Set GAUSS_LFLAGS to reflect host file information
GAUSS_LFLAGS="-vv -nodefile $TMPDIR/pbs_hostfile.$$"
export GAUSS_LFLAGS

# Use the PBS-assigned temporary directory for Gaussian scratch files
export GAUSS_SCRDIR=$TMPDIR

echo "Machines to be used are:"
cat "$TMPDIR/pbs_hostfile.$$"

# Start Gaussian/Linda and report its run time
time g03l my_g03_input_file.com

# remove host file
rm $TMPDIR/pbs_hostfile.$$

echo " "
echo "Job Ended on `hostname` at `date`"
echo " "

To use Network Linda without shared-memory parallelism, remove the %NProcShared keyword from the Link 0 section and set ppn=1 in the PBS script. Do not make any other changes to the above PBS script.
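The host file in the PBS script above is $PBS_NODEFILE reduced to one line per node, since PBS lists each node once per processor. The same reduction can be used to check, before g03l starts, that the node count matches %NProcLinda and the processors per node match %NProcShared. This is a sketch; `check_linda_layout` is a hypothetical name, and it assumes every node received the same ppn.

```shell
# Hypothetical pre-flight check for a Linda + shared-memory job.
# $1: Gaussian input file, $2: node file (default: $PBS_NODEFILE).
check_linda_layout() {
    local input="$1" nodefile="${2:-$PBS_NODEFILE}"
    local linda shared nodes slots
    linda=$(grep -i '^%nproclinda=' "$input" | head -n 1 | cut -d= -f2 | tr -d '[:space:]')
    shared=$(grep -i '^%nprocshared=' "$input" | head -n 1 | cut -d= -f2 | tr -d '[:space:]')
    linda=${linda:-1}
    shared=${shared:-1}
    # Number of distinct nodes, as in the host file built by the script.
    nodes=$(sort -u "$nodefile" | wc -l | tr -d ' ')
    # Slots on the first node listed; assumes every node got the same ppn.
    slots=$(grep -cxF "$(head -n 1 "$nodefile")" "$nodefile")
    if [ "$linda" -ne "$nodes" ] || [ "$shared" -ne "$slots" ]; then
        echo "Mismatch: input asks for $linda node(s) x $shared CPU(s);" \
             "PBS gave $nodes node(s) x $slots CPU(s)" >&2
        return 1
    fi
    echo "OK: $nodes node(s) x $slots CPU(s)"
}
```

With the nodes=4:ppn=2 example it expects %NProcLinda=4 and %NProcShared=2, and it also flags an input that forgets %NProcLinda altogether.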

Linda Pitfalls
  • If there are more nodes listed than are specified with %NProcLinda, then a process may still be started on each listed node, but only the specified number of them will actually do any work. A corollary of this is that if you forget to include %NProcLinda within the Gaussian 03 input file or -L- in the Default.Route file, then the job will not run in parallel, although there may be an idle Linda process on each of the nodes in the node list.

  • Running g03 instead of g03l when you have specified a parallel job will produce an error message.

  • Running Linda calculations for methods that are not Linda parallel.

  • Running Linda calculations on large molecules without increasing the value of Int=FMMNAtoms.
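The second pitfall can be avoided by letting the job script choose the startup command from the input file. In this sketch a job counts as a Linda job when its input sets %NProcLinda; `g03_command_for` is a hypothetical name.

```shell
# Hypothetical helper: pick the startup script that matches the input file.
# An input that sets %NProcLinda (any letter case) is started with g03l;
# anything else uses g03.
g03_command_for() {
    if grep -qi '^%nproclinda=' "$1"; then
        echo g03l
    else
        echo g03
    fi
}

# Example inside a PBS script:
#   $(g03_command_for my_g03_input_file.com) my_g03_input_file.com
```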

Gaussian Performance Tips

Gaussian performance depends mainly on disk I/O and processor performance.

For most users, a wise choice of GAUSS_SCRDIR will have the most impact on application performance. Using the default local temporary disk space provides the highest I/O performance. If your analysis requires large amounts of scratch disk space, first try to run Gaussian 03 jobs on nodes with large local /tmp space before switching to larger network-based scratch directories.

Not all calculations benefit from using multiple processors. For most users 2 or 4 processors provide a good speedup, and processors beyond that add no significant further improvement. The best way to find out for your own jobs is by trial and error.
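A simple way to do that trial and error is to run the same input at several %NProcShared values and compare run times. The sketch below only prepares the inputs; `make_nproc_variants` and the `_npN` file-name suffix are hypothetical, and each variant still needs a PBS script whose ppn matches its %NProcShared.

```shell
# Hypothetical helper for a processor-count sweep: write copies of an input
# file that differ only in %NProcShared. Any existing %NProcShared line is
# dropped and the new value becomes the first Link 0 line.
make_nproc_variants() {
    local input="$1" base n
    base="${input%.com}"
    shift
    for n in "$@"; do
        { echo "%NProcShared=$n"; grep -vi '^%nprocshared=' "$input"; } > "${base}_np${n}.com"
        echo "${base}_np${n}.com"
    done
}

# Example: make_nproc_variants foo.com 1 2 4
# writes foo_np1.com, foo_np2.com and foo_np4.com; submit each with
# ppn equal to N and compare the start and end times the job prints.
```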

For disk I/O and processor hints, feel free to contact beatnic@cac.psu.edu.

Further Information

Further information on Gaussian 03 and other products offered by Gaussian, Inc. may be found on the Gaussian web site at URL http://www.gaussian.com.


Please send questions or suggestions about this web page to beatnic@aset.psu.edu

ASET | ITS | Penn State