Gaussian 03
Overview
Gaussian 03 is designed to model a broad range of
molecular systems under a variety of conditions, performing its computations
starting from the basic laws of quantum mechanics. Theoretical chemists can use
Gaussian 03 to perform basic research in established and emerging areas
of chemical interest. Experimental chemists can use it to study molecules and
reactions of definite or potential interest, including both stable species and
those compounds which are difficult or impossible to observe experimentally
(short-lived intermediates, transition structures and so on).
Gaussian 03 can also predict energies, molecular structures, vibrational
frequencies and numerous molecular properties for systems in the gas phase and
in solution, and it can model both their ground state and excited states.
Chemists can apply these fundamental results to their own investigations, using
Gaussian 03 to explore chemical phenomena like substituent effects,
reaction mechanisms and electronic transitions.
Release Notes
The Gaussian 03 release notes are available from the
Gaussian web site at
http://www.gaussian.com/g_tech/g03_rel.htm. The release notes describe
the new features available in
Gaussian 03, functional differences from earlier
releases you may have used, and bug fixes. Reviewing these notes is
highly recommended.
Citing Gaussian
Our license agreement with Gaussian, Inc. requires that all scholarly works created using
the Gaussian 03 package cite the use of Gaussian. The required and
proper citation information may be found on the Gaussian website at URL
http://www.gaussian.com/citation.htm
Setup
Gaussian 03 makes extensive use of temporary disk files.
By default these scratch files are written to local /tmp on each node. Local
/tmp is large enough for most jobs, but some jobs need more space
than it provides.
The environment variable GAUSS_SCRDIR can be used
to override this default setting.
The proper setting of GAUSS_SCRDIR is critical to good
performance. Use of the default setting, local /tmp,
is strongly recommended when using Gaussian.
Use of /scratch is recommended only if the
Gaussian 03 temporary files exceed several GB of storage.
Please see each system's individual system information web page
for the size of local /tmp on each machine.
For example, to change the default location for scratch files:
export GAUSS_SCRDIR=/scratch/xyz123
All Gaussian scratch files will now be written to the directory
/scratch/xyz123.
When using GAUSS_SCRDIR you must make sure the
GAUSS_SCRDIR directory is available to all nodes that
will be running Gaussian jobs.
Additional scratch space for Gaussian jobs may be available. Please
contact the system administrator for additional information.
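As a minimal sketch of the setup described above, the following shell function creates the scratch directory if needed, confirms that it is writable, and only then exports GAUSS_SCRDIR. The function name use_gauss_scrdir is hypothetical and is not part of Gaussian 03; /scratch/xyz123 is the example path used above.

```shell
# use_gauss_scrdir is a hypothetical helper, not part of Gaussian 03.
# It points GAUSS_SCRDIR at a directory only after confirming that the
# directory exists (creating it if needed) and is writable.
use_gauss_scrdir() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "usage: use_gauss_scrdir DIRECTORY" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    if [ ! -w "$dir" ]; then
        echo "scratch directory $dir is not writable" >&2
        return 1
    fi
    GAUSS_SCRDIR="$dir"
    export GAUSS_SCRDIR
}

# Example call, using the path from the text above:
#   use_gauss_scrdir /scratch/xyz123
```

This check only covers the node where it runs. The directory must still be reachable from every node that will run the job.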
Usage
To use Gaussian 03 you must first be
added to the g03 access group. This is a
condition imposed by our license agreement. Users who request
accounts specifically to use Gaussian 03 normally are
added to this group by default. If you wish to use Gaussian 03
you may request that you be added to the g03 access group
by contacting the HPC group via e-mail at
beatnic@cac.psu.edu.
There is no charge to use Gaussian 03 on HPC clusters.
The master Gaussian 03 startup script,
g03, is located in the directory
/usr/global/bin. This directory normally is in your
search path by default.
To start Gaussian 03 you can simply
type the command g03, although it is recommended that you
run the startup script by its full path,
/usr/global/bin/g03.
Examples
To start Gaussian 03 from the command line on an
interactive-use system such as hammer.aset.psu.edu
using an input file named
foo.com you would use a command such as the following:
/usr/global/bin/g03 foo.com (recommended)
or
g03 foo.com
To start Gaussian 03 with an input deck foo.com on
the batch clusters such as LION-XC or LION-XO
using the PBS queueing system, a
PBS script such as the following would be used. The following
example will use a PBS assigned local temporary directory for the scratch
files.
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:10:00
#PBS -j oe
# change the current working directory to the directory where
# the input deck foo.com can be found
cd $PBS_O_WORKDIR
echo " "
echo "Starting job on `hostname` at `date`"
echo " "
# start g03 with input deck foo.com
/usr/global/bin/g03 foo.com
echo " "
echo "Completing job on `hostname` at `date`"
echo " "
A setting for GAUSS_SCRDIR could optionally be specified
in the above script by placing it before the instruction to execute Gaussian.
Additional information on PBS scripts and submitting jobs
to PBS can be found in the appropriate system's User
Guide in the User Guides section of this website.
Documentation
Documentation on Gaussian 03 can be accessed via the
command ghelp.
From the command line, simply issue the command:
/usr/global/bin/ghelp
or
ghelp
Additionally, an on-line reference manual for Gaussian 03
is available from the Gaussian web site at
http://www.gaussian.com/g_ur/g03mantop.htm
The Gaussian 03 release notes are available from the Gaussian web site
at
http://www.gaussian.com/g_tech/g03_rel.htm.
Parallel Gaussian
There are several ways to run Gaussian 03 in
parallel. One uses shared memory and is the
easiest to implement. Another uses
distributed memory via Network Linda. You may also
run Gaussian 03 using a combination of both shared
and distributed memory techniques.
Parallel Gaussian - Shared Memory
To run Gaussian 03 using shared memory
add the keyword %NProcShared=N
to the Link 0 section of your input file, where N is the number of shared
CPUs within a node to use.
Gaussian 03 is then started in the normal way. Note that for
PBS jobs you must adjust the ppn setting to reflect
the number of CPUs requested. You cannot request more CPUs for shared
memory than are available on a node. For more information
on how many individual processors there are per node for each system, please
refer to each cluster's individual system information page.
Example:
To run a job on 2 CPUs, the Link 0 settings might look like:
%NProcShared=2
%mem=256mb
%chk=mychk.chk
etc. etc.
If you are submitting this job via PBS
you must now modify the ppn setting in
the PBS job script to be the same value as %NProcShared:
#PBS -l nodes=1:ppn=2
#PBS -l walltime=0:10:00
#PBS -j oe
etc. etc.
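The requirement that ppn match %NProcShared can be checked before a job is submitted. The following is a sketch only: the helper names link0_value, pbs_ppn, and check_shared are hypothetical, and the keyword is matched case-insensitively on the assumption that keyword case is not significant in Link 0.

```shell
# Hypothetical helpers, not part of Gaussian 03 or PBS.

# Print the value of a Link 0 keyword (for example NProcShared) from a
# Gaussian input file. Prints nothing if the keyword is absent.
link0_value() {
    keyword="$1"
    input="$2"
    grep -i "^%${keyword}=" "$input" | head -n 1 | cut -d= -f2 | tr -d ' \r'
}

# Print the ppn value from the "#PBS -l nodes=N:ppn=M" line of a PBS script.
pbs_ppn() {
    sed -n 's/^#PBS .*ppn=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only when ppn in the PBS script equals %NProcShared in the input.
check_shared() {
    want="$(link0_value NProcShared "$1")"
    have="$(pbs_ppn "$2")"
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "ok: ppn=$have matches %NProcShared=$want"
    else
        echo "mismatch: %NProcShared=${want:-unset}, ppn=${have:-unset}" >&2
        return 1
    fi
}

# Example call (file names are placeholders):
#   check_shared foo.com job.pbs && qsub job.pbs
```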
Parallel Gaussian - Distributed Memory using Network Linda
HF, CIS=Direct, and DFT calculations on molecules are Linda parallel,
including energies, optimizations and frequencies. TDDFT energies and MP2
energies and gradients are also Linda parallel. PBC calculations are not
Linda parallel.
The default for molecules larger than 65 atoms is to use the linear
scaling algorithms (FMM), which are not Linda parallel. This threshold should
be raised to about 300 atoms for jobs using Linda (e.g., on a cluster) to
obtain the full benefit of parallelization. This is accomplished via the
Int=FMMNAtoms=n keyword, where n
is the number of atoms.
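As an illustration of where the keyword goes, the following sketch writes a small input file whose route section sets the threshold to 300 atoms. The file name fmm_example.com, the method and basis set, the title line, and the water geometry are all placeholders chosen for this example. A real Linda job would use a molecule large enough for the threshold to matter.

```shell
# Write a small Gaussian input file whose route section (the line that
# starts with "#") sets the FMM threshold to 300 atoms.
# File name, method, basis set, title, and geometry are placeholders.
cat > fmm_example.com <<'EOF'
%mem=256mb
%chk=mychk.chk
# HF/6-31G(d) Int=FMMNAtoms=300

placeholder title line

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

EOF

# Show the route line that was written
grep '^#' fmm_example.com
```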
Several things need to be done to use Linda. A different g03 calling
sequence is required, your .bashrc file may require minor modification,
you must specify the number of Linda processors, and the PBS script for
batch submission becomes much more complex than you may be currently used to.
To use the Parallel Linda version on the 64-bit clusters (e.g.
lion-xb, lion-xc, and lion-xo) you would include the following
in your .bashrc file:
if [ -f /usr/global/setup/g03.sh ]; then
. /usr/global/setup/g03.sh
fi
If you are unsure about adding these modifications to your .bashrc file,
or would like assistance, please contact
beatnic@cac.psu.edu for
current Network Linda / Gaussian modification instructions.
To invoke the application,
rather than starting Gaussian 03 with the command
g03, Gaussian/Linda is now started with the command
g03l (as in g03 letter el, not g03 number 1).
The number of Linda processors (remote nodes) requested for
your application is specified via the keyword
%NProcLinda=N
in the Link 0 section of your input file, where N is the number of
remote Linda nodes you wish to use.
To run a job on 4 remote nodes your
Link 0 settings might look like:
%NProcLinda=4
%mem=256mb
%chk=mychk.chk
etc. etc.
In the above example the keyword %NProcLinda is
required. The nodes setting in your PBS script must
be adjusted to match the setting used for %NProcLinda.
Use of Network Linda with shared memory parallel is highly
recommended. The keywords %NProcLinda
and %NProcShared can be combined. The following
example PBS script shows how this can be done. This example
will use 2 processors on each of 4 separate nodes. In Link 0,
%NProcLinda would be set to 4 and
%NProcShared would be set to 2. Please consult the system information page for each individual cluster for the maximum settings.
The PBS script would look like the following. Just about
everything you see here is significant and should not be left
out of the script.
#PBS -l nodes=4:ppn=2
#PBS -l walltime=00:30:00
#PBS -j oe
cd $PBS_O_WORKDIR
echo " "
echo "Job started on `hostname` at `date`"
#Build host file for parallel execution
cat $PBS_NODEFILE | sort -r | uniq > $TMPDIR/pbs_hostfile.$$
# Set GAUSS_LFLAGS to reflect host file information
GAUSS_LFLAGS="-vv -nodefile $TMPDIR/pbs_hostfile.$$"
export GAUSS_LFLAGS
# Some general housekeeping
export GAUSS_SCRDIR=$TMPDIR
echo "Machines to be used are:"
cat "$TMPDIR/pbs_hostfile.$$"
# go go go
time g03l my_g03_input_file.com
# remove host file
rm $TMPDIR/pbs_hostfile.$$
echo " "
echo "Job Ended on `hostname` at `date`"
echo " "
To use only Network Linda, without shared memory parallel Gaussian, remove the
%NProcShared keyword from the Link 0 section and set
ppn=1 in the PBS script. Do not make any other changes
to the above PBS script.
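The host file step in the script above can be tried outside PBS. PBS writes one line per allocated processor to $PBS_NODEFILE, so with ppn=2 each node appears twice and sort with uniq reduces the list to one line per node. The sketch below uses a made-up node file with invented host names. The helper name unique_hosts is hypothetical.

```shell
# Hypothetical helper: print one line per distinct host from a PBS node
# file, as the job script does with sort and uniq.
unique_hosts() {
    sort -r "$1" | uniq
}

# Demonstration with a made-up node file for nodes=4:ppn=2. The host
# names are invented. Under PBS the real list comes from $PBS_NODEFILE.
demo_nodefile="$(mktemp)"
printf '%s\n' node01 node01 node02 node02 \
              node03 node03 node04 node04 > "$demo_nodefile"

unique_hosts "$demo_nodefile"

# The number of distinct hosts should equal the %NProcLinda setting.
host_count="$(unique_hosts "$demo_nodefile" | wc -l | tr -d ' ')"
echo "distinct hosts: $host_count"

rm -f "$demo_nodefile"
```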
Linda Pitfalls
- If there are more nodes listed than are specified with %NProcLinda, then a process may still be started on each listed node, but only the specified number of them will actually do any work. A corollary of this is that if you forget to include %NProcLinda within the Gaussian 03 input file or -L- in the Default.Route file, then the job will not run in parallel, although there may be an idle Linda process on each of the nodes in the node list.
- Running g03 instead of g03l when you have specified a parallel job will produce an error message.
- Running Linda calculations for methods which are not Linda parallelized.
- Running Linda calculations on large molecules without increasing the value of Int=FMMNAtoms.
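One way to avoid the launcher mix-up listed above is to choose the command from the input file itself. This is a sketch under the assumption that a Linda job is identified by a %NProcLinda line in the Link 0 section. The helper name pick_launcher is hypothetical, and the check does not cover the -L- setting in Default.Route.

```shell
# Hypothetical pre-flight helper, not part of Gaussian 03: print the
# launcher that matches the input file. Inputs containing a %NProcLinda
# line need g03l. Everything else uses plain g03.
pick_launcher() {
    if grep -qi '^%NProcLinda=' "$1"; then
        echo g03l
    else
        echo g03
    fi
}

# Example use in a job script (foo.com is a placeholder input name):
#   "$(pick_launcher foo.com)" foo.com
```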
Gaussian Performance Tips
Gaussian performance depends on disk I/O and processor performance.
For most users a wise choice for the GAUSS_SCRDIR setting will have
the most impact on application performance. Use of the default local
temporary disk space will provide the highest I/O performance.
If your analysis requires large amounts of scratch disk space, first try to
run Gaussian 03 jobs on nodes with large local /tmp space before
switching to larger network-based scratch directories.
Not all applications benefit from using multiple processors. For most users
2 or 4 processors provide good speedup and any processors beyond that do
not add any significant performance improvements. The best way to find out is
via trial and error.
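A trial of this kind can be prepared by generating copies of one input file that differ only in processor count. The sketch below assumes the input already contains a line written exactly as %NProcShared=N, as in the examples on this page. The helper name with_nprocshared and the output file names are hypothetical.

```shell
# Hypothetical helper: print a Gaussian input file with the value on its
# %NProcShared line replaced by the given processor count.
# Assumes the keyword is spelled exactly "%NProcShared".
with_nprocshared() {
    sed "s/^%NProcShared=.*/%NProcShared=$1/" "$2"
}

# Sketch of a scaling test (foo.com is a placeholder input name):
#   for n in 1 2 4; do
#       with_nprocshared "$n" foo.com > "foo_np$n.com"
#       # submit each variant with ppn=$n, then compare wall-clock times
#   done
```

Each variant still needs a PBS script whose ppn setting matches its %NProcShared value.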
For disk I/O and processor hints, feel free to contact
beatnic@cac.psu.edu.
Further Information
Further information on Gaussian 03 and other products offered by
Gaussian, Inc. may be found on the Gaussian web site at URL
http://www.gaussian.com.
Please send questions or suggestions about this web page to beatnic@aset.psu.edu