The Lion-X PC Cluster from Penn State
By the Staff at the Center for Academic Computing at Penn State University (Issue 2 2000)
The Lion-X PC Cluster is a cost-effective, high-performance parallel
computing system that enables Penn State faculty and other researchers
to run complex computer simulation programs. Michael Dell, chairman and
CEO of Dell Computer Corporation, nominated the Lion-X PC Cluster to
become part of the Smithsonian collection. It was recently selected to
become part of the Permanent Research Collection of the Smithsonian
Institution's National Museum of American History and the 2000
Computerworld Smithsonian Collection. This article discusses the
design, implementation, and performance of this PC cluster.
The
Numerically Intensive Computing (NIC) Group of the Center for Academic
Computing at Penn State believed that access to parallel computing
resources could be made widely available using off-the-shelf
technology. This group has designed and built the Lion-X PC Cluster,
which presents a balanced approach to providing cost-effective,
general-purpose parallel computing cycles with high performance and
high reliability.
Lion-X
also offers research groups considering the purchase of their own
cluster a unique opportunity to determine which hardware and networking
technologies best suit their needs. Several research groups at Penn
State are now using the system to test and port applications.
The
Lion-X PC Cluster design balances the overall cost with the performance
and reliability of a system expected to meet the requirements of
serving a diverse group of researchers. Its high-performance symmetric
multiprocessing (SMP) nodes and multiple high-speed data networks
provide researchers with a powerful computational grid-ready PC cluster.
Designing a PC Cluster
The
design and implementation of a cost-effective PC cluster require a
delicate balance of performance, reliability, and expense. Increases in
either performance or reliability can greatly increase system expense.
Traditionally,
PC clusters are designed and built for a specific set of applications
within a department or research group. Generally the job mix is well
known and understood, and certain types of system failures can be
tolerated. Because such a cluster is often the only large system
run at the departmental level, time and space constraints are
unlikely to be a significant issue.
With
these considerations in mind, it is easy to build an inexpensive PC
cluster by trading away higher availability and/or greater performance for
lower cost. A central computing facility, however, must consider
several additional issues:
- A larger, more diverse user community must be served.
- The
job mix can vary widely from coarse-grained to fine-grained parallel
processes. This mix requires more investment in networking technology.
- The
system must provide the highest possible performance by including
components such as fast CPUs, SCSI disks, fast peripheral component
interconnect (PCI) bus, and large amounts of memory.
- The PC cluster requires hardware that will not become obsolete soon after the cluster is assembled.
- Since
larger user groups are more demanding than smaller ones, system
availability and reliability must be very high, requiring the use of
components such as redundant power supplies.
Facilities
often have numerous other large systems, so it is necessary to minimize
both floor space and staff intervention time. For example, hardware
must be supported over its entire life cycle, both in terms of warranty
and parts, and individual nodes should be easy to access. Furthermore,
since rack-mounted equipment is simpler to maintain, it is generally
recommended.
Other Factors to Consider
The
Lion-X PC Cluster nodes and network hardware can consume a significant
amount of electricity, so the NIC Group had to plan for sufficient
electrical power for both current and future needs. Lion-X has 14
dedicated 120 V, 20-amp circuits.
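As a rough illustration of the power budget described above, the following sketch multiplies out the circuit figures given in the article. The 80 percent continuous-load derating is an assumption added for illustration; the article does not say how heavily the circuits are loaded.

```python
# Figures from the article: 14 dedicated circuits, each 120 V at 20 A.
CIRCUITS = 14
VOLTS = 120
AMPS = 20

# Assumption (not from the article): plan to load each circuit to at most
# 80 percent of its rating when the load runs continuously.
CONTINUOUS_LOAD_PERCENT = 80

# Nominal capacity of all circuits combined, in volt-amperes.
total_va = CIRCUITS * VOLTS * AMPS

# Capacity left after applying the assumed derating (integer arithmetic).
usable_va = total_va * CONTINUOUS_LOAD_PERCENT // 100

print(f"Nominal capacity: {total_va} VA")
print(f"Usable capacity at {CONTINUOUS_LOAD_PERCENT}% loading: {usable_va} VA")
```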
The
Lion-X compute nodes, Dell PowerEdge 4350s, are four rack units high.
This requires a large footprint for a 32-node cluster. The Myrinet
cable lengths were limited to 10 feet. Given the footprint required for
32 Dell PowerEdge 4350 compute nodes, this cable length imposed a
design constraint in which the farthest node must be within 10 feet of
the Myrinet switch.
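The footprint constraint above can be estimated with a short calculation. This is a sketch only: the node count and node height come from the article, while the 42-unit rack height is an assumed value, and space for switches and other equipment is ignored.

```python
import math

NODES = 32              # compute nodes, from the article
UNITS_PER_NODE = 4      # PowerEdge 4350 height in rack units, from the article
RACK_HEIGHT_UNITS = 42  # assumption: rack height is not stated in the article

# Total vertical space consumed by the compute nodes alone.
total_units = NODES * UNITS_PER_NODE

# Whole nodes that fit in one rack, then racks needed for all nodes.
nodes_per_rack = RACK_HEIGHT_UNITS // UNITS_PER_NODE
racks_needed = math.ceil(NODES / nodes_per_rack)

print(f"Compute nodes occupy {total_units} rack units")
print(f"{nodes_per_rack} nodes per rack, {racks_needed} racks needed")
```

Every one of those racks has to sit close enough to the Myrinet switch for a 10-foot cable to reach its farthest node, which is why the footprint matters.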
Designing High Availability in Lion-X
The
central server node of a PC cluster is configured to perform multiple
roles, such as central file server, user log-in system, queuing system
master, and compute-node boot server. As a result, any downtime on the
central server node can take the entire cluster out of service, making this
node the single weakest link. Part of the Lion-X design was to use a
server machine that offered high availability and high reliability in
all aspects of its service portfolio.
The
group chose a Dell PowerEdge 6300 server as the Lion-X central server.
This machine offers built-in, hot-swappable RAID disk arrays and
multiple redundant power supplies. The drive array is configured in
RAID-5 format with parity protection and a hot-swappable spinning
spare drive for two levels of redundancy.
Lion-X
can lose two of its disks (one at a time, once the spare has taken over
for the first) and up to two of its power supplies or power
circuits and still remain in service while waiting for repairs to be
performed or power to be reapplied to the affected AC circuits. The
hot-swappable capability allows the server to remain active while the
affected components are replaced, which minimizes downtime and staff
intervention time.
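The parity protection that lets a RAID-5 array survive a disk failure rests on the XOR operation. The sketch below is a simplified illustration of that idea on a single stripe, not a model of the PowerEdge 6300 controller; the block contents and the helper name xor_blocks are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of several equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical data blocks of one stripe, each stored on a different disk.
data = [b"LION", b"-X P", b"C CL"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]: XOR the surviving data
# blocks with the parity block to rebuild the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

A hot spare gives the array somewhere to write the rebuilt blocks immediately, so the array returns to a protected state without waiting for a replacement drive.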
The
Dell PowerEdge 6300 also offers support consistency. Its hardware will
be supported over the projected lifetime of Lion-X and its parts
availability will remain consistent.
The group also chose Linux as the operating system for Lion-X for numerous reasons, as outlined in Figure 1. Figure 2 shows the hardware and software configuration for Lion-X.

Figure 1. The Advantages of Linux for Lion-X

Figure 2. Lion-X Configuration
Measuring Lion-X Performance
Compute-node
PCI direct memory access (DMA) performance is critical. Without
adequate PCI DMA throughput, the high-speed networks would be starved
of data and performance would suffer. A 32-bit PCI bus at 33 MHz has a
theoretical peak bandwidth of 132 MB/sec.
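The 132 MB/sec figure follows directly from the bus width and clock rate: a 32-bit bus moves 4 bytes per clock cycle. A minimal sketch of the arithmetic, with a function name chosen for illustration:

```python
def pci_peak_mb_per_sec(bus_width_bits: int, clock_mhz: int) -> int:
    """Theoretical peak: bytes moved per cycle times millions of cycles per second."""
    bytes_per_cycle = bus_width_bits // 8
    return bytes_per_cycle * clock_mhz

# The bus described in the article: 32 bits wide at 33 MHz.
peak = pci_peak_mb_per_sec(32, 33)
print(f"32-bit / 33 MHz PCI peak: {peak} MB/sec")
```

This is an upper bound. Sustained DMA throughput is lower because of bus arbitration and protocol overhead, which is why measured performance matters.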
Several benchmark tests were run, including the following:
- Using
the Pallas Message Passing Interface (MPI) Benchmark Suite 1.2, the MPI
point-to-point bandwidth was measured between pairs of compute nodes on
the Lion-X cluster. The PingPong test follows a
classical pattern used for measuring message (data) startup and
throughput time for a single message passed between two machines. All
tests were run using a single CPU within each compute node, except for
the N=64 test, which used two CPUs per compute node. In the N=64 test,
the loopback device was used rather than directly passing messages
within memory. See Figure 3.
- The
Myrinet results are more dramatic than they appear for very small
message sizes. For small messages, latency is as important as
bandwidth. For small messages, Myrinet latency is at least three times
better than that of Fast Ethernet. Finer grained parallel codes that
pass many small
messages benefit from the increased network performance. See Figure 3.
Figure 3. Fast Ethernet versus Myrinet
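The interplay of latency and bandwidth described in the list above can be sketched with a simple linear cost model, in which the time to deliver a message is a fixed startup latency plus the message size divided by the peak bandwidth. The two parameter sets below are illustrative placeholders, not Lion-X measurements, and the function names are chosen for this example.

```python
def transfer_time_us(size_bytes: float, latency_us: float, peak_mb_s: float) -> float:
    """Model: startup latency plus size / bandwidth (1 MB/sec equals 1 byte/usec)."""
    return latency_us + size_bytes / peak_mb_s

def effective_bandwidth_mb_s(size_bytes: float, latency_us: float, peak_mb_s: float) -> float:
    """Bandwidth an application actually observes for one message of this size."""
    return size_bytes / transfer_time_us(size_bytes, latency_us, peak_mb_s)

# Placeholder parameters (assumptions, NOT measured values from the article).
slow_net = {"latency_us": 100.0, "peak_mb_s": 10.0}
fast_net = {"latency_us": 20.0, "peak_mb_s": 100.0}

for size in (8, 1_000, 1_000_000):
    slow = effective_bandwidth_mb_s(size, **slow_net)
    fast = effective_bandwidth_mb_s(size, **fast_net)
    print(f"{size:>9} bytes: slow {slow:8.3f} MB/sec, fast {fast:8.3f} MB/sec")
```

In this model a message of size latency times peak bandwidth reaches only half the peak rate, so codes that send many small messages are limited mostly by latency rather than by the bandwidth of the network.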
Two Numerical Aerospace
Simulation (NAS) Parallel Benchmarks help gauge the overall performance
of Lion-X. Integer Sort (IS) is a parallel sort over small integers,
and LU is a simulated Computational Fluid Dynamics (CFD) application
that uses symmetric successive over-relaxation (SSOR) to solve a block
lower triangular/block upper triangular system of equations resulting
from an unfactored implicit finite-difference discretization of the
Navier-Stokes equations in three dimensions. The IS benchmark gauges a
system's network bandwidth while LU gauges network latency. See Figure 4.
Figure 4. The Integer Sort Benchmark
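To make the idea of sorting small integers concrete, the following is a single-process counting sort. It is only a sketch of the general technique and is not the NAS IS code; the function name and sample keys are made up for the example.

```python
def counting_sort(keys, max_key):
    """Sort non-negative integers no larger than max_key by counting occurrences."""
    counts = [0] * (max_key + 1)
    for key in keys:
        counts[key] += 1

    # Emit each key value as many times as it was seen, in increasing order.
    ordered = []
    for value, count in enumerate(counts):
        ordered.extend([value] * count)
    return ordered

# Hypothetical sample keys drawn from a small range.
sample = [5, 3, 0, 7, 3, 1, 7, 2]
print(counting_sort(sample, max_key=7))
```

In a parallel version, each process would hold part of the key set and processes would have to exchange large batches of keys with one another, which is consistent with the article's point that IS stresses network bandwidth.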
Lessons Learned from Lion-X
The NIC Group learned several lessons from Lion-X, as described in the following sections.
Evaluating Hardware Combinations
The
use of high-performance compute nodes and redundant high-performance
networks allowed the group to evaluate hardware combinations that
provide the highest possible performance for the widest variety of
applications. Lion-X can provide these high-performance cycles over its
projected lifetime. The group has been unable to report more activity
with the Alcatel/Packet Engines Gigabit Ethernet network because no
suitable drivers are available. Currently the Gigabit Ethernet network
handles all NFS traffic.
Preventing Downtime
The
choice of server and compute nodes with hot-swappable,
field-replaceable components such as disks, power supplies, and fans
assures virtually no downtime from component failures. To date, the
only significant Lion-X downtime has occurred when nonredundant components
failed in the central server node. Redundant power also ensures that random
circuit failures will not disable the entire cluster.
Supporting a PC Cluster
For
ease of maintenance, it is important that all components in a PC
cluster are supported throughout its entire projected lifetime. These
components also must have a suitable warranty period. These steps help
contain long-term cluster operating costs.
Long-term
support also ensures a consistent parts base, which can lead to less
incompatibility among parts in the future. The choice of hardware for
Lion-X reflected this approach and remains a long-term goal of the
Lion-X project. Furthermore, rack mounting of all Lion-X hardware and
labeling of every cable have made it easier to service any component,
which has proven to be a significant cost benefit.
User Applications and Utilization
The
Linux environment and standard tools and software libraries enable the
user community to become very productive on Lion-X. Porting time has
been minimal, with most users requiring a simple recompilation of their
existing codes. Lion-X went into production on September 1, 1999, and
utilization has risen rapidly since that time.
Many
staff members of the Center for Academic Computing (CAC) at Penn State
contributed to this article. Please direct any questions to the CAC's
Numerically Intensive Computing Group at beatnic@cac.psu.edu.