Cluster Computing


Computing is an evolutionary process. Five generations of development history, with each generation improving on the previous one's technology, architecture, software, applications, and representative systems, make that clear. As part of this evolution, computing requirements driven by applications have always outpaced the available technology, so system designers have always needed to seek faster, more cost-effective computer systems. Parallel and distributed computing provides the best solution by offering computing power that greatly exceeds the technological limitations of single-processor systems. Unfortunately, although the parallel and distributed computing concept has been with us for over three decades, the high cost of multiprocessor systems has blocked commercial success so far. Today, a wide range of applications are hungry for higher computing power, and even though single-processor PCs and workstations can now provide extremely fast processing, the even faster execution that multiple processors can achieve by working concurrently is still needed. Now, finally, costs are falling as well. Networked clusters of commodity PCs and workstations using off-the-shelf processors and communication platforms such as Myrinet, Fast Ethernet, and Gigabit Ethernet are becoming increasingly cost-effective and popular. This concept, known as cluster computing, will surely continue to flourish: clusters can provide enormous computing power that a pool of users can share or that can be applied collectively to a single application. In addition, clusters do not incur a very high cost, a factor that led to the sad demise of massively parallel machines.

Clusters, built using commodity off-the-shelf (COTS) hardware components and free or commonly used software, are playing a major role in solving large-scale science, engineering, and commercial applications. Cluster computing has emerged as a result of the convergence of several trends, including the availability of inexpensive high-performance microprocessors and high-speed networks, the development of standard software tools for high-performance distributed computing, and the increasing need for computing power in computational science and commercial applications.

What is Clustering?

Clustering is the use of multiple computers, typically PCs or UNIX workstations, multiple storage devices, and redundant interconnections to form what appears to users as a single, highly available system. Cluster computing can be used for load balancing as well as for high availability. It serves as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations.
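As a minimal illustration of the load-balancing idea, the sketch below distributes incoming requests across nodes in round-robin fashion. The node and request names are hypothetical; real clusters use dedicated load balancers or DNS rotation rather than a loop like this.

```python
from itertools import cycle

def round_robin(requests, nodes):
    """Assign each incoming request to the next node in turn,
    spreading load evenly across the cluster."""
    node_iter = cycle(nodes)
    return [(req, next(node_iter)) for req in requests]

# Four requests over three hypothetical nodes: the fourth request
# wraps around to the first node again.
assignments = round_robin(["r1", "r2", "r3", "r4"],
                          ["node1", "node2", "node3"])
```

Because every node receives roughly the same number of requests, no single machine becomes a hotspot, which is the essence of the load-balancing use of clusters described above.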

Computer cluster technology puts clusters of systems together to provide better system reliability and performance. Cluster server systems connect a group of servers together in order to jointly provide processing service for the clients in the network.

Cluster operating systems divide the tasks amongst the available servers. Clusters of systems or workstations, on the other hand, connect a group of systems together to jointly share a critically demanding computational task. Theoretically, a cluster operating system should provide seamless optimization in every case.

At the present time, cluster server and workstation systems are mostly used in high-availability applications and in scientific applications such as numerical computations.

Advantages of Clustering

  • High performance
  • Large capacity
  • High availability
  • Incremental growth

Applications of Clustering

  • Scientific computing
  • Making movies
  • Commercial servers (web/database/etc)


The History of Cluster Computing

The first commodity clustering product was ARCnet, developed by Datapoint in 1977. ARCnet wasn't a commercial success, and clustering didn't really take off until DEC released its VAXcluster product in the 1980s for the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported parallel computing but also shared file systems and peripheral devices. They were intended to give the advantages of parallel processing while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running on Alpha and Itanium systems. The history of cluster computing is intimately tied to the evolution of networking technology: as networking has become cheaper and faster, cluster computers have become significantly more attractive.


How to run applications faster?

There are three ways to improve performance:

✓  Work Harder

✓  Work Smarter

✓  Get Help
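"Get Help" is the cluster approach: divide the job among several machines instead of relying on a faster processor or a better algorithm. A minimal sketch of that decomposition, assuming a simple summation workload (the function names are illustrative, not from any particular library):

```python
def split_work(data, n_workers):
    """Divide a job into roughly equal chunks, one per worker (node)."""
    chunk = (len(data) + n_workers - 1) // n_workers
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]

def run_parallel(data, n_workers):
    # Each chunk could run on a separate node; here the partial sums are
    # computed in sequence purely to illustrate the decomposition.
    partials = [sum(chunk) for chunk in split_work(data, n_workers)]
    return sum(partials)
```

With ideal hardware, each of the chunks would execute concurrently, so the elapsed time would be roughly that of one chunk rather than the whole job.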

Era of Computing

✓  Rapid technological advances

✓  Recent advances in VLSI technology

✓  Software technology: grand challenge applications have become the main driving force

✓  Parallel computing


Cluster Components

1.  Multiple High Performance Computers

a. PCs     b. Workstations     c. SMPs (CLUMPs)

2.  State-of-the-art Operating Systems

a. Linux (Beowulf)     b. Microsoft NT (Illinois HPVM)     c. Sun Solaris (Berkeley NOW)

d. HP-UX (Illinois PANDA)     e. OS gluing layers (Berkeley GLUnix)

3.  High Performance Networks/Switches

a. Ethernet (10 Mbps)     b. Fast Ethernet (100 Mbps)     c. Gigabit Ethernet (1 Gbps)

d. Myrinet (1.2 Gbps)     e. Digital Memory Channel     f. FDDI

4.  Network Interface Cards

a. Myrinet has NIC     b. User-level access support

5.  Fast Communication Protocols and Services

a. Active Messages (Berkeley)     b. Fast Messages (Illinois)     c. U-net (Cornell)

d. XTP (Virginia)

6.  Cluster Middleware

a. Single System Image (SSI)     b. System Availability (SA) infrastructure

7.  Hardware

a. DEC Memory Channel, DSM (Alewife, DASH), SMP techniques

8.  Operating System Kernel/Gluing Layers

a. Solaris MC, UnixWare, GLUnix

9.  Applications and Subsystems

a. Applications (system management and electronic forms)     b. Runtime systems (software DSM, PFS, etc.)

c. Resource management and scheduling software (RMS)

10. Parallel Programming Environments and Tools

a. Threads (PCs, SMPs, NOW, etc.)     b. MPI     c. PVM

d. Software DSMs     e. Compilers     f. RAD (rapid application development) tools

g. Debuggers     h. Performance analysis tools

i. Visualization tools

11. Applications

a. Sequential     b. Parallel/distributed (cluster-aware applications)
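Programming environments such as MPI and PVM are built around message passing: work is scattered to compute processes and partial results are gathered back. The sketch below imitates that scatter/gather pattern with Python's standard `threading` and `queue` modules standing in for real network links; it illustrates the pattern only and is not an MPI implementation.

```python
import threading
import queue

def worker(task_q, result_q):
    """Each worker plays the role of a compute node: receive a chunk
    (a message), compute a partial result, and send it back."""
    while True:
        chunk = task_q.get()
        if chunk is None:          # sentinel: no more work for this worker
            break
        result_q.put(sum(chunk))

def scatter_gather(data, n_workers=2):
    task_q, result_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(task_q, result_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    step = max(1, len(data) // n_workers)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    for c in chunks:               # scatter the chunks to the workers
        task_q.put(c)
    for _ in workers:              # one sentinel per worker
        task_q.put(None)
    total = sum(result_q.get() for _ in chunks)   # gather partial sums
    for w in workers:
        w.join()
    return total
```

In a real cluster the queues would be replaced by network sends and receives, but the scatter, compute, gather structure is the same.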


Today, open standards-based HPC systems are being used to solve problems ranging from high-end, floating-point-intensive scientific and engineering problems to data-intensive tasks in industry. Some of the reasons why HPC clusters outperform RISC-based systems include:

1. Collaboration

Scientists can collaborate in real time across dispersed locations, bridging isolated islands of scientific research and discovery, when HPC clusters are based on open source and building-block technology.

 2. Scalability

                                      HPC clusters can grow in overall capacity because processors and nodes can be added as demand increases.

 3. Availability

Because single points of failure can be eliminated, if any one system component goes down, the system as a whole, or the solution built from multiple systems, stays highly available.

 4. Ease of technology refresh

Processor, memory, disk, and operating system (OS) technology can be easily updated. New processors and nodes can be added or upgraded as needed.

 5. Affordable service and support

Compared to proprietary systems, the total cost of ownership, including service, support, and training, can be much lower.

 6. Vendor lock-in

The age-old problem of proprietary versus open systems is eliminated by using industry-accepted standards.

 7. System manageability

The installation, configuration, and monitoring of key elements of proprietary systems is usually accomplished with proprietary technologies, complicating system management. The servers of an HPC cluster can be easily managed from a single point using readily available network infrastructure and enterprise management software.

8. Reusability of components

                                     Commercial components can be reused, preserving the investment. For example, older nodes can be deployed as file/print servers, web servers or other infrastructure servers.

9. Disaster recovery

                                     Large SMPs are monolithic entities located in one facility. HPC systems can be collocated or geographically dispersed to make them less susceptible to disaster.


Classification of Clusters

Clusters are classified into several categories based on factors such as:

♣  Application target.

♣  Node ownership.

♣  Node Hardware.

♣  Node operating System.

♣  Node configuration.

Clusters based on Application Target are again classified into two:

♣  High Performance (HP) Clusters

♣  High Availability (HA) Clusters

Clusters based on Node Ownership are again classified into two:

♣  Dedicated clusters

♣  Non-dedicated clusters

Clusters based on Node Hardware are again classified into three:

♣  Clusters of PCs (CoPs)

♣  Clusters of Workstations (COWs)

♣  Clusters of SMPs (CLUMPs)

 Clusters based on Node Operating System are again classified into:

♣  Linux Clusters (e.g., Beowulf)

♣  Solaris Clusters (e.g., Berkeley NOW)

♣  Digital VMS Clusters

♣  HP-UX clusters

♣  Microsoft Wolfpack clusters

 Clusters based on Node Configuration are again classified into:

♣  Homogeneous Clusters - all nodes have similar architectures and run the same OS

♣  Heterogeneous Clusters - nodes have different architectures and/or run different OSs


A cluster is a type of parallel or distributed processing system consisting of a collection of interconnected stand-alone computers that work cooperatively as a single, integrated computing resource. A node is a single- or multiprocessor system with its own memory, I/O facilities, and operating system. A cluster:

♠  Generally consists of two or more computers (nodes) connected together.

♠  May be housed in a single cabinet, or physically separated and connected via a LAN.

♠  Appears as a single system to users and applications.

♠  Provides a cost-effective way to gain features and benefits.


Three principal features usually provided by cluster computing are:

1. Availability    2. Scalability   3. Simplification.

                                      Availability is provided by the cluster of computers operating as a single system by continuing to provide services even when one of the individual computers is lost due to a hardware failure or other reason.

Scalability is provided by the inherent ability of the overall system to allow new components, such as computers, to be added as the overall system's load increases.

Simplification comes from the ability of the cluster to allow administrators to manage the entire group as a single system. This greatly simplifies the management of groups of systems and their applications. The goal of cluster computing is to facilitate sharing a computing load over several systems without either the users or the administrators of the system needing to know that more than one system is involved.
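Availability, the first of these features, can be sketched as simple failover: try each node in turn until one answers. The node stubs below are hypothetical stand-ins for real servers, invented purely for illustration.

```python
def call_with_failover(nodes, request):
    """The service stays available as long as at least one node
    in the cluster can answer the request."""
    for node in nodes:
        try:
            return node(request)
        except ConnectionError:
            continue               # this node is down; try the next one
    raise RuntimeError("all nodes failed")

# Hypothetical node stubs: the first is down, the second answers.
def dead_node(request):
    raise ConnectionError("node unreachable")

def live_node(request):
    return "handled " + request
```

From the user's point of view the request simply succeeds; the failure of an individual node is hidden behind the cluster's single-system facade.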




Grid Computing

As computer networks become cheaper and faster, a new computing paradigm, called the Grid, has evolved. The Grid is a large system of computing resources that performs tasks and provides users with a single point of access, commonly based on a World Wide Web interface, to these distributed resources. Users treat the Grid as a single computational resource. Resource management software, frequently referred to as middleware, accepts jobs submitted by users and schedules them for execution on appropriate systems in the Grid, based upon resource management policies.

Users can submit thousands of jobs at a time without being concerned about where they run. The Grid may scale from single systems to supercomputer-class compute farms that utilize thousands of processors. Depending on the type of application, the interconnection between the Grid parts can be performed using dedicated high-speed networks or the Internet. By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, the Grid promises to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible.
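A hedged sketch of the middleware scheduling just described: each submitted job is placed on the currently least-loaded resource. The site names and job costs below are invented for illustration; real Grid middleware applies far richer policies than this greedy rule.

```python
def schedule(jobs, resources):
    """Greedy least-loaded placement: jobs are (name, cost) pairs,
    and each one goes to whichever resource has the least work so far."""
    load = {r: 0 for r in resources}
    placement = {}
    for job, cost in jobs:
        target = min(load, key=load.get)   # least-loaded resource wins
        placement[job] = target
        load[target] += cost
    return placement, load
```

Users never see the placement decision; they only submit jobs, which is exactly the single-point-of-access property described above.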

Some examples of new applications that benefit from using Grid technology include the coupling of advanced scientific instrumentation or desktop computers with remote supercomputers; collaborative design of complex systems via high-bandwidth access to shared resources; ultra-large virtual supercomputers constructed to solve problems too large to fit on any single computer; and rapid, large-scale parametric studies. Grid technology is currently under intensive development. Major Grid projects include NASA's Information Power Grid, two NSF Grid projects (the NCSA Alliance's Virtual Machine Room and NPACI), the European DataGrid Project, and the ASCI Distributed Resource Management project. The first Grid tools are also already available for developers. The Globus Toolkit [20] represents one such example and includes a set of services and software libraries to support Grids and Grid applications.



Clusters of Queue Managers

A cluster can also be a network of queue managers that are logically associated in some way. The queue managers in a cluster might be physically remote; for example, they might represent the branches of an international chain store and be physically located in different countries. Each cluster within an enterprise must have a unique name.

Typically, a cluster contains queue managers that are logically related in some way and need to share some data or applications. For example, you might have one queue manager for each department in your company, managing data and applications specific to that department. You could group all these queue managers into a cluster so that they all feed into the payroll application. Or you might have one queue manager for each branch of your chain store, managing the stock levels and other information for that branch. If you group these queue managers into a cluster, they can all access the same set of sales and purchases applications. The sales and purchases applications might be held centrally on the head-office queue manager.
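A toy sketch of the payroll example, assuming nothing beyond the text: each departmental queue manager holds an "hours" queue that is forwarded to a central payroll queue at head office. The class, queue, and department names are invented for illustration and do not correspond to any real messaging product's API.

```python
from collections import defaultdict

class QueueManager:
    """Minimal stand-in for a queue manager: a name plus named
    queues of messages."""
    def __init__(self, name):
        self.name = name
        self.queues = defaultdict(list)

    def put(self, queue_name, message):
        self.queues[queue_name].append(message)

def feed_payroll(departments, head_office):
    # Each departmental queue manager forwards its hours records to the
    # central payroll queue held on the head-office queue manager.
    for qm in departments:
        for msg in qm.queues["hours"]:
            head_office.put("payroll", (qm.name, msg))
```

Grouping the queue managers this way means each department only writes to its local queue, while the payroll application reads a single consolidated queue.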



Beowulf Clusters

A Beowulf cluster uses a multicomputer architecture, as depicted in the figure. It features a parallel computing system that usually consists of one or more master nodes and one or more compute nodes, or cluster nodes, interconnected via widely available network interconnects. All of the nodes in a typical Beowulf cluster are commodity systems (PCs, workstations, or servers) running commodity software such as Linux.

The master node acts as a server for the Network File System (NFS) and as a gateway to the outside world. As an NFS server, the master node provides user file space and other common system software to the compute nodes via NFS. As a gateway, the master node allows users to gain access through it to the compute nodes. Usually, the master node is the only machine that is also connected to the outside world, using a second network interface card (NIC). The sole task of the compute nodes is to execute parallel jobs. In most cases, therefore, the compute nodes do not have keyboards, mice, video cards, or monitors.


 Cluster Networking

If you are mixing hardware with different networking technologies, there will be large differences in the speed with which data is accessed and in how individual nodes communicate. If your budget allows, make sure that all of the machines in your cluster have similar networking capabilities and, if at all possible, network adapters from the same manufacturer.

 Cluster Software

We have to build versions of the clustering software for each kind of system included in the cluster.


Our code will have to be written to support the lowest common denominator for data types supported by the least powerful node in our cluster. With mixed machines, the more powerful machines will have attributes that cannot be attained by the less powerful ones.


This is the most problematic aspect of heterogeneous clusters. Since these machines have different performance profiles, our code will execute at different rates on the different kinds of nodes. This can cause serious bottlenecks if a process on one node is waiting for the results of a calculation on a slower node. The second kind of heterogeneous cluster is made from different machines in the same architectural family: e.g., a collection of Intel boxes where the machines are different generations, or machines of the same generation from different manufacturers.
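The bottleneck can be made concrete with a little arithmetic: in a parallel step, the slowest node sets the finishing time, so an equal split of work over unequal nodes leaves the fast nodes idle. A speed-weighted split (the node speeds and work totals below are hypothetical) lets every node finish together.

```python
def makespan(shares, speeds):
    """Completion time of a parallel step: each node needs work/speed
    time units, and the slowest node sets the pace for the whole job."""
    return max(work / speed for work, speed in zip(shares, speeds))

def equal_split(total_work, n_nodes):
    return [total_work / n_nodes] * n_nodes

def speed_weighted_split(total_work, speeds):
    # Give faster nodes proportionally more work so all finish together.
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]

# Three nodes with relative speeds 1, 2, and 4 sharing 70 units of work:
# an equal split leaves the slowest node grinding long after the others,
# while the weighted split finishes in 10 time units on every node.
speeds = [1, 2, 4]
naive = makespan(equal_split(70, 3), speeds)
balanced = makespan(speed_weighted_split(70, speeds), speeds)
```

This is why the second kind of heterogeneous cluster, machines of mixed generations, needs either speed-aware scheduling or dynamic work stealing to avoid the wait-on-the-slowest problem.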

 Network Selection

There are a number of different kinds of network topologies, including buses, cubes of various degrees, and grids/meshes. These network topologies are implemented by one or more network interface cards, or NICs, installed in the head node and compute nodes of our cluster.

 Speed Selection

No matter what topology you choose for your cluster, you will want the fastest network that your budget allows. Fortunately, the availability of high-speed computers has also driven the development of high-speed networking systems. Examples are 10 Mbit Ethernet, 100 Mbit Ethernet, gigabit networking, channel bonding, etc.


Conclusion

Clusters are being used to solve many scientific, engineering, and commercial applications. We have discussed a sample of these application areas and how they benefit from the use of clusters. The applications studied include a Web server, an audio processing system (voice-based email), data mining, network simulations, and image processing. Many large international Web portals and e-commerce sites use clusters to process customer requests quickly and to maintain high availability 24 hours a day throughout the year. The capability of clusters to deliver high performance and availability on a single platform is empowering many existing and emerging applications and making clusters the platform of choice.