Any group of machines dedicated to a single purpose can be called a cluster. Beowulf is a multi-computer architecture which can be used for parallel computations, server consolidation or computer room management. A Beowulf cluster is a computer system conforming to the Beowulf architecture, which consists of one master node and multiple compute nodes, Originally developed by Thomas Sterling and Donald Becker at NASA. A Beowulf cluster is a group of usually identical servers running a FOSS (Free & Open Source Software) Unix-like operating system, such as Linux. They are networked into a small TCP/IP LAN, and have libraries and programs installed which allow processing to be shared among themBeowulf systems are now deployed worldwide, chiefly in support of scientific computing. They are high-performance parallel computing clusters of inexpensive personal computer hardware. The name comes from the main character in the Old English poem Beowulf.
A Beowulf cluster is a group of what are normally identical, commercially available computers, which are running a Free and Open Source Software (FOSS), Unix-like operating system, such as BSD, GNU/Linux, or Solaris. They are networked into a small TCP/IP LAN, and have libraries and programs installed which allow processing to be shared among them.
There is no particular piece of software that defines a cluster as a Beowulf. Commonly used parallel processing libraries include Message Passing Interface (MPI) and Parallel Virtual Machine (PVM). Both of these permit the programmer to divide a task among a group of networked computers, and collect the results of processing. Examples of MPI software include Open MPI or MPICH
A beowulf cluster is capable of many things that are applicable in areas ranging from data mining to research in physics and chemistry, all the way to the movie industry. Essentially anything that can perform several semi jobs concurrently can benefit from running on a Beowulf Cluster. There are two classes of these parallel programs.
Embarrassingly Parallel Computation
A Beowulf cluster is best suited for "embarrassingly parallel" tasks. In other words, those tasks that require very little communication between nodes to complete, so that adding more nodes results in a near-linear increase in computation with little extra synchonization work. Embarrassingly parallel programs may also be known as linearly scalable. Essentially, you get a linear performance increase for each additional machine with no sign of diminishing returns.
Genetic Algorithms are one example of a application of embarrassingly parallel computation, since each added node can simply act as another "island" on which to evaluate fitness of a population. Rendering an animation sequence is another, because each node can be given an individual image, or even sections of an image to render quite easily even using standard programs like POVRAY.
Explicitly Parallel Computation
Explicitly parallel programs, ie, programs that have explicitly coded parallel sections of their code that can run independently. These differ from fully parallelizable programs in that they contain one or more interdependant serial or atomic components that must be synchronized across all instances of the parallel application. This synchonization is usually done through libraries such as MPI, PVM, or even extended versions of POSIX threads and System V shared memory.
TYPES OF BEOWULF CLUSTERS
There are several ways to design a Beowulf. Based on
· Software configuration
- Net Type I - Single External Node
In this configuration, there is basically one single entry point to the cluster, i.e., one monitor and keyboard, and one external IP address. The rest of the cluster is usually then behind a normal IP masqed setup. Users are usually then encouraged to login only to the main node.
● Net Type II – Multinode / Non-dedicated
In this configuration all nodes are equivalent from a network standpoint. They all have external IP addresses, and usually all have keyboards and monitors. Usually this configuration is chosen so that nodes can also double as desktop workstations, and thus have external IP addresses in their own right.
- Arch Type I - Local Synchronized disk
In this configuration, all nodes have local disks. In addition, these disks are kept in sync nightly by an rsync job that updates pretty much everything. In this configuration, extra scratch space and /home can be optionally NFS mounted across all clusters.
- Arch Type II - Local Non-synchronized disk
In this configuration, all nodes have local disk, but they are not kept in sync. This is most useful for disk-independant embarrassingly parallel setups that merely do number crunching, and no disk-based synchronization.
- Arch Type III - NFS root
This configuration is most useful for those who wish to save money on disks for all nodes, and wish to avoid the headaches of having to keep a few dozen to a few hundred disks in sync. This option is actually quite a reasonable choice, especially for programs that need some disk synchronization, but aren't otherwise disk-bound.
- Soft Type I - Batch System
Basically a batch system is one where user can just send it a job, and it does it. Usually only one job is run at a time, and job scheduling is left to the programmer/job runner. Sometimes a queue or at least a launching script is provided through some simple bash scripts, otherwise remote processes are launched manually one at a time on each node. Needless to say, this is the easiest software type to set up, and also the least overhead.
- Soft Type II - Preemptive Scheduler/ Migrator
This class contains systems that automatically schedule and migrate processes based on cluster status. This kind of setup really is geared more towards those who just want to set up a cluster for a general mass-login system, rather than those who want to do distributed programming. The two software packages available that provide this ability are Condor and Mosix. Condor isn't officially Open Source yet, and Mosix seems far more full featured for this purpose. In fact, Mosix allows user to even build a NetType II/ArchType II style cluster and still use each node for cluster jobs. In
addition, Condor places a lot of limitations on the types of jobs that can be run across the entire cluster, where as Mosix is meant to be entirely transparent.
- Soft Type III - Fine Grained Control
In fine grained control , where individual programs themselves control the synchronization, load balancing, etc. Often times these jobs are launched through a SoftType I method, and then syncronized using one of the standardized source-level libraries already available. These include the industry standard MPI and PVM.
The advantages of Beowulf clusters are:
· Low initial costs
High performance of a multiprocessor computer at moderate cost.Much hardware devices are not required.
· Increased stability and flexibility
Nodes can be added or removed without major system reconfiguration.
· RAS Technologies – Reliable, Affordable & Scalable; Easy to deploy, use & maintain
· Distributing the work
Software running on a Beowulf cluster distributes work by dividing a problem into subproblems to be executed in parallel on different nodes of the cluster. However,the software must be explicitly designed to distribute work and collect the results.
The programmer who starts with a non distributed program faces a significant task to convert it into a distributed program to run on a Beowulf cluster, including designing a load balancing algorithm and modifying the source code to distribute the work.
The popular Beowulf clusters are only one implementation of compute clusters. Beowulf Clusters, have become the “fastest growing alternative choice” for building high-performance parallel computing systems using: COTS hardware components; Open & High Speed Interconnects like Gigabit / Infiniband; Open Source OS, Compilers & Libraries; Open Source parallel layers like PVM, MPI.Despite its popularity, it is a very limited solution that does not address all computing needs. Compute clusters built with Sun hardware and software pick up where the Beowulf solution falls short. They subsume the Beowulf capabilities and extend them to meet the demands of your environment.
The most important consideration, however, is for organizations to use all of the computing resources currently available before they invest in new equipment. This goal can be achieved quite easily with available tools at no cost, and can greatly increase the productivity of the engineering community within an organization.