Wednesday, January 12, 2011

How To Build A Heartbeat Cluster

Today we will install and configure a basic high-availability cluster working as a very simple web server. I am using Ubuntu Linux in a VMware environment for this how-to, just for the sake of simplicity. The goal is to give you a working HA cluster as a starting point for testing and further research. Please remember: what we install and configure here is not necessarily ready for production. I take some shortcuts one might not want to take in a production environment; this mainly applies to the mechanism for detecting a failed node.

Preparations
We need two identical machines whose only difference is their IP address. Then we also need a third IP address that is used for the highly available service. In our case the service will be a simple Apache web server, installed on both cluster nodes but started on only one of them at a time by the cluster.

We create two machines with Ubuntu 8.04 Server 64 bit and choose "openssh server" during the installation, nothing else. After installation perform the usual apt-get update and apt-get dist-upgrade. Take care that all usernames and passwords are the same on the two cluster nodes. Give the nodes static IP addresses: I gave node "hacluster1" the address 192.168.35.81 and node "hacluster2" the address 192.168.35.82. Of course you have to adapt the IP addresses to your infrastructure (a minimal interfaces example follows below). Now we install the heartbeat software:
apt-get install heartbeat-2 heartbeat-2-gui xauth
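
If the static addresses are not in place yet: on Ubuntu 8.04 they usually go into /etc/network/interfaces. A minimal sketch for hacluster1, assuming the interface is eth0 and a /24 network with a gateway at 192.168.35.1 (netmask and gateway are assumptions, adapt them to your network; use 192.168.35.82 on hacluster2 accordingly):

# /etc/network/interfaces on hacluster1 (netmask/gateway assumed, adjust)
auto eth0
iface eth0 inet static
    address 192.168.35.81
    netmask 255.255.255.0
    gateway 192.168.35.1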

For the floating IP address to work we need to append the following line to /etc/sysctl.conf:
net.ipv4.ip_nonlocal_bind = 1
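
The setting only becomes active at the next boot; to apply it right away you can reload the file with sysctl, or set the value directly for the running kernel:

sysctl -p
sysctl -w net.ipv4.ip_nonlocal_bind=1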

Now we're configuring the heartbeat cluster. Edit /etc/ha.d/authkeys (the file doesn't exist yet):
auth 3
3 md5 somerandomstring

After saving it, change the file's permissions: "chmod 600 /etc/ha.d/authkeys". The file defines how the communication between the cluster nodes is authenticated and must be identical on both nodes.
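
Any random string will do as the md5 key; if you want something reasonably random, one quick way to generate it is, for example:

dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum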

Next file to edit is "/etc/ha.d/ha.cf" (the file might not exist yet):
logfacility local0
node hacluster1 hacluster2
bcast eth0
crm on

The "node" line defines which machines are part of the cluster; "hacluster1" and "hacluster2" therefore have to be the hostnames of the two nodes. "bcast eth0" tells the heartbeat software to communicate with the other node via broadcast packets on eth0.
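
The node names have to match what each machine reports as its own hostname; you can verify that on each node with:

uname -n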

Last thing to do is to set a password for the user hacluster on both machines (the command follows below). Now we have a readily configured two-node cluster, and we should log into the VMware control center to make snapshots of each node. That way we can go back to a vanilla cluster whenever we want.
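
A quick sketch of these last steps, to be run on both nodes; heartbeat most likely needs a restart so it picks up the new ha.cf and authkeys:

passwd hacluster
/etc/init.d/heartbeat restart   # re-read ha.cf and authkeys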

Highly available web server
Now that the cluster is ready for work we also need a service to be managed by the cluster. For the sake of simplicity we will install an Apache web server on both nodes: after installing the server software with "apt-get install apache2", remove the symlinks for apache2 in /etc/rc2.d. On a normal server machine the web server is started automatically at system start-up via these symlinks, but on a cluster only the cluster software is responsible for starting and stopping the "clustered" services.
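
A sketch of these steps, to be run on both nodes; using update-rc.d is a bit cleaner than deleting the symlinks by hand, and we also stop the Apache instance that the package installation has just started:

apt-get install apache2
/etc/init.d/apache2 stop
update-rc.d -f apache2 remove   # remove the rc*.d start/stop symlinks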

Edit "/var/www/index.html" and replace "It works!" with "hacluster1" on the first node and "hacluster2" on the second. That way we can easily see in the browser which node the web pages are being served from.
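
For example, by simply overwriting the default page with a one-liner (one command per node; fine for our test):

echo "hacluster1" > /var/www/index.html   # on node hacluster1
echo "hacluster2" > /var/www/index.html   # on node hacluster2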

Make sure the password for user hacluster is set as described above, otherwise we cannot log into the GUI. Just choose any password you like.

Everything you did until now had to be done on each of the nodes. But now that the cluster is prepared, the remaining configuration is done only once and will be propagated among the nodes automatically. Log into one of the nodes from your local X11 xterm and start "hb_gui"; connect to 127.0.0.1 as user hacluster with the password you've chosen. Remember: if you want to use a remote X11 application, you have to log in from a local xterm with "ssh -XC <user>@<node>". On Mac OS you would need to install X11, the plain Terminal won't do. Under Windows you would need something like Cygwin; a mere PuTTY won't do either.
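
For example, from a local xterm on your workstation (the username is just a placeholder, use one of the accounts you created on the nodes):

ssh -XC youruser@192.168.35.81   # replace youruser with your own account
hb_gui &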

OK, we're logged into the cluster GUI. While the cluster is generally working, it doesn't do anything at the moment; nothing is configured yet. What we'll do is configure a highly available web server, where either node one or node two will serve static web pages.

First we need to define a shared IP address for the web server. So right-click on "Resources", choose "Add new item", leave the type as "native". Now scroll down in the list and choose "IPaddr" in the column "Name". In the "Parameters" field below, type the shared IP address into the "Value" field, hit RETURN and then click on "Add" (lower right):


The second resource is the Apache service itself: in the main GUI window right-click on "Resources" again, "Add new item", type "native". Choose "apache2" from the list; no additional parameters are needed. Click "Add" in the lower right.

We could now start the resources and the web service would already work, serving from either node1 or node2. But for a well configured cluster we need the Apache service to be "bound" to the IP address resource, so that Apache always runs on the node where its IP address is active. So we create a colocation: right-click on the "Colocations" entry in the "Constraints" list, choose "colocation", give it something meaningful as "ID". Choose your IP resource as "From" and the Apache resource as "To", leave the score at "INFINITY", click "OK".

We also have to make sure that Apache is started on a node only after its IP address has been activated there, otherwise the web server might not work. Thus we need an "Orders" rule: right-click on "Orders" in the constraints list, "Add new item", leave the type as "order". As "From" choose the IP address resource, leave "Type" as "before" and choose the Apache resource as "To". Click "OK".
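
If you are curious what the GUI actually writes into the cluster information base (the CIB), you can dump the constraints section from a shell on one of the nodes; the exact XML layout depends on the heartbeat version, so take this merely as a way to peek under the hood:

cibadmin -Q -o constraints   # query only the constraints section of the CIB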

Now everything is ready to be started: right-click on both resources and select "start".
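
You can also watch the cluster and resource status from a terminal on either node with crm_mon, which ships with heartbeat-2; "-1" prints the status once and exits:

crm_mon -1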


If you now open the web site in your browser (using the shared IP address), it should show either "hacluster1" or "hacluster2". Let's test the fail-over process: right-click on the node that currently runs the resources and choose "standby". You should see the two resources quickly moving to the other node. If you now reload the page in your browser, it should show the other node's name. Once you switch the standby node back to "active", the resources move back as well.

That's all for now. You have a working cluster as a starting point for further testing and research. The clustered Apache in this example would be useful in a production environment only if it serves completely static content that doesn't change frequently, and you have to make sure that both the Apache configuration and the web files are identical on both nodes.

But as almost all web servers these days use databases and their content is updated very often (like this blog, ahem), the clustered Apache as described here doesn't make much sense on its own. More on that in the next instalment.


http://blog.taggesell.de/index.php?/archives/83-How-To-Build-A-Heartbeat-Cluster.html
