Simple node failover using Heartbeat

The first thing you need to do is install Heartbeat on your system. Its configuration directory is /etc/heartbeat, and this simple setup involves three files. They usually aren't created by default, so you have to create them yourself (note that the first two should be identical on both nodes).

  • authkeys
  • haresources
  • ha.cf

The authkeys file

This file contains the pre-shared keys used for mutual authentication between the nodes. It should be readable only by the root user; chmod and chown are your friends here.

auth <index>
<index> <algorithm> <key>

<index> is just a simple index, so you can use 1 or whatever number you want, as long as the same number appears on both lines. <algorithm> should be either md5 or sha1. There is a third algorithm called crc, but its use is not recommended. <key> is your desired password; Heartbeat applies the selected algorithm to it and uses the result to authenticate cluster members.
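
As a rough illustration, the file described above could be generated along these lines. This is only a sketch: the scratch directory heartbeat-example, the HA_DIR variable and the index 1 are assumptions made for the example, and the key is simply a SHA-1 digest of some random bytes.

```shell
#!/bin/sh
# Sketch: build an authkeys file in a scratch directory (HA_DIR is a
# hypothetical variable for this example). The result would then be copied
# to /etc/heartbeat on both nodes as root.
set -eu

HA_DIR="${HA_DIR:-./heartbeat-example}"
mkdir -p "$HA_DIR"

# Derive a random key: hash 512 random bytes and keep only the hex digest.
KEY="$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)"

# Create the file with restrictive permissions from the start.
umask 077
cat > "$HA_DIR/authkeys" <<EOF
auth 1
1 sha1 $KEY
EOF
chmod 600 "$HA_DIR/authkeys"

# On the real nodes (as root), ownership would also be set:
#   chown root:root /etc/heartbeat/authkeys
```

The same generated file has to be used on both nodes; generating a separate key on each node would make authentication fail.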

The ha.cf file

This is the main configuration file for each node, and the only file that (in this setup) differs between them.

ucast <dev> <peer>
node <node1_hostname>
node <node2_hostname>
auto_failback <on|off>

With ucast, we define unicast communication between the nodes: <dev> is the interface used to reach the IP address given in <peer>, which is the address of the other node.

Both nodes should have different host names, and the specified host name should equal the hostname or the uname -n output in each node. Here, the /etc/hostname file and the hostname command are your friends. After you make sure of this, configure <node1_hostname> and <node2_hostname> as appropriate.
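
The host name check can be automated with a small helper. The following is a minimal sketch, assuming the ha.cf layout shown above; check_node_name is a hypothetical function written for this example, not something shipped with Heartbeat.

```shell
#!/bin/sh
# Sketch: verify that this machine's node name (uname -n) appears in a
# "node" line of a given ha.cf file.
check_node_name() {
    # $1 = path to ha.cf; succeeds only if some "node" line lists uname -n
    me="$(uname -n)"
    awk -v me="$me" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == me) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example call on a configured node:
#   check_node_name /etc/heartbeat/ha.cf && echo "node name matches"
```

Running this on each node before starting Heartbeat catches the common case where ha.cf uses a short name while uname -n reports a fully qualified one, or the other way around.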

The last directive, auto_failback, specifies whether a resource fails back to its original owner once that node is available again. That is, when the node 1 box comes back online, the floating IP is reassigned to it. You control this behavior with the <on|off> parameter.

Configure this file on both nodes, and remember to change the <dev> and <peer> parameters accordingly, since they differ on each node.
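
To make the per-node difference concrete, here is a sketch that writes one ha.cf per node into a scratch directory. The host names, the eth0 interface and the 192.0.2.x addresses are placeholders invented for the example, as are the HA_DIR variable and the write_ha_cf helper.

```shell
#!/bin/sh
# Sketch: write one ha.cf per node. All names and addresses are placeholders.
set -eu

HA_DIR="${HA_DIR:-./heartbeat-example}"
NODE1="node1.example.local"; NODE1_IP="192.0.2.11"
NODE2="node2.example.local"; NODE2_IP="192.0.2.12"
DEV="eth0"

write_ha_cf() {
    # $1 = output file, $2 = interface, $3 = IP address of the *other* node
    cat > "$1" <<EOF
ucast $2 $3
node $NODE1
node $NODE2
auto_failback on
EOF
}

mkdir -p "$HA_DIR"
write_ha_cf "$HA_DIR/ha.cf.node1" "$DEV" "$NODE2_IP"   # node 1 talks to node 2
write_ha_cf "$HA_DIR/ha.cf.node2" "$DEV" "$NODE1_IP"   # node 2 talks to node 1
```

Each generated file would then be installed as /etc/heartbeat/ha.cf on its own node. Only the ucast line differs between the two.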

The haresources file

This is the most important file in the configuration. This file specifies cluster services and owners.

<node> <cluster_ip> [<services>]

The previous line describes an entry in that file. <node> is the host name of the resource owner; use the same host name here that you used in the ha.cf file. <cluster_ip> defines the floating IP address that will be configured for the cluster. Don't configure this IP address directly on any of the nodes: Heartbeat assigns it automatically on failover.

If moving the IP address is the only thing you need on failover, nothing else is required. If one or more services should move along with it, list them in the <services> parameter. For example:

mongodb.service.local 172.16.69.42 nginx varnish

This makes mongodb.service.local (the owner of the services) get the floating IP assigned, and has the nginx and varnish services started on whichever node currently holds it. Services are started left-to-right and stopped right-to-left. Note that this file should be identical on both nodes.

After configuring these three files on both nodes, Heartbeat is ready to be started. If the IP resource is the only one you configured, halting the primary server will make Heartbeat assign the floating IP address to the other node. If you listed more services, they will be started on that node as well.
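
One way to watch a failover happen is to check on each node whether the floating IP is currently configured. The sketch below assumes the iproute2 ip command is available and uses the address from the haresources example; addr_list_has_ip is a hypothetical helper. How Heartbeat itself is started depends on the distribution; with the traditional init script it would be something like /etc/init.d/heartbeat start.

```shell
#!/bin/sh
# Sketch: report whether a floating IP is currently configured on this node.
# addr_list_has_ip reads `ip -o -4 addr show` style output on stdin, so the
# matching logic can be tried without a running cluster.
addr_list_has_ip() {
    # $1 = IPv4 address; matches the literal text "inet <address>/"
    grep -qF "inet $1/"
}

FLOATING_IP="${FLOATING_IP:-172.16.69.42}"   # address from the haresources example

if command -v ip >/dev/null 2>&1; then
    if ip -o -4 addr show | addr_list_has_ip "$FLOATING_IP"; then
        echo "$FLOATING_IP is active on $(uname -n)"
    else
        echo "$FLOATING_IP is not active on $(uname -n)"
    fi
fi
```

Running it on both nodes before and after halting the primary should show the address moving from one node to the other.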

Considerations

This setup is aimed at two-machine clusters only. If you need to add more machines, you'll need to use a cluster manager tool such as Pacemaker.

Depending on the services you want to manage with the cluster, you could also use a tool like Keepalived, but keep in mind that it's a different tool from Heartbeat and serves slightly different purposes. In summary, Heartbeat is a tool that manages services and makes sure that they're running in at most one place. Keepalived is a tool for keeping a shared/floating IP present in at least one place. For a better understanding of the comparison, read this explanation, a message from the original author of HAProxy, who is also a core Linux kernel contributor.