ZooKeeper Leader-Election simplified
Background
Recently, in a project, we needed to determine a master node from a pool of nodes of the same type. If the master node fails, another node should take over the responsibility, so that the service remains available.
So the use case was: only a single node should act as the master node, and it coordinates with all the worker nodes to process the required tasks.
Clearly, this is a leader-election recipe. The Curator/ZooKeeper API supports it out of the box, but we found it a little complex, since a thread has to block to claim and hold the leadership for a long duration; otherwise it did not function well for us.
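For context, here is a minimal sketch of how the built-in recipe (Curator's LeaderSelector) is typically wired up; the path /my/project/leader and doMasterWork() are illustrative choices of ours. Note that takeLeadership() must block for as long as the node wants to stay leader, which is exactly the part we found awkward:
[code]
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;

// "client" is a started CuratorFramework instance (see getCuratorClient() below)
LeaderSelector selector = new LeaderSelector(client, "/my/project/leader",
        new LeaderSelectorListenerAdapter() {
            @Override
            public void takeLeadership(CuratorFramework client) throws Exception {
                // Leadership is held only while this method blocks;
                // returning from it relinquishes the leadership.
                doMasterWork(); // hypothetical long-running master loop
            }
        });
selector.autoRequeue(); // rejoin the election whenever leadership is lost
selector.start();
[/code]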
Solution
So we figured out a very simple way to determine the master node. Here are the steps:
- All the nodes register themselves under a specific ZK path. Let's say it's "/my/project/coordinators/${machine-ip-address}".
- Example: if Node1 (running on IP 127.0.0.1) joins the cluster, the ZK node path will appear as "/my/project/coordinators/127.0.0.1". We picked the IP address, but it's up to you; follow whatever naming convention you like. (A registration sketch follows after this list.)
- Write an algorithm that determines the leader/master node:
- Every coordinator node reads all the child node names (i.e. the list of IPs).
- Sort the node names and pick the very first one; declare that node the master.
- Below is the code snippet to determine whether a node is the master node or not:
[code]
import java.util.List;
import java.util.TreeSet;

import org.apache.curator.RetryPolicy;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CoordinatorUtils {

    private static CuratorFramework _client;

    public static boolean isMasterCoordinatorNode(String nodeId) throws Exception {
        // Note: ZooKeeper paths must be absolute, hence the leading "/"
        List<String> coordinatorNodes =
                getCuratorClient().getChildren().forPath("/my/project/coordinators");
        if (coordinatorNodes != null && !coordinatorNodes.isEmpty()) {
            // Sort the child names and treat the very first one as the master
            TreeSet<String> set = new TreeSet<String>(coordinatorNodes);
            String firstNodeId = set.first();
            if (firstNodeId.equals(nodeId)) {
                return true;
            }
        }
        return false;
    }

    public static synchronized CuratorFramework getCuratorClient() {
        if (_client == null) {
            String zookeeperStr = "127.0.0.1:2181"; // ZooKeeper address
            RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
            _client = CuratorFrameworkFactory.newClient(zookeeperStr, retryPolicy);
            _client.start();
        }
        return _client;
    }
}
[/code]
- The other coordinator nodes (the ones not picked as master) won't do anything.
- Ensure that the registered ZooKeeper nodes are all ephemeral, so that if the master node goes down its znode disappears, and the immediate next (available) node becomes the master.
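For reference, here's a minimal sketch of the registration step from the first bullet. The class name, the InetAddress-based IP lookup, and the parent-path bootstrap are our illustrative choices, not from the original project:
[code]
import java.net.InetAddress;

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;

public class CoordinatorRegistration {

    public static String registerCoordinatorNode() throws Exception {
        CuratorFramework client = CoordinatorUtils.getCuratorClient();
        String nodeId = InetAddress.getLocalHost().getHostAddress();

        // creatingParentsIfNeeded() bootstraps "/my/project/coordinators" on
        // first use; the child itself is EPHEMERAL, so ZooKeeper deletes it
        // automatically when this node's session expires.
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath("/my/project/coordinators/" + nodeId);
        return nodeId;
    }
}
[/code]
Each node can then pass the returned nodeId to isMasterCoordinatorNode() to decide whether it should act as the master.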
In our case this worked very well. 🙂
Here's my previous blog post as an introduction to the Curator framework:
Curator Framework for Apache ZooKeeper
Nice solution. What if the current master just loses ZK connectivity but is otherwise functional? Will this not lead to a split brain? The master would continue to think that it's still the master, while another node whose IP is next in the sorted list of IPs would become the new master, and we'd have two masters.
My concern is based on the assumption that the isMasterCoordinatorNode() is NOT called on every API request that should be handled by the master. If it’s indeed called on every request, which I find a bit heavyweight, we won’t have this issue.
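One possible mitigation for the concern above (a sketch of ours, not part of the original post): listen to Curator's connection state and have the node stop acting as master the moment its ZooKeeper connection is suspended or lost, rather than waiting for the next isMasterCoordinatorNode() poll. The isMaster flag below is an assumed piece of application state that master-only code paths would check:
[code]
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

public class MasterConnectionGuard {

    // Assumed application flag, consulted before doing any master-only work
    private static final AtomicBoolean isMaster = new AtomicBoolean(false);

    public static void installGuard(CuratorFramework client) {
        client.getConnectionStateListenable().addListener(new ConnectionStateListener() {
            @Override
            public void stateChanged(CuratorFramework client, ConnectionState newState) {
                if (newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST) {
                    // Connectivity to ZooKeeper is gone: our ephemeral znode may
                    // already be deleted and another node may have taken over,
                    // so stop acting as master right away.
                    isMaster.set(false);
                }
            }
        });
    }
}
[/code]
This narrows the split-brain window to the time between losing connectivity and the listener firing, instead of the interval until the next poll.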