Tech Industry Jan 19, 2022
Cisco RfRR42

Best resources on coordination frameworks like ZooKeeper?

Hi all, I'm looking for the best online resources on how ZooKeeper handles failure detection in a cluster of nodes, leader election when the master goes down, heartbeating with all the nodes in a cluster, and service discovery. I recently had an interview where we discussed how nodes in a leaderless cluster could be continuously monitored so that when one of them goes down, its jobs are reassigned to other nodes in the cluster. I said we could use ZooKeeper to heartbeat with the nodes, but the interviewer wanted to know how that would be done and what the process looks like, and I had no answer. To save face if this comes up again, can someone please suggest good resources (with examples of failure detection) on such coordination systems? Thanks.

TC - 310 YOE - 5

#systemdesign #interview #google #facebook #amazon #microsoft #uber #oracle #netflix #square #roblox #robinhood #snap #pinterest

Alteryx PGY75 Jan 19, 2022

Did you read DDIA? The author talks about ZooKeeper specifically. It's not a deep dive into coordination, but it brings up some of the pitfalls (including why heartbeats can be troublesome), so it's a start, and there is probably a good reference included as well.

Cisco RfRR42 OP Jan 19, 2022

I read DDIA, but can you share which chapter specifically talks about failure detection using ZooKeeper? Thanks.

Alteryx PGY75 Jan 19, 2022

It's mentioned in chapter 6 and then discussed at length in chapter 9 (Consistency and Consensus). ZooKeeper comes up 35 times according to Chrome's search. He also mentions that ZooKeeper is based on Google's Chubby lock service (who named that?!) and gives a reference for it. I would definitely check those out.

Adobe irfar Jan 19, 2022

You can get started with the ZK documentation on ephemeral nodes.
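
A rough sketch of the idea, in case it helps: each worker holds a session that it keeps alive with heartbeats and registers an ephemeral node tied to that session. When the heartbeats stop and the session times out, the node disappears and anyone watching gets notified. The code below is a toy in-memory model in Python, not the ZooKeeper client API; `ToyCoordinator`, its methods, the paths, and the timeout value are all made up for illustration, and time is passed in explicitly so the example is deterministic.

```python
from typing import Callable, Dict, List, Tuple


class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self.last_heartbeat: Dict[str, float] = {}  # session id -> last heartbeat time
        self.ephemeral: Dict[str, str] = {}         # node path -> owning session id
        self.watchers: List[Callable[[str], None]] = []

    def register(self, session_id: str, path: str, now: float) -> None:
        """Open a session and create an ephemeral node owned by it."""
        self.last_heartbeat[session_id] = now
        self.ephemeral[path] = session_id

    def heartbeat(self, session_id: str, now: float) -> None:
        """Refresh a live session; an expired session cannot be revived."""
        if session_id in self.last_heartbeat:
            self.last_heartbeat[session_id] = now

    def watch(self, callback: Callable[[str], None]) -> None:
        """Ask to be called with the path of every ephemeral node that vanishes."""
        self.watchers.append(callback)

    def expire_sessions(self, now: float) -> None:
        """Drop sessions that have been silent too long, along with their nodes."""
        expired = [s for s, t in self.last_heartbeat.items()
                   if now - t > self.session_timeout]
        for session_id in expired:
            del self.last_heartbeat[session_id]
            owned = [p for p, owner in self.ephemeral.items() if owner == session_id]
            for path in owned:
                del self.ephemeral[path]
                for callback in self.watchers:
                    callback(path)

    def live_nodes(self) -> List[str]:
        return sorted(self.ephemeral)


def demo_failure_detection() -> Tuple[List[str], List[str]]:
    zk = ToyCoordinator(session_timeout=3.0)
    lost: List[str] = []
    zk.watch(lost.append)                     # a monitor that records failed workers
    zk.register("s1", "/workers/a", now=0.0)
    zk.register("s2", "/workers/b", now=0.0)
    zk.heartbeat("s1", now=2.0)               # only worker a keeps heartbeating
    zk.expire_sessions(now=4.0)               # b has been silent longer than the timeout
    return zk.live_nodes(), lost
```

In this model the monitor's callback is where you would reassign the dead worker's jobs. In the real system the server tracks sessions for you, so the client side mostly comes down to creating the ephemeral node and setting a watch.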

Cisco RfRR42 OP Jan 19, 2022

Thank you. Do you mean this one? https://zookeeper.apache.org/doc/r3.4.13/zookeeperOver.html

Ontario Teachers' Pension Plan Icycool82 Jan 19, 2022

Following

NVIDIA nvjnsn Jan 19, 2022

Curator: https://curator.apache.org/. It just makes life simpler and has a lot of recipes that abstract ZooKeeper coordination and other tasks. You can look at the leader election recipes and read more about their details and code.
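
As far as I understand the classic election recipe from the ZooKeeper docs, each candidate creates an ephemeral sequential node under an election path, the candidate with the lowest sequence number is the leader, and every other candidate watches only the node just ahead of it, so a failure wakes up one candidate instead of the whole cluster. Here is a toy Python model of that idea. It is not Curator's or ZooKeeper's API; `ToyElection`, its methods, and the node names are invented for illustration.

```python
from typing import Dict, Optional, Tuple


class ToyElection:
    """Toy model of election via sequential nodes (NOT the Curator API)."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.candidates: Dict[int, str] = {}  # sequence number -> candidate name

    def join(self, name: str) -> int:
        """Stand-in for creating an ephemeral sequential node; returns its number."""
        seq = self.next_seq
        self.next_seq += 1
        self.candidates[seq] = name
        return seq

    def leave(self, seq: int) -> None:
        """Stand-in for the node vanishing when its owner's session ends."""
        self.candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The candidate holding the lowest sequence number leads."""
        if not self.candidates:
            return None
        return self.candidates[min(self.candidates)]

    def watched_by(self, seq: int) -> Optional[int]:
        """Sequence number this candidate would watch: the largest one below its own."""
        lower = [s for s in self.candidates if s < seq]
        return max(lower) if lower else None


def demo_election() -> Tuple[Optional[str], Optional[int], Optional[str]]:
    election = ToyElection()
    a = election.join("node-a")
    b = election.join("node-b")
    c = election.join("node-c")
    first_leader = election.leader()        # lowest sequence number wins
    c_watches = election.watched_by(c)      # c watches b, not the leader
    election.leave(a)                       # the leader's session ends
    second_leader = election.leader()       # the next candidate in line takes over
    return first_leader, c_watches, second_leader
```

With Curator you would not write any of this yourself; the recipes wrap the node creation, watching, and reconnect handling behind a small API.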