So, I'm trying to get *really* good at system design interviews, and I've recently run into an issue. There's plenty of content and whitepapers on how consensus algorithms (like Raft and Paxos) work, but not much on how to actually use them, particularly as a component within a system design interview.

- Grokking's Ticketmaster solution either settled for 2PC or kept the problem scale small enough to fit in a single unsharded PostgreSQL instance
- Alex Xu's 2nd book touches on Raft in one chapter in particular (Digital Wallet), but it's totally unclear what exact data is stored in the nodes or what's really going on inside them. There's no DB schema in there or anything like that, and it seems like the thing that's running Raft is being used as the data store itself (wtf)
- DDIA mentions that ZooKeeper can be used as an "outsourced" consensus solution (p. 375), but of course there are no diagrams of how that would look within a bigger, non-abstract system diagram

What I'm looking for:
- A diagram where ZooKeeper is used for "outsourced" consensus, similar to how it's described in DDIA
- What is the request and response pattern for this? What thing calls what other thing, in what order?
- How do I do host count estimates for ZooKeeper? (e.g. say the application service is handling 100,000 TPS)

TC: 370k
What sources are you using to study?
Currently:
- DDIA (90% done)
- Both of Alex Xu's books (50%+ skimmed to some extent)
- Donne Martin's system design primer (PDF printed at Kinkos)
- Both Grokking system design courses (pirated & printed at Kinkos)
- Database Internals by Alex Petrov (super deep, 10% covered; probably don't even need it for Google staff interviews)
- The Google SRE book and workbook

Next:
- Whitepapers
- InfoQ, @Scale, etc. on YouTube
- More engineering blogs (tried setting up an RSS reader on my phone, but my initial experience is that half the posts are irrelevant or just not interesting)

I do a weekly "stream"/"discussion" of a different system design problem every Saturday evening on my Discord channel, which I've been running for at least a few months: https://discord.gg/WM6BBX7kjx (I somewhat view this as free mock interviews in front of a full panel of engineers.)

(This particular Blind post was created as a follow-up from trying to force myself to finally pick up the Paxos/Raft stuff last night...)

All my resources above, and perhaps a bit more, are listed in the #read-me channel. We're probably not as great as Alex Xu or the DDIA guy, but it's a group that is eager to learn about system design, and you are more than welcome to join us :)
Hey @re:DESU any relevant chapters to focus on in the google sre books?
Study the code of any system that's dependent upon ZooKeeper/etcd; it's relatively straightforward:

1. HBase -> ZK
2. Kafka -> ZK
3. k8s -> etcd

That should serve you well enough. If you're interested in the internals, read the source for ZK or etcd. If you're interested in estimating the latency of a round of consensus, then find a sequence diagram of Paxos/Raft, note the number of RTs, and multiply by the estimated latency between your data centers.
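That RT-counting estimate is simple enough to sketch as back-of-envelope arithmetic. A minimal sketch in Python, where the RT counts and RTT figures are illustrative assumptions you'd replace with numbers read off an actual sequence diagram and measurements from your own network:

```python
# Back-of-envelope consensus latency estimate: count the network
# round trips (RTs) in the protocol's sequence diagram, then
# multiply by the round-trip time between the replicas.
# All numeric figures below are assumed, not measured.

def consensus_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Estimated wall-clock time for one round of consensus."""
    return round_trips * rtt_ms

inter_dc_rtt_ms = 30.0  # assumed cross-region RTT
intra_dc_rtt_ms = 0.5   # assumed same-datacenter RTT

# e.g. a classic Paxos round ~ 2 RTs (prepare + accept);
# Raft log replication with a stable leader ~ 1 RT.
print(consensus_latency_ms(2, inter_dc_rtt_ms))  # 60.0 ms cross-region
print(consensus_latency_ms(1, intra_dc_rtt_ms))  # 0.5 ms within a DC
```

This also makes it obvious why people co-locate quorums within a region when they can: the same protocol is ~100x slower once each RT crosses datacenters.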
Thanks, I'll try that out. :) I greatly appreciate your input. Quick question: what are RTs? Is that the number of replicas/nodes running ZooKeeper, or whatever the coordinator service is?
No problem. RTs = network round trips. It's the same concept that lets you show the latency difference between a connectionless UDP exchange and TCP's 3-way handshake.
So without knowing all this, it seems it's possible to make it to Apple?
If by "all these", you mean the consensus algorithms, then yeah. I actually briefly touched on 2PC in my interview and scored my current role. That's also "good enough" for Amazon Sr SDEs. Not sure about google L5 though.
Kafka uses ZooKeeper for consensus; a Solr cluster does too. Try following the tutorials, set one up locally on your laptop, perform a failover, and watch the master step down. etcd in Kubernetes also accomplishes this; look up Kubernetes leader election.
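If you want to see the "master step down" behavior before standing up a real cluster, here's a toy in-memory simulation of ZooKeeper's ephemeral-sequential leader election recipe (all names are invented for illustration; a real client would use a library like kazoo or Curator against an actual ZK ensemble):

```python
# Toy simulation of ZooKeeper-style leader election: each candidate
# creates an ephemeral sequential znode under an election path, and
# the client owning the LOWEST sequence number is the leader. When
# the leader's session dies, its ephemeral znode vanishes and the
# next-lowest automatically becomes the new leader.

class ToyElection:
    def __init__(self):
        self._seq = 0
        self._nodes = {}  # znode name -> owning client id

    def join(self, client_id):
        # Sequential: ZK appends a monotonically increasing counter.
        name = f"/election/n_{self._seq:010d}"
        self._seq += 1
        self._nodes[name] = client_id
        return name

    def leader(self):
        if not self._nodes:
            return None
        # Lowest sequence number (lexicographic min of padded names) wins.
        return self._nodes[min(self._nodes)]

    def session_expired(self, client_id):
        # Ephemeral: znodes are deleted when the owning session dies.
        self._nodes = {n: c for n, c in self._nodes.items() if c != client_id}

e = ToyElection()
for c in ("kafka-broker-1", "kafka-broker-2", "kafka-broker-3"):
    e.join(c)
print(e.leader())                    # kafka-broker-1
e.session_expired("kafka-broker-1")  # kill the current master
print(e.leader())                    # kafka-broker-2 steps up
```

The real recipe adds one refinement this toy skips: each client watches only the znode immediately before its own, so a leader failure wakes exactly one successor instead of stampeding the whole herd.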
Well, I'm really just interested in seeing how tf to use ZooKeeper for distributed transactions, so I think I might as well just set up a pair of toy PostgreSQL instances + ZooKeeper and try to implement what I have in the diagram in my OP at that point.
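Before wiring up real Postgres + ZooKeeper, it can help to see the coordination logic in isolation. Here's a toy in-memory two-phase commit coordinator (all names are invented; no real ZK or Postgres involved). In a real setup the participants would be the Postgres instances (via PREPARE TRANSACTION / COMMIT PREPARED), and the coordinator's commit-or-abort decision is the piece you'd record durably via ZooKeeper so a crashed coordinator can't leave participants stuck in doubt:

```python
# Toy two-phase commit: the coordinator asks every participant to
# prepare (phase 1, "vote"); only if ALL vote yes does it tell them
# to commit (phase 2), otherwise it aborts everywhere.

class Participant:
    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare  # simulate a prepare failure
        self.state = "idle"

    def prepare(self):
        # Phase 1: durably stage the transaction and vote yes/no.
        self.state = "prepared" if self.will_prepare else "aborted"
        return self.will_prepare

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

happy = [Participant("pg-1"), Participant("pg-2")]
print(two_phase_commit(happy))  # committed

sad = [Participant("pg-1"), Participant("pg-2", will_prepare=False)]
print(two_phase_commit(sad))    # aborted
```

The toy glosses over exactly the part consensus is for: if the coordinator crashes between phases, prepared participants block forever holding locks. Storing the decision in a ZK/etcd quorum (or replicating the coordinator itself) is what lets a recovering node learn the outcome.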
What’s your YoE?
While this is good to understand, and I have studied this level of detail for the Paxos/Raft algorithms in the past, know that it won't explicitly be asked in system design interviews at the senior/staff level, unless the team/role is specifically in that domain. Let me retrieve some of those links for you later.
I didn't ask this question to have it hand-waved away as "too deep"; links would be appreciated. I'm currently at the senior level with a goal of eventually hitting staff/principal. This dude detailed the Paxos workflow and then did latency estimates during an interview: https://us.teamblind.com/s/4yufM3RY I know he probably went a little above and beyond, but I kinda want to get there as well, so that's what this post is about.
There is a Google talk (can't find the link) on how to calculate Paxos latency, I believe. All he did was basically steal the idea from the talk and pretend he knew a ton about it. Basically, don't index too much on that... it pretty much came down to: Paxos needs 3 RT connections, so to calculate latency just do 3 × intra-DC network call...
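With illustrative numbers plugged in (the 3-RT count is the talk's rule of thumb as relayed above; the 0.5 ms intra-DC RTT is my own assumed figure, measure your own):

```python
# Rule-of-thumb Paxos latency: 3 round trips x intra-DC RTT.
# The 0.5 ms RTT is an assumed figure for same-DC hosts.
paxos_round_trips = 3
intra_dc_rtt_ms = 0.5
print(paxos_round_trips * intra_dc_rtt_ms)  # 1.5 (ms per consensus round)
```

Which is the kind of one-liner you can state out loud in an interview without pretending it's deeper than it is.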