What’s the expectation for this question? It seems to be a common one at Meta: https://leetcode.com/discuss/interview-question/124657/Facebook-or-System-Design-or-A-web-crawler-that-will-crawl-Wikipedia Do we need to discuss a distributed peer-to-peer crawl that uses the Chord algorithm to minimize communication? That seems a bit advanced compared to general system design questions. Any notes or pointers on this problem would help. @Meta
It depends on the interviewer. Some just did a distributed crawler and got an offer, some just drew microservice blocks and got an offer, and some did peer-to-peer like BitTorrent and did not get an offer. It's luck now.
E4: the web crawler chapter from Alex Xu. E5: the web crawler video from System Design Fight Club.
Recently I saw a reject for E4 where the expectation was peer-to-peer. Read the question before you answer: https://jenkov.com/tutorials/p2p/chord.html https://www.malwaretech.com/2013/12/peer-to-peer-botnets-for-beginners.html But yes, it's all luck. Most might pass, so it's better to ask what they want.
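For anyone who hasn't read the Chord link: the core idea is that peers and keys are hashed onto the same ring, and each key belongs to the first peer clockwise from it (its successor). Here is a rough sketch of just that ownership rule, applied to crawl hosts. It is not real Chord: there are no finger tables or join/leave protocol, every peer is assumed to know the full peer list, and the names `Ring`, `ring_id`, `RING_BITS`, and the peer labels are made up for illustration.

```python
import bisect
import hashlib

# Assumption: a 32-bit ring keeps the numbers readable; Chord itself uses
# the full 160-bit SHA-1 identifier space.
RING_BITS = 32


def ring_id(key: str) -> int:
    """Hash a peer name or a host name onto the ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    """Successor-based ownership with a global view of the peers (hypothetical helper)."""

    def __init__(self, peers):
        # Sort peers by their position on the ring.
        self._nodes = sorted((ring_id(p), p) for p in peers)
        self._ids = [node_id for node_id, _ in self._nodes]

    def owner(self, host: str) -> str:
        """Return the peer responsible for crawling `host`."""
        # First peer whose id is >= the host's id; wrap to the start of the ring.
        idx = bisect.bisect_left(self._ids, ring_id(host))
        return self._nodes[idx % len(self._nodes)][1]


ring = Ring(["peer-a", "peer-b", "peer-c"])
print(ring.owner("en.wikipedia.org"))
```

The point for the "minimizing communication" part of the question: any peer can compute who owns a host locally, and when a peer leaves, only the hosts it owned move to a new owner.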
Why would the expectation be peer-to-peer, and what makes you think that’s what was wrong with your answer? Meta particularly stays pretty hands-off on requirements and prefers candidates to “lead” by “proposing requirements”. They might do some course correction on your design by saying things like “what happens if this node experiences an outage?” or “how is that DB/service going to handle that much traffic?” The interviewer shouldn’t have one specific solution in mind. It’s common for candidates to mistake pushback on their mistakes for an interviewer that has a specific solution in mind. Peer-to-peer gossiping is just a replication strategy, but single-leader can scale arbitrarily high through sharding. Which of the data stores or components would’ve even been using peer-to-peer replication?
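To make the sharding point concrete, here is a minimal in-memory sketch of a URL frontier partitioned by host, where each shard has exactly one owner for its queue and its seen-set. This is only an illustration under my own assumptions: `ShardedFrontier` and its methods are hypothetical names, and a real design would back each shard with a queue or database partition rather than Python containers.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


class ShardedFrontier:
    """URL frontier split into shards by host (hypothetical, in-memory only)."""

    def __init__(self, num_shards: int):
        self.queues = [deque() for _ in range(num_shards)]
        self.seen = [set() for _ in range(num_shards)]

    def shard_for(self, url: str) -> int:
        """All URLs of one host map to the same shard."""
        host = urlsplit(url).hostname or ""
        digest = hashlib.sha256(host.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.queues)

    def add(self, url: str) -> bool:
        """Enqueue a URL unless its shard has already seen it."""
        shard = self.shard_for(url)
        if url in self.seen[shard]:
            return False
        self.seen[shard].add(url)
        self.queues[shard].append(url)
        return True

    def next_url(self, shard: int):
        """Pop the next URL for the worker that owns `shard`, or None if empty."""
        queue = self.queues[shard]
        return queue.popleft() if queue else None


frontier = ShardedFrontier(num_shards=4)
frontier.add("https://en.wikipedia.org/wiki/Web_crawler")
frontier.add("https://en.wikipedia.org/wiki/Web_crawler")  # duplicate, ignored
```

Because deduplication and per-host politeness both live inside one shard, no cross-node coordination is needed on the hot path; adding capacity means adding shards.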
“Do we need to discuss a distributed peer to peer crawl with chord algorithm for minimizing communication?” Don’t do that. Design it just as you would for a project at your current job, if they decided to randomly give you a massive project and total freedom for an entire year. Understanding the rationale for message brokers isn’t easy if you’ve never recognized the need for one before, FYI
Seen this problem in an interview last week. Peer-to-peer was explicitly required so standard crawler design approaches did not apply.
Depends on how comprehensively you can address all the points in the given time.