I am preparing for a system design interview. My back-of-the-envelope calculation says Netflix servers respond with 250 GB of data per second. Assuming 128 MB/s of upload speed per machine, that means we need about 2,000 machines. But the communication happens through load balancers. How are the load balancers, which will be only a fraction of 2,000 (maybe 10), able to push out so many bytes per second? What am I missing?
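For reference, the arithmetic in the question can be checked with a small sketch. The function name `machines_needed` and the assumption of decimal units (1 GB = 10^9 bytes) are mine, not part of the question; the 10-LB figure is the question's own guess.

```python
import math

GB = 10**9  # decimal gigabyte (assumption)
MB = 10**6  # decimal megabyte (assumption)

def machines_needed(total_bytes_per_s: int, per_machine_bytes_per_s: int) -> int:
    """Smallest number of machines whose combined egress covers the total."""
    return math.ceil(total_bytes_per_s / per_machine_bytes_per_s)

total_egress = 250 * GB   # estimate from the question
per_machine = 128 * MB    # assumed upload speed per machine

servers = machines_needed(total_egress, per_machine)
print(servers)  # 1954, i.e. roughly 2,000

# If all responses really flowed back through 10 load balancers,
# each one would have to carry this many gigabits per second:
lb_count = 10
per_lb_gbit_per_s = total_egress / lb_count * 8 / 10**9
print(per_lb_gbit_per_s)  # 200.0
```

At 200 Gbit/s per box, it is easy to see why the replies focus on keeping response traffic off the load balancers.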
The LB just decides which server handles the request; the response data shouldn’t need to go through the LB.
Interesting. Can you please elaborate or share some links related to this? My understanding is the response follows the same path as the request hence it has to go through LB.
It depends. The LB may be able to redirect you to a backend server and get out of the way.
I think I get your point. You're saying that even though some machines send data at maximum capacity, others become the bottleneck, especially at the load balancer level. The answer to that is DNS: it switches the target LB based on load and capacity. The DNS server itself doesn't get much traffic, since it only tells clients which IP to connect to.
Look up how anycast/BGP-based load distribution works. There’s also anycast DNS. Basically, you decide at your edge node (CDN) how to load balance requests optimally.
Netflix puts the content inside ISPs. Read about Netflix Open Connect: https://openconnect.netflix.com/en/
👆
The CDN nodes are hosted on the ISP's premises or at an internet exchange point. The limitation is really just the hardware interconnect speed. And we can always add more CDN instances for big ISPs; it's static content after all, and content management can be done during off-peak hours.
Let me see if I understand this correctly. Since the CDN instances are hosted by the ISP, they don’t have egress limits, i.e., the network hardware is the bottleneck. If the instance has a 10 GB/s network port, theoretically it can serve 10 GB/s. Is this right?
Yes. Usually, though, the server is bound not by network speed but by storage I/O speed. That's why Netflix designs its own CDN hardware.
Read about Open Connect
Read this: https://netflixtechblog.com/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99
That’s gold right there. You better ace the interview now. How did you arrive at the 250GB/s number?
Yeah, great responses! Actually, by my calculation it comes to roughly 7 TB/sec. Earlier I miscalculated the last step by dividing by a day instead of an hour, which gave about 250 GB/sec.
Thought process:
- 100 M users total, DAU: 20 M
- 50% watch during peak hours (7pm - 10pm) for an avg of 1 hr each
- 10 M hours streamed in 3 hrs ==> ~3 M hours / hr
Streaming rate:
- 8 GB on avg per hr of video
- 8 GB * 3 M ==> 24,000 TB / hr ==> ~7 TB / sec (24,000 TB / 3,600 s)
I read here http://highscalability.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html that Netflix streams around 1B hrs per week, so ~7 TB/s is probably not too far off.
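The estimate above can be reproduced with a short sketch. All inputs are the assumptions stated in the post (20 M DAU, 50% watching in a 3-hour peak window, 1 hour each, 8 GB per streamed hour); the function name and decimal units are mine. Without rounding 3.33 M down to 3 M, the result lands a little over 7 TB/s, the same order of magnitude as the figure in the post.

```python
def peak_egress_tb_per_s(
    dau: float = 20_000_000,         # daily active users (post's assumption)
    peak_share: float = 0.5,         # fraction of DAU watching in the peak window
    hours_per_viewer: float = 1.0,   # average hours watched per peak viewer
    peak_window_hours: float = 3.0,  # 7pm - 10pm
    gb_per_stream_hour: float = 8.0  # average data per hour of video
) -> float:
    """Average egress during the peak window, in decimal TB per second."""
    viewer_hours = dau * peak_share * hours_per_viewer       # 10 M stream-hours
    stream_hours_per_hour = viewer_hours / peak_window_hours  # ~3.33 M
    gb_per_hour = stream_hours_per_hour * gb_per_stream_hour
    gb_per_second = gb_per_hour / 3600                        # hours -> seconds
    return gb_per_second / 1000                               # GB -> TB

estimate = peak_egress_tb_per_s()
print(round(estimate, 2))  # 7.41

# The earlier mistake: dividing the hourly volume by a day's worth of seconds.
wrong_gb_per_s = 24_000_000 / 86_400
print(round(wrong_gb_per_s))  # 278, close to the original ~250 GB/s figure
```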
Nobody here directly answered your question. The answer you are looking for is DSR (Direct Server Return): return traffic does not need to go through the LB. As a matter of fact, many L4 LBs at this scale are set up with DSR.
To add to it: yes, DNS is a load balancer, like others have said, but for large-scale applications DNS alone is generally not sufficient. DNS is typically used to load balance globally (e.g., directing user traffic to the closest geo), but most at-scale applications also use a local LB (either a software LB such as LVS, or a hardware LB) for various reasons: scaling seamlessly, public IP limitations, reaction times for failed nodes, security (e.g., SYN flood attacks), etc. So your question is still very valid, notwithstanding the global load balancing from DNS, and the answer lies in setting up DSR. There are multiple ways DSR can be set up (L2, L3, etc.). You can find a ton of relevant material if you search online for DSR.
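To make the DSR point concrete, here is a toy accounting model, not a real load balancer. It only counts how many bytes cross the LB when responses are proxied back through it versus when servers reply to clients directly. The `Flow` type, the function name, and the request/response sizes are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    request_bytes: int   # client -> service (always passes through the LB)
    response_bytes: int  # service -> client

def bytes_through_lb(flows: list[Flow], dsr: bool) -> int:
    """Total bytes the LB must forward for the given flows.

    With DSR the LB only sees inbound requests; the backend sends the
    response straight to the client, bypassing the LB.
    """
    total = 0
    for flow in flows:
        total += flow.request_bytes
        if not dsr:
            total += flow.response_bytes
    return total

# Hypothetical video-like workload: tiny requests, large responses.
flows = [Flow(request_bytes=1_000, response_bytes=4_000_000) for _ in range(100)]

proxied = bytes_through_lb(flows, dsr=False)
direct = bytes_through_lb(flows, dsr=True)
print(proxied)  # 400100000
print(direct)   # 100000
```

With request sizes this small relative to responses, the LB's share of traffic drops by orders of magnitude, which is why a handful of LBs can front a large server pool.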
That huge amount of bandwidth must be video streaming egress, which is mainly served from the CDN. Netflix probably has a pull CDN that’s geographically load balanced with DNS. So no, not all traffic has to pass through 10 LBs.
Good point. Then the problem shifts to CDN. How are CDNs able to push out 10s of GB per sec? They should also have a notion of LB and multiple machines behind it right?
Not an expert, but DNS is a load balancer! You can have geographically load-balanced DNS (AWS, Akamai, etc. support this), so that a DNS lookup from a given IP address is answered with the CDN host closest to it. You can have a big pool of CDN hosts to load balance at the edge. Each CDN host can itself be an LB with very high network bandwidth that defers the lower-bandwidth nonvolatile reads (disk, SSD) to another pool of downstream hosts.
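A minimal sketch of the "closest CDN host" idea, assuming the resolver already knows the client's approximate coordinates. Real geo DNS services typically work from the resolver's IP and latency measurements rather than raw distance; the host names, coordinates, and the `max_load` cutoff below are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CdnHost:
    name: str
    lat: float
    lon: float
    load: float  # current utilisation, 0.0 to 1.0

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def pick_host(client_lat: float, client_lon: float,
              hosts: list[CdnHost], max_load: float = 0.9) -> CdnHost:
    """Return the nearest host that is not overloaded."""
    candidates = [h for h in hosts if h.load < max_load]
    if not candidates:
        raise LookupError("no CDN host has spare capacity")
    return min(candidates,
               key=lambda h: haversine_km(client_lat, client_lon, h.lat, h.lon))

hosts = [
    CdnHost("edge-sjc", 37.36, -121.93, load=0.40),  # hypothetical San Jose node
    CdnHost("edge-fra", 50.03, 8.57, load=0.20),     # hypothetical Frankfurt node
]

# A client near San Francisco gets the San Jose node.
print(pick_host(37.77, -122.42, hosts).name)  # edge-sjc
```

If the nearest node is over the load cutoff, the same function falls back to the next closest one, which is roughly the behaviour described above for steering traffic between edge hosts.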