Scaling a read-heavy system is straightforward: add more read replicas, write to the primary node, and read from the replicas. How do you scale a write-heavy system? Do you use some ingestion pipeline similar to ELK? I don’t see this explained in Grokking the System Design Interview, Designing Data-Intensive Applications, or Web Scalability for Startup Engineers. #systemdesign #google #amazon #netflix #facebook #uber TC: $405k (185/185/35) YOE: 6 Serious answers only please.
Depends on the details, but in general all write-heavy systems use some sort of queuing mechanism to avoid data loss and increase write reliability. Delayed commits are used everywhere, from the network edge down to the disk.
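A minimal sketch of that write-behind pattern in Python, assuming a hypothetical `db_write` stands in for whatever the real storage call is:

```python
import queue
import threading

# Hypothetical stand-in for the real storage call (an RDBMS, a KV store, etc.).
def db_write(batch):
    print(f"committing {len(batch)} writes")

write_queue = queue.Queue()

def accept_write(record):
    # Ack the caller immediately; durability is deferred to the flusher.
    write_queue.put(record)

def flusher(batch_size=100, flush_interval=0.5):
    # Drain the queue in batches so the storage layer sees fewer, larger commits.
    batch = []
    while True:
        try:
            batch.append(write_queue.get(timeout=flush_interval))
        except queue.Empty:
            pass
        if batch and (len(batch) >= batch_size or write_queue.empty()):
            db_write(batch)
            batch = []

threading.Thread(target=flusher, daemon=True).start()
```

The trade-off is exactly the delayed commit mentioned above: writers get fast acks, at the cost of a window where acknowledged writes live only in the queue.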
Is the queue in another database? Wouldn’t that be the same issue?
That's a good observation: the durability of queuing may be constrained by disk and network I/O, but queuing implementations are far more optimized for that particular use case than general storage solutions are. They have different access patterns. Then again, queuing also spans a wide spectrum, from brokered to fully distributed. The key difference is that the storage used for queuing is mainly local.
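To make the "local storage, different access pattern" point concrete, here's a toy append-only log segment in Python. Sequential appends plus group commit is the pattern log-based brokers optimize for; the filename and batch size are just illustrative:

```python
import os

class AppendLog:
    """Toy durable queue segment: sequential appends, periodic fsync."""

    def __init__(self, path="segment.log", sync_every=100):
        self.f = open(path, "ab")
        self.sync_every = sync_every
        self.pending = 0

    def append(self, record: bytes):
        # Length-prefixed sequential write: no seeks, no in-place updates.
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.pending += 1
        if self.pending >= self.sync_every:
            self.f.flush()
            os.fsync(self.f.fileno())  # group commit amortizes the sync cost
            self.pending = 0
```

Contrast that with a B-tree database, which has to do random I/O to keep pages sorted; that's why a queue can be durable and still much faster on the write path.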
So many knobs to tweak that it's almost impossible to have a standard answer. It's also very dependent on the underlying store. What works with Spanner won't work with other relational databases, for example.
What does "pre enrichment" mean?
Is this for real-life architecture or system design interviews? For interviews, my answer is to use a standard NoSQL DB like DynamoDB or Cassandra. As someone answered before, you scale by adding more shards, which is pretty painless with DynamoDB. Be prepared, though, in case the interviewer wants to do a deep dive. My way to explain that is consistent hashing along with some sort of cluster replicating writes to many replicas. For real life, DDB is still a great answer, but there will be several other considerations.
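For the deep dive, a bare-bones consistent hash ring looks something like this in Python (node names are made up, and real systems tune the virtual-node count):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets many virtual points on the ring so load
        # stays even when nodes join or leave.
        self.ring = sorted(
            (self._hash(f"{n}-{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # deterministic owner for this key
```

The selling point to explain: adding or removing a node only remaps the keys adjacent to its ring positions, instead of rehashing everything the way `hash(key) % N` would.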
Can you explain what "cluster replicating writes to many replicas" means? Are replicas the same as DynamoDB shards?
Well, that's kind of getting into the guts of DDB. I don't know for sure, but each DDB shard could be a cluster. A cluster consists of n nodes, and each node is a replica holding a copy of the data. One way to do this is single-master replication, i.e. one of the n hosts is elected leader. It takes all writes, then replicates them to the remaining nodes, or replicas.
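In toy form, that single-master flow looks roughly like this (the classes and quorum size are illustrative, not DDB's actual internals):

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True

class Leader:
    def __init__(self, followers, write_quorum=2):
        self.local = Replica("leader")
        self.followers = followers
        self.write_quorum = write_quorum

    def write(self, key, value):
        # The leader takes every write, then pushes it to the followers.
        acks = 1 if self.local.apply(key, value) else 0
        for f in self.followers:
            acks += f.apply(key, value)
        # Ack the client once enough replicas have accepted the write.
        return acks >= self.write_quorum

leader = Leader([Replica("f1"), Replica("f2")])
print(leader.write("user:42", {"name": "x"}))  # True once quorum is reached
```

Waiting on all followers gives stronger durability but slower writes; waiting on a smaller quorum is faster but can lose acknowledged writes if the wrong nodes die.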
Depends on how you are writing: real-time writes, transactional writes, or batch uploads. Real time would need Kafka topics, which come with their own challenges of partitioning etc. If it's a transactional system, then it depends on consistency requirements, ACID, and so on. If it's batch uploads at petabyte scale, then you need some NoSQL system with a good partitioning scheme, and you have to think about how to manage the load incrementally. This is all based on my experience.
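For the real-time case, the partitioning challenge mostly comes down to choosing a message key. A short sketch using the kafka-python client, assuming a broker on localhost and an illustrative topic name:

```python
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Keying by user ID means all writes for one user land on the same partition,
# so per-user ordering is preserved while partitions scale the write path.
producer.send(
    "user-events",                 # topic name is illustrative
    key=b"user-42",
    value=b'{"action": "click"}',
)
producer.flush()  # block until buffered sends are acknowledged
```

The challenge hinted at above: a bad key choice (one hot user, one hot tenant) concentrates the write load on a single partition, and no amount of extra brokers fixes that.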
Database sharding, partitioning and message queues
For strong ACID properties and a relational database, you can scale vertically by adding more processors, and you can add multithreading with an appropriate isolation level. For eventually consistent systems, just use a non-relational database with partitioning, disallow edits, and use appends at the application layer. Basically, writes are easier to manage than edits.
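One way to read "appends instead of edits": store every change as an immutable event and fold the events into a current value at read time. A sketch, not tied to any particular store:

```python
from collections import defaultdict

# Append-only event log per key; nothing is ever updated in place.
events = defaultdict(list)

def record_change(key, field, value):
    # Writes are pure appends, which partition and replicate trivially:
    # no locks, no read-modify-write cycle on the hot path.
    events[key].append((field, value))

def current_state(key):
    # "Edit" semantics are recovered at read time by replaying the log.
    state = {}
    for field, value in events[key]:
        state[field] = value
    return state

record_change("user:42", "email", "a@example.com")
record_change("user:42", "email", "b@example.com")
print(current_state("user:42"))  # {'email': 'b@example.com'}
```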
General question around eventually consistent systems: what tools/practices/designs can be used to sync up the separate partitions? If the system has to be eventually consistent, how do I make sure this consistency actually happens? My dumb solution would be to have an overnight job sync up the data, but this is clearly not a great solution in many cases.
If you want eventual consistency, then asynchronous replication on every write is the easiest way to go.
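Minimal sketch of that idea: acknowledge the write once the primary commits, and ship it to replicas from a background thread (all names here are made up):

```python
import queue
import threading

primary = {}
replicas = [{}, {}]
replication_log = queue.Queue()

def write(key, value):
    primary[key] = value           # commit locally first
    replication_log.put((key, value))
    return "ack"                   # client is acked before replicas catch up

def replicator():
    # Replicas converge as the log drains: eventual, not immediate, consistency.
    while True:
        key, value = replication_log.get()
        for r in replicas:
            r[key] = value

threading.Thread(target=replicator, daemon=True).start()
```

No overnight job needed: the sync happens continuously on every write, and the "eventual" window is just the replication lag. Real systems add anti-entropy repair on top for replicas that missed updates.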
I’m too dumb to add to this conversation, but I like these kinds of questions/answers. I learn a lot from them, so thank you!
We’re all at different stages of the learning process in this field. Glad this post is helping you. “Do not speak badly about yourself. For the warrior within hears your words and is lessened by them.”
Nice quote 😄
Very use-case dependent. Shard the database, or use distributed databases (DynamoDB, Bigtable, Cassandra). LSM-based databases are usually better suited for write-heavy applications. Each choice will have its cons, so choose based on the use case.
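The LSM point in miniature: writes go to an in-memory table and get flushed to sorted runs later, so the hot path never does random disk I/O. A toy sketch that skips the commit log, compaction, and bloom filters:

```python
class TinyLSM:
    def __init__(self, memtable_limit=1000):
        self.memtable = {}              # sorted on flush, not on write
        self.sstables = []              # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # O(1) in-memory write; this is why LSM engines absorb heavy write load.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted run; real engines compact these later.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # Check the memtable first, then sorted runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None
```

The con, as noted: reads may touch several runs, so LSM stores trade read amplification for cheap writes, which is exactly the right trade for a write-heavy system.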