For relational databases, is there one master node for all database writes, or one master node for each partition? Kafka seems to have one leader for each topic partition, but I couldn't find any resources on this for traditional relational databases… Database or system design experts, please help! I've already read DDIA and other system design prep materials and they don't mention this. Google is giving me vague answers.
In a traditional setup you only have one primary node for writes and multiple secondary replicas for reads. Once that write node starts becoming a bottleneck, people usually partition the data, and then you can have the same setup for each partition. E.g. Partition 1 handles PKs 1-100 and has 1 primary and 2 secondaries. Partition 2 handles PKs 101-200 and has 1 primary and 3 secondaries (it has more read load), and so on.
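To make the routing concrete, here is a rough sketch in Python of how an application or proxy might pick a node under that layout. The node names and the `Partition` class are made up for illustration; a real setup would get this topology from config or a coordination service, and this is not how any particular database does it internally.

```python
import bisect
import random
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One shard: a PK range served by one primary and some read replicas."""
    low: int    # inclusive lower bound of the primary-key range
    high: int   # inclusive upper bound of the primary-key range
    primary: str
    secondaries: list = field(default_factory=list)


# Hypothetical topology mirroring the example above (names are assumptions).
PARTITIONS = [
    Partition(1, 100, "p1-primary", ["p1-replica-a", "p1-replica-b"]),
    Partition(101, 200, "p2-primary",
              ["p2-replica-a", "p2-replica-b", "p2-replica-c"]),
]
# Sorted upper bounds let us binary-search for the owning partition.
_UPPER_BOUNDS = [p.high for p in PARTITIONS]


def partition_for(pk: int) -> Partition:
    """Return the partition whose key range contains pk."""
    idx = bisect.bisect_left(_UPPER_BOUNDS, pk)
    if idx == len(PARTITIONS) or pk < PARTITIONS[idx].low:
        raise KeyError(f"no partition owns primary key {pk}")
    return PARTITIONS[idx]


def node_for_write(pk: int) -> str:
    """Writes always go to the single primary of the owning partition."""
    return partition_for(pk).primary


def node_for_read(pk: int) -> str:
    """Reads can go to any secondary (fall back to the primary if none)."""
    part = partition_for(pk)
    return random.choice(part.secondaries or [part.primary])
```

So `node_for_write(42)` lands on partition 1's primary, while `node_for_read(150)` picks one of partition 2's three replicas. Note that reads from secondaries may be slightly stale if replication is asynchronous.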
Thanks for the reply! So I guess it's configurable. Is this difficult to set up and manage, or is it pretty easy on a relational DB?
Once you have a system that's being used at scale, it's always harder to change. If you have a team that manages the DB then it's a bit easier, but still a lot of effort. NoSQL databases are usually designed with scale problems in mind, so you typically won't have to do this yourself there. There are pros and cons there too.
You read DDIA but you're still asking basic questions about sharding and replication 🤔
I'm asking specifically about relational databases. Chapters 5 and 6 don't fully explain how partitioning works in conjunction with a master-slave architecture for relational DBs…
The generic answer for "relational databases" as a whole is... it depends. MSSQL AlwaysOn/HADR (Azure SQL is the same) has a single primary node handling all writes. If you have partitioned data, it's possible to have a separate master for each partition; whether or not that is the case varies by implementation.
Depends on the system. Spanner, iirc, is a Paxos group per partition with 2PC coordinating the partitions. MemSQL uses master-slave replication per partition with 2PC coordinating the partitions. Aurora has a single master node whose log entries are replayed to read replicas (they actually have multi-master too, but I'm not sure how it works).
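For anyone unfamiliar with the "2PC coordinating the partitions" part, here is a toy in-memory sketch in Python. `PartitionPrimary` and `two_phase_commit` are hypothetical names for illustration only. It leaves out everything that makes real 2PC hard (durable logging, timeouts, coordinator recovery) and is not how Spanner or MemSQL actually implement it.

```python
class PartitionPrimary:
    """Hypothetical in-memory stand-in for one partition's write node."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a participant voting "no"
        self.data = {}
        self._staged = None

    def prepare(self, writes):
        # Phase 1: stage the writes and vote yes/no.
        if not self.can_commit:
            return False
        self._staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (all voted yes): make the staged writes visible.
        self.data.update(self._staged or {})
        self._staged = None

    def abort(self):
        # Phase 2 (someone voted no): discard the staged writes.
        self._staged = None


def two_phase_commit(writes):
    """writes: list of (PartitionPrimary, dict_of_writes) pairs.

    Returns True if every partition committed, False if all were aborted.
    """
    prepared = []
    for participant, partition_writes in writes:
        if participant.prepare(partition_writes):
            prepared.append(participant)
        else:
            # Any "no" vote aborts the whole transaction.
            for p in prepared:
                p.abort()
            return False
    for participant, _ in writes:
        participant.commit()
    return True
```

The point is that a transaction touching two partitions either becomes visible on both primaries or on neither, which is what lets a per-partition-primary design still offer cross-partition atomicity.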
Thanks for the help! 😊 I guess it all depends on the specific relational DB. I rarely work with relational DBs and only have book knowledge of the topic. This helped!
Please don't use the terms master/slave. You can call them primary and secondary nodes.
By relational DB, I guess you are referring to traditional SQL databases like Oracle, SQL Server, etc. They only run on one server (one node) by default, so all reads and writes go through the same instance. Then there are read/write locks at the table, row, and block(?) level. They scale vertically. Even when they scale horizontally, like 2 Oracle instances in one cluster running on 2 different servers, they read/write to the same storage. Every database also has a way to replicate storage, like Oracle GoldenGate, but this is more for DR than for active querying. DB2 had an active-active mode, which was a better version of the above. TL;DR: there is only one node that handles the entire data set.
Yes, I was referring to traditional SQL DBs. I was looking for an atomic database solution that can handle a large number of concurrent writes with big data storage. Do you have any recommendations?
It depends on how much data and how many transactions. I think any DB can handle a few thousand TPS and a PB of data. If it is more, you can partition at the DB level, or have multiple DBs and partition at the application layer.
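Partitioning at the application layer could look roughly like this Python sketch: hash the key and use it to pick one of several independent databases. The connection strings and function names are made up, and plain modulo hashing is just the simplest option, not a recommendation.

```python
import hashlib

# Hypothetical connection strings, one per independent database.
SHARD_DSNS = [
    "postgresql://db0.example.internal/app",
    "postgresql://db1.example.internal/app",
    "postgresql://db2.example.internal/app",
    "postgresql://db3.example.internal/app",
]


def shard_index(key: str, num_shards: int) -> int:
    """Map a key to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def dsn_for(key: str) -> str:
    """Return the database that owns this key; all of its reads and writes go there."""
    return SHARD_DSNS[shard_index(key, len(SHARD_DSNS))]
```

The catch with modulo hashing is that changing the number of shards remaps most keys, so resharding means moving a lot of data. Schemes like consistent hashing or a lookup table reduce that, at the cost of more moving parts.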
Check out the Zalando Postgres operator, the Vitess MySQL operator, and AWS RDS; you will get to see different flavors of how HA is accomplished in relational databases.
Read DDIA
I did… it doesn't say. What are your thoughts?
ServiceNow doesn't know. They just wanted to post a snarky comment and gain useless Blind points.