Let's say we have a users table with a few hundred million users. How do you store this table across multiple servers? Do the DBs support it natively, or is it mostly application-side logic? I know it's easier in MongoDB, but just wondering how this is done in relational DBs.
If you vertically scale it, it might all work on one server. MySQL, I believe, has some auto-sharding options (e.g. MySQL NDB Cluster). For Postgres you'd probably need application-side sharding logic based on hashes or ranges.
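A minimal sketch of what that application-side hash sharding could look like. The connection strings and shard count are made up for illustration; the point is that the router is deterministic, so every part of the application agrees on which shard owns a given user.

```python
import hashlib

# Hypothetical shard connection strings; in practice these would come
# from configuration or a service-discovery layer.
SHARDS = [
    "postgres://db0.internal/users",
    "postgres://db1.internal/users",
    "postgres://db2.internal/users",
    "postgres://db3.internal/users",
]

def shard_for(user_id: int) -> str:
    # Hash the key first so sequential ids spread evenly across shards
    # instead of all recent signups landing on the last shard.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

The application (or a thin client library) calls `shard_for(user_id)` before every query to pick the connection. The big caveat with plain modulo hashing is resharding: changing `len(SHARDS)` remaps almost every key, which is why range sharding or consistent/rendezvous hashing gets used instead when shards are expected to be added over time.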
At some point vertical scaling hits its limit, right? I wonder what those limits are. Can you store billions of rows with terabytes or petabytes of data in one DB?
Right, but a hundred million users is probably OK. Otherwise you need to shard.
How does it work in mongo?
I'm not an expert, but it's a built-in configuration. Probably a hash ring sharded on the `_id` value.
Why do you want an RDBMS? They won't scale unless you go with some of the commercial newer entries. Use a data lake.
Check out storage-disaggregated DBs like Amazon Aurora if you want to stick with an RDBMS. They should perform well for your use case.
The first thing is to determine whether your queries are stable. If you're working in a very new domain, chances are they're not. If your queries are not stable, then you first shard the database and then shoot yourself.

The second step is to differentiate between queries that are transactions and queries that are analytics (aggregations over many rows and tables). You can forget all the hard work you've done for analytics at this step; you'll need to rebuild the whole system for analytics queries after sharding.

Now, for all the queries that serve online users, consider which rows/columns are never going to be part of the same transaction. If there are two cells in the same row that will never be part of the same transaction, split the table in two. If there are two rows that will never be part of the same transaction, define the shard key based on the row differentiator. You'll end up with islands of data that are strongly connected internally but not connected to other islands.

You define an assignment and routing policy for these islands and put them on different RDBMS instances. If there are some loose threads between two different clusters of data, you may need to take the eventual-consistency scissors to them to make them proper islands. Define a thin routing-aware client library so that the complexity of routing doesn't go beyond a few lines of code. Depending on the policy, you may need a routing service as well, although in many cases consistent/rendezvous hashing can serve as a static configuration.

Note, these steps are not free in the MongoDB world either.
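The "static configuration" routing mentioned above can be sketched with rendezvous (highest-random-weight) hashing. The node names here are hypothetical; the property worth noticing is that removing a node only remaps the keys that node owned, which plain modulo hashing does not give you.

```python
import hashlib

NODES = ["shard-a", "shard-b", "shard-c"]

def rendezvous_pick(key: str, nodes=NODES) -> str:
    # Score every node by hashing (key, node) together; the highest
    # score wins. Every client computes the same answer from the same
    # static node list, so no routing service is needed.
    def score(node: str) -> int:
        return int(hashlib.sha256(f"{key}:{node}".encode()).hexdigest(), 16)
    return max(nodes, key=score)
```

If `shard-b` is decommissioned, only keys that hashed to `shard-b` get reassigned; everything else keeps its old home, so the data migration is proportional to the removed node's share.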
thanks
TL;DR: don't shard until you absolutely have to. Yes, horizontal scaling has a higher ceiling than vertical scaling, but most transactional workloads aren't truly limitless. You're not going to go from 1M users to 1B users in a week.
I wonder how systems that require high consistency, such as those handling financial transactions, work. Most of them use relational DBs.
One of the worst lies that RDBMS books tell people is that the total amount of money across all accounts needs to be invariant. No, it doesn't. Bank transfers are inherently eventually consistent. Take any two bank accounts, send money from one to the other, and check for yourself. People misunderstand eventual consistency by thinking there are times when the system goes through incorrect states. This is wrong. If I send money from account A to B, it's eventually consistent in the following order:

1. Atomically, `A -= amount` and a `TransactionIntent(A, B, amount)` gets created.
2. The TransactionIntent eventually applies, and B gets credited with the amount.
3. The intent executor has replay protection, ensuring a TransactionIntent never gets applied twice.

Eventual consistency is meant for people. The system must always be strongly consistent, no matter what your technology of choice is. You can do this in MongoDB too.
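A minimal sketch of that intent pattern, using `sqlite3` as a stand-in for the source bank's database (in reality steps 1 and 2 would typically hit different systems; table and column names here are made up):

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE transfer_intents (
        intent_id TEXT PRIMARY KEY,
        src TEXT, dst TEXT, amount INTEGER,
        applied INTEGER DEFAULT 0  -- flipped exactly once: replay protection
    );
    INSERT INTO accounts VALUES ('A', 100), ('B', 0);
""")

def start_transfer(src: str, dst: str, amount: int) -> str:
    intent_id = str(uuid.uuid4())
    with db:  # step 1: debit the source AND record the intent, atomically
        cur = db.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?", (amount, src, amount))
        if cur.rowcount == 0:
            raise ValueError("insufficient funds")
        db.execute("INSERT INTO transfer_intents VALUES (?, ?, ?, ?, 0)",
                   (intent_id, src, dst, amount))
    return intent_id

def apply_intent(intent_id: str) -> None:
    with db:  # step 2: credit the destination, at most once
        cur = db.execute(
            "UPDATE transfer_intents SET applied = 1 "
            "WHERE intent_id = ? AND applied = 0", (intent_id,))
        if cur.rowcount == 0:
            return  # already applied; a replayed delivery is a no-op
        dst, amount = db.execute(
            "SELECT dst, amount FROM transfer_intents "
            "WHERE intent_id = ?", (intent_id,)).fetchone()
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                   (amount, dst))
```

Between `start_transfer` and `apply_intent`, the money is in neither balance, but the system is never in an *incorrect* state: the debited amount is accounted for by the pending intent, and applying the intent twice changes nothing.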
The important part here, if anyone missed it, is the A in ACID giving you both 1) deduct from the source account and 2) create the TransactionIntent, atomically. If either fails, the transaction fails entirely. Banks are distributed systems by nature: no matter what DB you choose, you won't atomically deduct from Chase and add those funds to Wells Fargo.
Shard it. It falls on the application side to decide which DB cluster to hit, though usually that's abstracted away from the application layer into its own service.
Thanks. So does that mean joins need to be handled on the application side as well, or do you need to shard all the relevant data on the same shard key?
Usually the same shard key across all the tables where that user id is stored. Sometimes people employ different shard keys for different tables for the same user (very inefficient).
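A small sketch of why the shared shard key keeps joins cheap. Table names and SQL are illustrative: because `users` and `orders` both shard on `user_id`, a user's row and all of their orders land on the same shard, so the join runs locally on one instance instead of being stitched together in the application.

```python
import hashlib

NUM_SHARDS = 4

def shard_of(user_id: int) -> int:
    # The single routing function used for EVERY table keyed by user_id;
    # that is what "same shard key across all the tables" buys you.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def user_with_orders_query(user_id: int):
    # Both sides of the join live on the same shard, so this is an
    # ordinary single-node JOIN, not a cross-shard operation.
    shard = shard_of(user_id)
    sql = ("SELECT u.*, o.* FROM users u "
           "JOIN orders o ON o.user_id = u.id WHERE u.id = ?")
    return shard, sql  # run sql against the connection for this shard
```

If `orders` were sharded on a different key (say, `order_id`), the same query would fan out to every shard and the join would move into the application, which is the inefficiency the comment above is pointing at.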