I was doing some light afternoon reading: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c

In the last section, "Solution", they generate a globally unique ID based on the DB's auto-increment feature + milliseconds since epoch + shard ID. Why do we need to add the shard ID to it?

Specifically, the example says: "Next, we take the shard ID for this particular piece of data we’re trying to insert. Let’s say we’re sharding by user ID, and there are 2000 logical shards; if our user ID is 31341, then the shard ID is 31341 % 2000 = 1341. We fill the next 13 bits with this value"

This doesn't make sense to me. If you are already modding the user ID by the number of shards (31341 % 2000), that means:
1) You already have the user ID, so what do you need to generate another ID for?
2) You already know which shard it belongs to from the mod function, so why embed it again?

What am I misunderstanding here?
Are you saying that they shard by user ID but store that user's pictures on the same shard as that ID? Wouldn't that make picture searching inefficient?
Sigh. At sufficient scale you’ll see collisions at the millisecond level, even at the nanosecond level. Look up how Twitter's Snowflake service does it.
What happens when user 1 uploads images A and B in the same millisecond, and user 2, whose user ID mods to the same shard, uploads images C and D in that same millisecond?
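To make that scenario concrete, here is a rough sketch of the bit layout. It assumes the layout described in the linked article (41 bits of milliseconds since a custom epoch, 13 bits of shard ID, 10 bits of a per-shard sequence taken mod 1024). The function name `make_id`, the second user ID 33341, and the sequence values are made up for illustration.

```python
SHARD_BITS = 13  # room for up to 8192 logical shards
SEQ_BITS = 10    # per-shard sequence, taken mod 1024


def make_id(ms_since_epoch: int, shard_id: int, seq: int) -> int:
    """Pack time, shard ID, and sequence into one 64-bit integer (hypothetical helper)."""
    return (
        (ms_since_epoch << (SHARD_BITS + SEQ_BITS))
        | (shard_id << SEQ_BITS)
        | (seq % (1 << SEQ_BITS))
    )


now_ms = 1387263000  # same millisecond for all four inserts

# Users 31341 and 33341 both land on logical shard 1341.
shard = 31341 % 2000
assert 33341 % 2000 == shard

# Images A, B (user 1) and C, D (user 2) each pull the next value
# from that shard's auto-increment sequence.
ids = [make_id(now_ms, shard, seq) for seq in (5001, 5002, 5003, 5004)]

# Same timestamp bits and same shard bits, but the sequence bits differ.
assert len(set(ids)) == 4
```

Under these assumptions the four IDs differ only in their low 10 bits. A collision would need the sequence to wrap around, i.e. more than 1024 inserts on one logical shard within the same millisecond.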
I guess it generates a unique incremental ID for the data related to the user, like posted images, etc.
Yep, it allows for identifying other data on that shard at the same timestamp.
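One way to see what embedding the shard ID buys you: given only an image ID (no user ID at hand), the shard and the creation time can be read straight out of the bits. This is a sketch under the same assumed layout from the article (41 bits time, 13 bits shard, 10 bits sequence); `shard_of` and `ms_of` are hypothetical helper names, and the sample values come from the article's worked example.

```python
SHARD_BITS = 13
SEQ_BITS = 10


def shard_of(item_id: int) -> int:
    """Extract the 13-bit logical shard ID from a packed ID (hypothetical helper)."""
    return (item_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)


def ms_of(item_id: int) -> int:
    """Extract milliseconds since the custom epoch from a packed ID (hypothetical helper)."""
    return item_id >> (SHARD_BITS + SEQ_BITS)


# Pack the article's example values: time 1387263000, shard 1341, sequence 5001 % 1024.
example_id = (1387263000 << 23) | (1341 << 10) | (5001 % 1024)

# With nothing but the ID, we can tell which shard to query and when it was created.
assert shard_of(example_id) == 1341
assert ms_of(example_id) == 1387263000
```

So the user ID picks the shard at insert time, but afterwards anything holding just the image ID can be routed to the right shard without a lookup.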