I was doing some light afternoon reading: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c

In the last section, "Solution", they generate a globally unique ID from the DB's auto-increment feature + milliseconds since epoch + shard ID. Why do we need to include the shard ID in it? Specifically, the example says:

"Next, we take the shard ID for this particular piece of data we’re trying to insert. Let’s say we’re sharding by user ID, and there are 2000 logical shards; if our user ID is 31341, then the shard ID is 31341 % 2000 = 1341. We fill the next 13 bits with this value"

This doesn't make sense to me. If you are already modding the user ID by the number of shards (31341 % 2000), that means:

1) You already have the user ID, so why do you need to generate another ID?
2) You already know which shard it belongs to from the mod function, so why append the shard ID again?

What am I misunderstanding here?
Are you saying that they shard by user ID and store that user's pictures on the same shard as the user? Wouldn't that make searching for pictures inefficient?
Sigh. At sufficient scale you'll see collisions at the millisecond level, even at the nanosecond level. Look up how Twitter's Snowflake service does it.
What happens when user 1 uploads images A and B in the same millisecond, and user 2, whose user ID mods to the same shard, uploads images C and D in that same millisecond?
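Here's a rough Python sketch of that scenario. It assumes the layout described in the linked article (41 bits of milliseconds since a custom epoch, 13 bits of shard ID, 10 bits of a per-shard auto-increment sequence taken mod 1024). `make_id`, the second user ID 33341, and the third user ID 42 are made up for illustration; this is not Instagram's actual code, which the article says runs inside the database.

```python
# Bit widths as described in the article: time | shard | sequence.
TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10
NUM_SHARDS = 2000


def make_id(elapsed_ms: int, shard_id: int, seq: int) -> int:
    """Pack the three fields into one 64-bit integer (hypothetical helper)."""
    assert 0 <= elapsed_ms < (1 << TIME_BITS)
    assert 0 <= shard_id < (1 << SHARD_BITS)
    return (
        (elapsed_ms << (SHARD_BITS + SEQ_BITS))
        | (shard_id << SEQ_BITS)
        | (seq % (1 << SEQ_BITS))
    )


t = 1387263000  # every insert below happens in this same millisecond

# Users 31341 and 33341 both land on shard 1341, so they share that
# shard's sequence; A, B, C, D get consecutive sequence values.
shard_1 = 31341 % NUM_SHARDS
shard_2 = 33341 % NUM_SHARDS
same_shard_ids = [
    make_id(t, shard_1, 1),  # image A
    make_id(t, shard_1, 2),  # image B
    make_id(t, shard_2, 3),  # image C
    make_id(t, shard_2, 4),  # image D
]

# A user on a different shard whose own shard sequence is also at 1.
other_shard_id = make_id(t, 42 % NUM_SHARDS, 1)

# What the same two inserts would look like with no shard bits at all.
no_shard_a = (t << SEQ_BITS) | 1
no_shard_b = (t << SEQ_BITS) | 1
```

Within one shard the sequence keeps A, B, C and D apart. Across shards the sequences are independent, so time + sequence alone can repeat (`no_shard_a == no_shard_b`); the shard bits are what keep `other_shard_id` distinct from image A.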
I guess it generates a unique incremental ID for the data related to the user, like posted images.
Yep, it allows for identifying other data on that shard at the same timestamp.
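One more way to look at it: once the shard ID is baked into the ID, anyone holding only the photo ID (and not the user ID) can still work out which shard to query. A minimal sketch, again assuming the article's 41/13/10 bit layout; `decode_id` is a hypothetical helper, and the sample values (1387263000 ms elapsed, shard 1341, sequence 5001 % 1024) are the ones from the article's worked example.

```python
from typing import Tuple

# Assumed layout from the article: 41 bits time | 13 bits shard | 10 bits sequence.
SHARD_BITS, SEQ_BITS = 13, 10


def decode_id(media_id: int) -> Tuple[int, int, int]:
    """Split an ID back into (elapsed_ms, shard_id, seq)."""
    seq = media_id & ((1 << SEQ_BITS) - 1)
    shard_id = (media_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed_ms = media_id >> (SHARD_BITS + SEQ_BITS)
    return elapsed_ms, shard_id, seq


# Build an ID the same way the article's example does, then take it apart.
media_id = (1387263000 << 23) | (1341 << 10) | (5001 % 1024)
elapsed_ms, shard_id, seq = decode_id(media_id)
```

Here `shard_id` comes back as 1341 without ever looking at the user ID, which is the part the mod function alone can't give you when all you have is a photo ID.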