Tech Industry Apr 27, 2019
Chase ghosted!

Question about generating globally unique IDs across shards

I was doing some light afternoon reading: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c

In the last section, "Solution", they generate a globally unique ID from the DB's auto-increment feature + milliseconds since epoch + shard ID. Why do we need to include the shard ID in it? Specifically, the example says:

"Next, we take the shard ID for this particular piece of data we’re trying to insert. Let’s say we’re sharding by user ID, and there are 2000 logical shards; if our user ID is 31341, then the shard ID is 31341 % 2000 = 1341. We fill the next 13 bits with this value"

This doesn't make sense: if you are already modding the user ID by the number of shards (31341 % 2000), that means:

1) You already have the user ID, so what do you need to generate another ID for?
2) You already know the shard it belongs to from the mod function, so why append it again?

What am I misunderstanding here?
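For anyone following along, here is a minimal sketch of the bit layout the quoted passage describes. The 13 shard bits and the 2000 logical shards come from the quote. The 41 timestamp bits and 10 sequence bits are my reading of the linked article. The function names and the sample timestamp and sequence values are made up for illustration. One thing it suggests: the generated ID is for the new row (a photo, say) rather than for the user, and because the shard ID is packed into it, the shard can be recovered later from that ID alone, without the user ID.

```python
# Assumed layout: 41 bits of milliseconds | 13 bits of shard ID | 10 bits of sequence
TIMESTAMP_BITS = 41
SHARD_BITS = 13
SEQUENCE_BITS = 10

NUM_LOGICAL_SHARDS = 2000  # from the quoted example


def shard_for_user(user_id: int) -> int:
    """Logical shard for a user, as in the quote: user_id % 2000."""
    return user_id % NUM_LOGICAL_SHARDS


def make_id(millis_since_epoch: int, shard_id: int, sequence: int) -> int:
    """Pack timestamp, shard ID and per-shard sequence into one 64-bit integer."""
    seq = sequence % (1 << SEQUENCE_BITS)  # keep only the low 10 bits
    return (
        (millis_since_epoch << (SHARD_BITS + SEQUENCE_BITS))
        | (shard_id << SEQUENCE_BITS)
        | seq
    )


def shard_of(generated_id: int) -> int:
    """Recover the shard ID from a generated ID, with no user ID needed."""
    return (generated_id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)


# Hypothetical values: user 31341 inserts a row; timestamp and sequence are invented.
photo_id = make_id(1387263000, shard_for_user(31341), 5001)
print(photo_id, shard_of(photo_id))
```

So a request that only carries `photo_id` can still be routed to the right shard via `shard_of(photo_id)`.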

Sharding & IDs at Instagram
Instagram Engineering
New
NAOC81 Apr 27, 2019

I guess it generates a unique, incrementing ID for the data related to the user, like posted images, etc.

Snapchat Ffbqr6w8 Apr 27, 2019

Yep, it allows for identifying other data on that shard at the same timestamp.

Chase ghosted! OP Apr 27, 2019

Are you saying that they are sharding by user ID but storing that user's pictures on the same shard as that ID? Wouldn't that make picture searching inefficient?

Apple KGHP41 Apr 27, 2019

Sigh. With sufficient scale you’ll see collisions at the millisecond level, even at nanoseconds. Look up how Twitter's Snowflake service does it.

Bose justme2k19 Apr 27, 2019

What happens when user 1 uploads images A and B in the same millisecond, and user 2, whose user ID maps to the same shard (same mod result), uploads images C and D in that same millisecond?
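A rough sketch of that scenario, assuming the auto-increment sequence is kept per logical shard (which is how I read the article) and that the low 10 bits of the ID hold that sequence. The user IDs 1 and 2001, the timestamp, and the in-memory counter standing in for the database sequence are all hypothetical.

```python
from collections import defaultdict
from itertools import count

NUM_LOGICAL_SHARDS = 2000
SHARD_BITS, SEQUENCE_BITS = 13, 10

# One auto-incrementing counter per logical shard (stand-in for a DB sequence).
_sequences = defaultdict(count)


def next_id(user_id: int, millis_since_epoch: int) -> int:
    """Generate an ID for a new row owned by user_id at the given millisecond."""
    shard_id = user_id % NUM_LOGICAL_SHARDS
    seq = next(_sequences[shard_id]) % (1 << SEQUENCE_BITS)
    return (
        (millis_since_epoch << (SHARD_BITS + SEQUENCE_BITS))
        | (shard_id << SEQUENCE_BITS)
        | seq
    )


same_ms = 1387263000  # all four inserts land in the same millisecond
# Users 1 and 2001 both map to logical shard 1 (1 % 2000 == 2001 % 2000).
ids = [
    next_id(1, same_ms),     # image A
    next_id(1, same_ms),     # image B
    next_id(2001, same_ms),  # image C
    next_id(2001, same_ms),  # image D
]
print(ids, len(set(ids)))
```

Under these assumptions all four IDs differ, because the shard's sequence advances on every insert regardless of which user triggered it. The limit is the 10-bit sequence: more than 1024 inserts on one shard within one millisecond would wrap around and could collide.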