Dropbox System Design

New
vin_sanji

Aug 31, 2021 4 Comments

I was reading the Dropbox design in Grokking, which says that each file is split into 4 MB chunks and the SHA-256 hash of each chunk is calculated to check whether that chunk is already on the server, for deduplication.
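A minimal sketch of the scheme described above, assuming fixed-size 4 MB chunks and an in-memory dictionary standing in for the server's chunk index. `chunk_hashes` and `ChunkStore` are hypothetical names, and the client/server round trip (the client sends digests, the server replies with the ones it is missing) is collapsed into a single local call.

```python
import hashlib
import io

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB chunks, as described in the post


def chunk_hashes(stream, chunk_size=CHUNK_SIZE):
    """Yield (sha256_hex, chunk_bytes) for each fixed-size chunk of a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield hashlib.sha256(chunk).hexdigest(), chunk


class ChunkStore:
    """Hypothetical server-side index: one stored copy per distinct SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # digest -> chunk bytes

    def upload(self, stream):
        """Store only unseen chunks; return the file's digest list and how many chunks were new."""
        digests, new_chunks = [], 0
        for digest, chunk in chunk_hashes(stream):
            if digest not in self.blobs:  # dedup check: skip chunks already present
                self.blobs[digest] = chunk
                new_chunks += 1
            digests.append(digest)
        return digests, new_chunks


if __name__ == "__main__":
    store = ChunkStore()
    data = b"x" * CHUNK_SIZE + b"tail"  # one full chunk plus a short final chunk
    print(store.upload(io.BytesIO(data))[1])  # both chunks are new on the first upload
    print(store.upload(io.BytesIO(data))[1])  # nothing new on the second upload
```

Note that this sketch trusts the digest completely: a hash match is treated as "same chunk", which is exactly the assumption the question below is about.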

If two different chunks can have the same SHA-256 hash, how is this reliable?

Comments
  • GitHub
    WtcFinal
    Theoretically you can find more than 1 pre image of a single sha-256 hash but practically its almost impossible. That’s why we frequently use Sha-256 digest as checksums also to verify data integrity
    Aug 31, 2021 3
    • Amazon
      nodlehs
      Even if there is a collision, you can work around it with a chunk ID that is chosen uniquely for each chunk in a file...

      A better approach may be to make chunks immutable and compare chunk IDs...
      Aug 31, 2021
    • New
      vin_sanji
      OP
      This check is for chunks uploaded by two different people.

      It says that when a user uploads a file, the client calculates each chunk's SHA-256 hash and checks whether that chunk is already present.

      If it is present, you don't need to store it again, since it's already there (this assumes that the same hash means the same chunk, which is not strictly guaranteed).

      So you need two different chunk IDs in the metadata, but both point to the same chunk in the object store.
      Aug 31, 2021
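To put a number on "practically almost impossible": if SHA-256 is modeled as a random function (an assumption; it says nothing about future cryptanalytic attacks, though no SHA-256 collision has been published), the birthday (union) bound says the probability that any two of n distinct chunks share a digest is at most n(n-1)/2 divided by 2^256. The hypothetical helper `collision_upper_bound` below evaluates that bound exactly with `Fraction`.

```python
import math
from fractions import Fraction

DIGEST_BITS = 256  # SHA-256 output size


def collision_upper_bound(n_chunks, bits=DIGEST_BITS):
    """Union/birthday bound: P(any collision among n random digests) <= C(n, 2) / 2**bits."""
    pairs = n_chunks * (n_chunks - 1) // 2  # number of distinct chunk pairs
    return Fraction(pairs, 2 ** bits)


if __name__ == "__main__":
    for n in (10**12, 10**15, 10**18):
        p = collision_upper_bound(n)
        print(f"n = 10^{round(math.log10(n))} chunks: P(collision) <= {float(p):.1e}")
```

Even at 10^18 distinct chunks (roughly 4 yottabytes of unique data at 4 MB per chunk), the bound is about 4 x 10^-42, which is why content-addressed systems generally treat a SHA-256 match as "same chunk".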
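The design discussed in the last two comments (a unique chunk ID per file occurrence in the metadata, with several IDs pointing to one immutable blob in the object store) might look roughly like this. `ObjectStore`, `Metadata`, and the UUID-based IDs are hypothetical. The byte-for-byte comparison on a hash hit is an extra safeguard that the thread does not describe: it requires the chunk's bytes to reach the server, so it keeps the storage saving but gives up the bandwidth saving of skipping the upload.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical content-addressed blob store: one immutable copy per distinct chunk."""
    blobs: dict = field(default_factory=dict)  # blob_key -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        existing = self.blobs.get(digest)
        if existing is None:       # new content: store it under its digest
            self.blobs[digest] = data
            return digest
        if existing == data:       # hash hit confirmed by comparing bytes
            return digest
        # Same digest, different bytes (a real collision): never overwrite,
        # store under a distinct key instead.
        key = f"{digest}-{uuid.uuid4().hex}"
        self.blobs[key] = data
        return key


@dataclass
class Metadata:
    """Hypothetical metadata DB: every chunk occurrence gets its own chunk ID."""
    chunks: dict = field(default_factory=dict)  # chunk_id -> blob_key
    files: dict = field(default_factory=dict)   # (user, path) -> [chunk_id, ...]

    def add_file(self, user, path, chunk_list, store):
        ids = []
        for data in chunk_list:
            chunk_id = uuid.uuid4().hex           # unique per (file, chunk) occurrence
            self.chunks[chunk_id] = store.put(data)  # many IDs may share one blob key
            ids.append(chunk_id)
        self.files[(user, path)] = ids
        return ids
```

For example, if two users each upload a file containing the chunk `b"hello"`, they get different chunk IDs, but both IDs map to the same blob key, and the object store holds a single copy.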