System design

Feb 16 8 Comments

I was reading the Grokking chapter on Twitter system design, and for sharding it mentions that one option is to shard tweets by creation time. The advantage of that approach is that we can fetch tweets quickly, since they will all be on one server. How do they actually shard by creation time? Every tweet has a different timestamp, so is it one shard per day, or is a range of days assigned to a particular DB server?

Any inputs to understand this better would be helpful. Thank you!
#tech #systemdesign

TOP 8 Comments
  • PayPal
    gurudev122
    Monthly shards are created, and data created within a given month resides in that month's shard. This makes it possible to fetch data from multiple shards in parallel and narrows the search space. With 12 shards per year, data for Feb goes in the Feb shard. When searching for some term xyz over the last year, occ will execute a parallel search on all DB shards in the query's time range and collate the results. Searches are faster since each shard holds only one month of data.
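A minimal sketch of the monthly scheme described above, assuming in-memory lists stand in for the per-month database shards. The names `shards`, `insert_tweet`, and `search` are hypothetical, and the thread pool only illustrates the fan-out and collate step, not any particular production system.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Hypothetical stand-in for real databases: (year, month) -> list of (created_at, text).
shards = {}

def shard_key(created_at):
    """Every tweet created in the same calendar month maps to the same shard."""
    return (created_at.year, created_at.month)

def insert_tweet(created_at, text):
    shards.setdefault(shard_key(created_at), []).append((created_at, text))

def months_between(start, end):
    """Yield every (year, month) key from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def search_shard(key, term, start, end):
    """Scan a single monthly shard for matching tweets in the time window."""
    return [(ts, text) for ts, text in shards.get(key, [])
            if start <= ts <= end and term in text]

def search(term, start, end):
    """Query only the shards that overlap the time range, in parallel, then collate."""
    keys = list(months_between(start, end))
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        parts = list(pool.map(lambda k: search_shard(k, term, start, end), keys))
    return sorted(row for part in parts for row in part)
```

A query for January through February touches two shards and never reads the March shard, which is where the smaller search space comes from.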
    Feb 16 3
  • New
    gr4ph
    From what I can tell, the partition key for tweets is the user's ID. This is how it's done in the examples they've shown publicly, but it's possible that they actually use something else. Partitions are randomly assigned to shards.
    See https://youtu.be/gvdXBC-NReQ
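A rough sketch of that idea, not Twitter's actual scheme: hash the user's ID to one of a fixed number of logical partitions, then map each partition to a randomly chosen shard. `NUM_PARTITIONS`, the shard names, and the seeded random assignment are all assumptions made for illustration.

```python
import hashlib
import random

NUM_PARTITIONS = 16                          # hypothetical number of logical partitions
SHARDS = ["db-0", "db-1", "db-2", "db-3"]    # hypothetical physical shards

def partition_for(user_id):
    """Hash the user ID so that all of one user's tweets share a partition."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def assign_partitions(seed=0):
    """Randomly place each partition on a shard; the seed keeps the demo repeatable."""
    rng = random.Random(seed)
    return {p: rng.choice(SHARDS) for p in range(NUM_PARTITIONS)}

PARTITION_TO_SHARD = assign_partitions()

def shard_for(user_id):
    """Two-step lookup: user ID -> partition -> shard."""
    return PARTITION_TO_SHARD[partition_for(user_id)]
```

The extra partition layer means a partition can be moved to another shard by updating the mapping, without rehashing every user.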
    Feb 16 1
  • New / Eng
    Sukv56
    I don't know much about this topic, but I think this article from DigitalOcean would help. What you described is called range sharding, I believe, and it is mentioned in the article: https://www.digitalocean.com/community/tutorials/understanding-database-sharding
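As a small illustration of range sharding applied to the original question, the ranges do not have to be one day or one month each; they can be any sorted list of boundaries. The boundaries and shard names below are made up for the example.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical boundaries: each one marks the start of the next shard's range.
BOUNDARIES = [datetime(2023, 1, 1), datetime(2023, 1, 8), datetime(2023, 2, 1)]
# One more shard than boundaries: everything before the first boundary goes to "archive".
SHARD_NAMES = ["archive", "week-1", "rest-of-jan", "feb-onward"]

def range_shard_for(created_at):
    """Binary-search the boundaries to find which range the timestamp falls into."""
    return SHARD_NAMES[bisect_right(BOUNDARIES, created_at)]
```

With this layout a timestamp exactly on a boundary belongs to the shard that starts there, so `datetime(2023, 1, 8)` lands in `rest-of-jan`.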
    Feb 16 1