Database - Starting off with 1 billion - scaling to 1 trillion

New
netnews

May 11, 2020 31 Comments

Hello all,
I am currently building a system that will handle 1 billion records at first with the intent to scale to 1 trillion records!

I custom-coded my software to use Redis as a cache for the data and an LMDB database to store the data permanently. It works fine for standard purposes, but I want search capabilities built into my new application. Obviously, with a key-value store like Redis or LMDB, you would have to scan all the keys to find matches.
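To make the key-value limitation concrete, here is a minimal Python sketch. A plain dict stands in for Redis/LMDB, and the record ids and texts are made up for illustration. Searching by content means scanning every stored value, while an inverted index (the structure that full-text engines such as Elasticsearch/Lucene are built around) maps each term to the ids that contain it, so a lookup does not touch unrelated records.

```python
from collections import defaultdict

# Hypothetical stand-in for a key-value store such as Redis or LMDB:
# record_id -> record text. A real store lives on disk or over the network.
records = {
    "r1": "red apple pie",
    "r2": "green apple",
    "r3": "red car",
}


def scan_search(store, term):
    """Full scan: inspects every record, so cost grows with total record count."""
    return sorted(key for key, text in store.items() if term in text.split())


def build_inverted_index(store):
    """Map each term to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for key, text in store.items():
        for term in text.split():
            index[term].add(key)
    return index


def index_search(index, term):
    """One lookup by term; cost depends on the number of matches,
    not on how many records are stored in total."""
    return sorted(index.get(term, set()))


index = build_inverted_index(records)
print(scan_search(records, "apple"))  # ['r1', 'r2']
print(index_search(index, "red"))     # ['r1', 'r3']
```

The trade-off is that the index has to be updated on every write, which fits the stated requirement that write time doesn't matter but search should stay reasonable.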

I was wondering if a tool like MariaDB, MySQL, MongoDB, or PostgreSQL would better fit my needs. I would also like a database solution with search capabilities, whether built in or provided through an extension. Write time doesn't matter to me, and search time can be a tad slow, but obviously not out of reach (I mean, there are billions to trillions of records, so...).

#data #engineering #software #database #swe

comments

  • Rackspace
    2sum
    Who asked this system design question? And btw, is that 1 trillion per hour or per day?
    Draw an architecture diagram and ask which component will break. Where are your user bases?
    May 11, 2020
    • New
      netnews
      OP
      I don't want a database solution in which one file system error can permanently destroy the database. There should be room for recovery in case of an HDD failure. Obviously, with data this big I will be sharding, but anything can happen, you know...
      May 12, 2020
    • Cisco
      takecountr
      This is literally what HDFS (Hadoop Distributed File System) is for... DISTRIBUTED.
      May 13, 2020
  • Have you heard of sharding?
    May 11, 2020
  • Amazon
    gdyhfh
    The requirements are unclear. What are the consistency and availability requirements? Which one would you trade off during failures? For search, you could use a secondary store like Elasticsearch and index the data there. Using it as the primary store has its own problems.
    May 11, 2020
  • Cisco
    boba4life
    I am a recent grad, and I have no idea what you guys are talking about (inverted index, Mongo with fsync).

    I never took a database course. Where should I start?
    May 11, 2020
    • New
      qKNn01
      Read Designing Data-Intensive Applications (Martin Kleppmann).
      May 12, 2020
    • New
      OqjD47
      You don't actually need to know what it means; just say it in the interview and the interviewer will nod.
      May 17, 2020
  • Walmart
    elango
    PRE
    Google
    As long as it's not transactional data, always scale out. Solutions like Mongo or Cassandra will suit your needs.

    On YouTube there is a video about how Uber built their back-end infrastructure; that's a good reference for your use case.

    Relational databases stop working beyond a certain transaction rate, usually above 1 million transactions per second.

    From your post, it looks like your use case needs a good mix of an RDBMS and a NoSQL/document store. But you need to write a more detailed explanation of the use case's other requirements.
    May 13, 2020
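Several replies suggest sharding and scaling out, and the OP wants to survive an HDD failure without losing data. One common way to get both is to place every record on more than one shard. Below is a minimal Python sketch of consistent hashing with replication. The ShardRing class, the shard names, the replication factor of 3, and the 64 virtual nodes per shard are all hypothetical choices for illustration; the idea is similar in spirit to a token ring like Cassandra's, but this is not how Cassandra, MongoDB, or HDFS actually implement placement.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # Use a stable digest; Python's built-in hash() for str is
    # randomized per process, so it can't be used for placement.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ShardRing:
    """Hypothetical consistent-hash ring that stores each key on
    `replicas` distinct shards, so losing one disk loses no data."""

    def __init__(self, shards, replicas=3, vnodes=64):
        if replicas > len(set(shards)):
            raise ValueError("replicas cannot exceed the number of distinct shards")
        self.replicas = replicas
        # Each physical shard owns several virtual points on the ring,
        # which spreads keys more evenly across shards.
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in set(shards)
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shards_for(self, key: str) -> list:
        """Walk clockwise from the key's position, collecting the first
        `replicas` distinct shards. The first one holds the primary copy."""
        i = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        chosen = []
        while len(chosen) < self.replicas:
            shard = self._ring[i][1]
            if shard not in chosen:
                chosen.append(shard)
            i = (i + 1) % len(self._ring)
        return chosen


shard_names = [f"shard-{n}" for n in range(8)]
ring = ShardRing(shard_names, replicas=3)
owners = ring.shards_for("record:42")
print(owners)  # three distinct shard names; which ones depends on the hash
```

If the disk behind owners[0] fails, reads can fall back to owners[1] or owners[2]. Compared with hash(key) % N, adding a shard to the ring only moves the keys whose positions fall next to the new shard's points, instead of reshuffling almost every record.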