Does anyone have any links/docs/videos/books I can follow to understand "scaling" a system, or to build an architecture that can handle data at very large scale? Let's say, what would my stack be if I want it to:
- Handle 3M hits per second
- Store petabytes of data
- Search and return results in milliseconds
- Be highly secure

Please don't suggest AWS, GCP, Azure, etc. What if I want to get there with only open source tech stacks?
Are you really working at Amazon? Do you seriously expect an answer with just these details? What about your data model, consistency guarantees, write/read patterns, and many more such relevant details? We can suggest not using any of the publicly available clouds, but that would mean you now have to solve problems like compute/resource management, autoscaling, availability zones, etc. yourself.
Time for everyone to follow promotion-oriented architecture and write their own DB based on the same fucking paper from 20 years ago.
For that type of performance, you’ll need a cache; I like Redis for caching. Nginx as a proxy server in front, or Apache if that’s your thing. For search, something like Elasticsearch, Lucene, or Solr. For data storage, there’s sharded PostgreSQL or MySQL if you’re in the traditional RDBMS camp, CockroachDB for a more Spanner-esque approach, and then the plethora of NoSQL stores like Cassandra. You can put a system like Kafka in between to push data between systems. You’ll also need some type of server to serve up the response; Go might be a good choice if you need performance. But the reality is that every system scales differently and has different pain points, depending on everything from how important ACID compliance is to you, to how you manage connections (i.e. WebSockets vs. long polling vs. a bunch of tiny requests), to the size of the data you’re serving (video? Log streaming? ML training sets?).