I got this system design question at my Meta onsite (E5): design a price tracking application like camelcamelcamel.com. I answered most of the interviewer's questions, but I am not sure I chose the right DB. The requirement was to store price changes for a product for up to 2 years, so I chose a wide-column database like HBase. I didn't commit to HBase, but gave it as an example. My design was to store the data as one row per product with these columns:

product_id | created_date | day1 | day2 | day3 | ... | day730

PS: I was able to convince my interviewer about everything else, like caching, fault tolerance and the microservices I would use. But I have a feeling it was all built on the shaky foundation of the DB choice.

Can anyone tell me if this is a bad design?

The continuation of the story: "Meta / Facebook Offer (Software Engineering Career)" https://us.teamblind.com/s/22heAtCO

UPDATE: Selected at E5 level.
Current TC: 200k CAD (130k base + 16k bonus + 54k RSU/year)
YOE: 7.5

#tech #meta #systemdesign #interview #faang #facebook
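A rough sketch of the wide-row layout described in the post, assuming a plain Python dict stands in for an HBase row and prices are stored as integer cents (both assumptions, not something the interviewer specified; the product ID is made up). It shows how a date maps to one of the fixed day1..day730 columns, and why the layout has no room for anything past day 730, which is the concern raised further down the thread.

```python
from datetime import date

# Hypothetical wide-row layout: one row per product, fixed columns
# day1 .. day730 counted from the row's created_date.
MAX_DAYS = 730  # 2-year retention from the interview requirement


def column_for(created: date, observed: date) -> str:
    """Return the column name (day1..day730) that holds the price for `observed`."""
    offset = (observed - created).days  # 0 for the creation day itself
    if not 0 <= offset < MAX_DAYS:
        raise ValueError(f"{observed} is outside the 2-year window of this row")
    return f"day{offset + 1}"


def put_price(row: dict, observed: date, price_cents: int) -> None:
    """Write one daily price into the wide row (a dict standing in for an HBase row)."""
    row[column_for(row["created_date"], observed)] = price_cents


row = {"product_id": "B000123", "created_date": date(2023, 1, 1)}  # hypothetical ID
put_price(row, date(2023, 1, 1), 1999)
put_price(row, date(2023, 1, 3), 1899)
print(column_for(row["created_date"], date(2024, 12, 30)))  # day730
```

Reading two years for one product is then a single row fetch, but extending retention means changing the column set (or starting a new row), since the day index is baked into the column names.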
Why not use a database more suited for time series data?
That was my doubt too. But I had no experience or working knowledge of time-series DBs, so I went with HBase.
http://opentsdb.net/ is built on HBase.
I failed my E4/E5 loop due to the product design round and the system design follow-up. I got another question, but clearly stated that I would go with a sharded MySQL cluster because I don't have working knowledge of Cassandra or HBase. It looked like the interviewer was disappointed with my answer.
Did you do well on the coding rounds?
Even though I spent 1K on mock interviews with the E6 from Discord who is famous here.
I nailed the coding rounds. My recruiter clearly told me the rejection was due to the SD round and “fierce” competition.
Ouch, man. I'll find out soon enough. If they don't get really good signals from one SD interview, will they schedule another round?
In my case, I was asked to do a follow-up almost immediately after the first round (which was the last in the onsite series).
Why not a relational DB?
Because relational DBs don't handle append-heavy writes well, performance-wise. Why do you think chat applications don't use an RDBMS?
How often do the prices update, though? Maybe they wanted more follow-up.
Where does the price data come from? Also, a time-series DB does not apply here, since we are talking about one data point per day, not per second or per minute.
How does the metric granularity make a difference? It is time-series data whether it is reported every second or every day. What matters is the number of time series we need to store, which would be a lot.
I just think that at one data point per day, it's not going to be a lot of data compared with typical time-series metrics. I don't see the benefit of using a TSDB.
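To put the per-day versus per-second argument into numbers, here is a back-of-envelope estimate. The 1 million product count is borrowed from a later reply in this thread; the 16 bytes per point and the absence of replication or index overhead are assumptions, so treat the result as an order of magnitude only.

```python
# Back-of-envelope storage estimate for price points.
# All inputs are illustrative assumptions, not numbers from the interview.
PRODUCTS = 1_000_000   # products tracked (figure used later in this thread)
DAYS = 2 * 365         # 2-year retention
BYTES_PER_POINT = 16   # assumed: 8-byte timestamp + 8-byte price, no overhead


def estimate(products: int, samples_per_series: int, bytes_per_point: int) -> tuple[int, float]:
    """Return (total data points, raw size in GiB), ignoring replication and indexes."""
    points = products * samples_per_series
    return points, points * bytes_per_point / 2**30


# One sample per product per day.
points, gib = estimate(PRODUCTS, DAYS, BYTES_PER_POINT)
print(f"{points:,} points, ~{gib:.1f} GiB raw")  # 730,000,000 points, ~10.9 GiB raw

# Same products and window, sampled every second instead of once a day.
sec_points, sec_gib = estimate(PRODUCTS, DAYS * 86_400, BYTES_PER_POINT)
```

Both replies have a point here: the number of series (one per product) is identical in both cases, while granularity only changes the number of points per series, by a factor of 86,400 between daily and per-second sampling.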
When reading the price data for a product, how many days do you need to read? 30 days? 7 days?
2 years
Definitely very weird, but you still have a chance.
Not relational, because it's unnecessary: there are no relations to maintain. Cassandra is fine for storing historical data. Use the product ID as the partition key and the date as the sort (clustering) key.
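The Cassandra suggestion corresponds to a CQL table declared with something like `PRIMARY KEY ((product_id), day)`. Below is an in-memory Python model of how that layout behaves, not the Cassandra driver API: rows within one partition stay sorted by date, so reading 2 years of prices for one product is a single contiguous range scan. `PriceHistory`, `between`, and integer-cent prices are hypothetical names and choices for illustration.

```python
import bisect
from collections import defaultdict
from datetime import date


class PriceHistory:
    """Model of partition key = product_id, clustering key = date (not a real driver)."""

    def __init__(self) -> None:
        # product_id -> parallel lists kept sorted by date, like rows in one partition
        self._dates: dict[str, list[date]] = defaultdict(list)
        self._prices: dict[str, list[int]] = defaultdict(list)

    def upsert(self, product_id: str, day: date, price_cents: int) -> None:
        """Insert or overwrite the price for (product_id, day), keeping dates sorted."""
        dates, prices = self._dates[product_id], self._prices[product_id]
        i = bisect.bisect_left(dates, day)
        if i < len(dates) and dates[i] == day:
            prices[i] = price_cents  # same primary key: last write wins
        else:
            dates.insert(i, day)
            prices.insert(i, price_cents)

    def between(self, product_id: str, start: date, end: date) -> list[tuple[date, int]]:
        """Return (day, price) pairs with start <= day <= end from a single partition."""
        # .get() does not trigger defaultdict's factory, so unknown IDs stay absent
        dates = self._dates.get(product_id, [])
        prices = self._prices.get(product_id, [])
        lo = bisect.bisect_left(dates, start)
        hi = bisect.bisect_right(dates, end)
        return list(zip(dates[lo:hi], prices[lo:hi]))


h = PriceHistory()
h.upsert("B000123", date(2023, 1, 2), 1899)
h.upsert("B000123", date(2023, 1, 1), 1999)
h.upsert("B000123", date(2023, 1, 2), 1849)  # overwrites the earlier 1899
two_years = h.between("B000123", date(2023, 1, 1), date(2024, 12, 31))
```

Unlike the fixed day1..day730 columns, this layout has no built-in upper bound; retention becomes a matter of deleting or expiring old rows rather than changing the schema.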
You can have both: a relational DB and a NoSQL DB. Use the relational one to store the current snapshot (price) of each product, and the NoSQL one to store its history. Let's assume you are watching/scraping 1 million products. You can have a normal relational DB (MySQL, Postgres, AWS RDS, it doesn't matter which) to store each product's current state, so you'd have 1 million rows in the product table(s). That's easy to query against, compare prices, etc. Then you can choose a NoSQL DB to store the price history for each product. So you'd have 1 million entries, each pointing to an object/JSON with an entry for each day the product has been observed (e.g. 365x2 lines of JSON). This can be used when a user wants a detailed history of a product, or when you run more advanced analysis against the products. Best of both worlds. Someone here mentioned time-series DBs, which are a more specialized type of DB. Those work too. But the beauty of an RDBMS + NoSQL combo is that both are pretty standard and supported by every vendor out there.
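A minimal sketch of the relational-plus-NoSQL split described in this reply, assuming sqlite3 stands in for the relational DB and a Python dict of JSON strings stands in for the NoSQL document store. The table, column, and function names are hypothetical.

```python
import json
import sqlite3
from datetime import date

# sqlite3 stands in for MySQL/Postgres/RDS (current state, one row per product).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (product_id TEXT PRIMARY KEY, price_cents INTEGER, updated TEXT)")

# A dict stands in for the document store: product_id -> JSON history document.
history_store: dict[str, str] = {}


def record_observation(product_id: str, day: date, price_cents: int) -> None:
    """Overwrite the current-price row and append the observation to the history document."""
    db.execute(
        "INSERT OR REPLACE INTO product (product_id, price_cents, updated) VALUES (?, ?, ?)",
        (product_id, price_cents, day.isoformat()),
    )
    doc = json.loads(history_store.get(product_id, '{"prices": []}'))
    doc["prices"].append({"day": day.isoformat(), "price_cents": price_cents})
    history_store[product_id] = json.dumps(doc)


record_observation("B000123", date(2023, 1, 1), 1999)
record_observation("B000123", date(2023, 1, 2), 1899)

# Current-price lookups and comparisons hit the relational table...
print(db.execute("SELECT price_cents FROM product WHERE product_id = ?", ("B000123",)).fetchone())  # (1899,)
# ...while the detailed history is a single document read.
print(len(json.loads(history_store["B000123"])["prices"]))  # 2
```

The current-price table stays at one row per product no matter how long a product is tracked, while each history document grows by one entry per observed day (about 730 entries after 2 years).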
I like your idea. Actually, to store the history you have a variety of choices here. For example, you can also store it in an Elasticsearch cluster and enjoy a wide variety of querying options if needed, assuming we can pay the price of indexing.
Only bad if you want more days
The interviewer said it will be a max of 2 years
What would you have chosen?