Tech Industry May 26, 2019
Amazon slang

SQL vs NoSQL

In my little bastion of Amazon, SQL is regarded as slow and unscalable. Every time I suggest it for a solution with what appears to be highly relational data, I'm immediately shot down with, "eww" or "that won't scale". Last I checked, plenty of behemoth corporations were using SQL for all manner of operations. What's the take at other major tech companies? @facebook @apple @netflix @google

Amazon hot 🍞 May 26, 2019

Dim fact is the golden ratio to minimize duplication and keep it simple. Nosql is just a key value pair when you demand incredible speed. SQL normalized is only really useful with massive datasets where you need to store zero duplication, like click stream.

Amazon qwertyup May 26, 2019

👍 That is a good concise answer. So it basically depends on what you're doing and the volume of the data.

Oracle Flatlinerz May 26, 2019

What isn't? Even when they say key-document it's just another way of saying key-value

VMware iamvirtual May 26, 2019

Pretty simple, really - if you don't know SQL, just put NoSQL on your resume!

LinkedIn ixBS46 May 29, 2019

Thinking of that meme

eBay BobMueller May 29, 2019

😂

This comment was deleted by the original commenter.
Amazon qwertyup May 26, 2019

😮 That is impressive, god. With all that architecture, you managed to skip over the question, which is about the database layer: whether NoSQL is for every case now, or whether SQL relational databases are still an option.

Oracle OCeye May 29, 2019

MongoDB?

Amazon jefe_bezos May 26, 2019

They both have their purposes; anyone that tells you otherwise is ill-informed. There is definitely an “only use Dynamo” sentiment throughout Amazon, and it’s silly.

Google tyrionstar May 27, 2019

That's because you guys push for Dynamo. SQL-based engines can be made to scale now. So ask them not to "eww" so much.

Amazon vdcb40 May 27, 2019

Anything that gives you cross-shard ACID properties should always scale worse than a db that doesn’t. That’s not exactly SQL vs noSQL, but in many cases it is.

Oath LxSY42 May 29, 2019

Of course, if you need relational queries and ACID like properties, the worst antipattern is to deploy some NoSQL because it is 'scalable' and then ham fistedly build slow global locking on top of it that scales far worse than a mature SQL database would have. But you keep insisting that you can't use SQL because NoSQL supposedly scales better.

Amazon PotatoSale May 27, 2019

What kind of growth do you expect for your system? The issue with SQL/relational systems is that they only scale vertically, aka you need to get bigger and bigger physical servers. At Amazon scale, most successful products will outgrow the limits of the biggest server you can find for a relational DB, and you will have trouble scaling your application. Amazon-wide, this pattern has repeated over and over again. This is why Amazon tries to steer new projects away from relational DBs.

Google blahhalb May 27, 2019

Doesn't AWS Aurora claim to be horizontally scalable?

SAP O6Hma5wlq9 May 28, 2019

MySQL and PostgreSQL have mature sharding options.
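For anyone unfamiliar with what sharding means at the application level, here is a minimal sketch of hash-based shard routing. It is not how any particular MySQL or PostgreSQL sharding tool works; the shard names and the `shard_for` and `group_by_shard` helpers are hypothetical, made up for illustration.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the sharding key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by shard so each shard only needs to be queried once."""
    routed = {}
    for key in keys:
        routed.setdefault(shard_for(key, shards), []).append(key)
    return routed


if __name__ == "__main__":
    print(group_by_shard(["customer-1", "customer-2", "customer-3"]))
```

The catch with plain modulo routing is that changing the number of shards remaps most keys, which is one reason real setups tend to use consistent hashing or a lookup table instead.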

Amazon nEuronGJo May 27, 2019

When you do not know your access patterns in advance and you are working on internal tools which do not have a scalability concern there is no reason why a simple rdbms solution will not work. The Dynamo only credo was put in place as a mechanism instead of relying on good intentions of sdes so they don’t build systems that do not scale. But it comes at its own cost. A simple new use case to search or sort on something needs secondary indexes to be added etc. This is the perfect case of over engineering. The worst part is that we have now built a generation of sdes who do not understand the pros and cons between sql and nosql and like to think that anybody who uses sql is a mediocre engineer at best when the truth is that it's them that are low tier engineers who cannot think of tradeoffs properly. Especially this infuriates me when such sdes are in interview loops making bad decisions.
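To make the secondary-index point concrete, here is a rough sketch comparing the two sides. The relational side uses Python's built-in `sqlite3`; the key-value side is just a `dict` standing in for a key-value store, not DynamoDB's actual API. The `orders` table, `kv_put`, and `total_index` are all hypothetical names for illustration.

```python
import bisect
import sqlite3

rows = [("o1", "alice", 30), ("o2", "bob", 10), ("o3", "alice", 20)]

# Relational side: a new "sort by total" use case only needs a new query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, total INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
sql_sorted = [r[0] for r in conn.execute("SELECT id FROM orders ORDER BY total")]

# Key-value side: lookups are by primary key only, so sorting by total
# needs a secondary index that every write has to keep up to date.
kv_store = {}
total_index = []  # sorted list of (total, order_id) pairs


def kv_put(order_id, customer, total):
    """Write an item and maintain the secondary index on `total`."""
    old = kv_store.get(order_id)
    if old is not None:
        total_index.remove((old["total"], order_id))
    kv_store[order_id] = {"customer": customer, "total": total}
    bisect.insort(total_index, (total, order_id))


for row in rows:
    kv_put(*row)

kv_sorted = [order_id for _, order_id in total_index]

if __name__ == "__main__":
    print(sql_sorted, kv_sorted)
```

Both sides return the same ordering, but on the key-value side the index had to be designed and backfilled before the query could exist, which is the cost being described above.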

Spotify rbtrohd May 29, 2019

Sounds like you guys should consider having a more structured framework for system design decisions. The use case comes first!

IBM dwt May 29, 2019

Oh don’t worry, we have plenty of turds at ibm that think Cloudant and JSON store is the silver bullet for every use case and shun Postgres. Yet they have no data integrity, data model or type checks, so every consumer has to validate fields themselves. So bad.

Facebook blue state May 27, 2019

Fb is mostly just MySQL, bite us.

Amazon frogman47 May 27, 2019

Your cache is fucking huge.

Nvidia jim32 May 29, 2019

Fb is on MySQL for historical reasons, and not for its technical abilities. It is mostly used as an extremely simple, sharded and non-ACID storage backend with a huge caching layer on top of it.

Google blahhalb May 27, 2019

They probably don't know the difference between SQL and NoSQL. NoSQL is useful when you want to scale to very high read/write QPS, like tens of thousands, or if you have a lot of data, like in TBs. On the other hand, NoSQL is very painful: these stores don't have ACID properties, so you have to handle eventual consistency, and complex queries require a lot of effort. SQL is really great, but the problem is scalability, and companies have already solved it. Read about Google Spanner and similar globally distributed SQL DBs.

SAP O6Hma5wlq9 May 28, 2019

I support a single relational DB that is 8 TB. That does suck ☹️. Spanner is Google-specific, with atomic clocks, isn't it? CockroachDB is built from the white papers on it, minus the atomic clocks.

LinkedIn boringdude May 29, 2019

You can have a distributed key-value store with ACID properties using locks at the key level. I don't see why that's not possible per se?
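As a rough illustration of key-level locking, here is a single-process sketch, so it is only an analogy for the distributed case. `LockedKVStore` and its methods are hypothetical names, and this only shows atomic, isolated updates to one key at a time; it says nothing about durability or about transactions that span several keys.

```python
import threading
from collections import defaultdict


class LockedKVStore:
    """In-memory key-value store with one lock per key (illustrative only)."""

    def __init__(self):
        self._data = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks[key]

    def get(self, key, default=None):
        with self._lock_for(key):
            return self._data.get(key, default)

    def update(self, key, fn, default=None):
        """Atomically read-modify-write a single key."""
        with self._lock_for(key):
            new_value = fn(self._data.get(key, default))
            self._data[key] = new_value
            return new_value


store = LockedKVStore()


def add_many(n):
    for _ in range(n):
        store.update("counter", lambda v: v + 1, default=0)


threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Writers to different keys never block each other here. The hard part in a real distributed store is everything this sketch leaves out: multi-key transactions need something like ordered lock acquisition or two-phase commit, and the locks themselves have to survive node failures.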

LinkedIn jsnewbie May 29, 2019

It's all about the workload, stupid.