Misc. Apr 3, 2019
Amazon RJCMPR

Hadoop / Hive : doubt

I'm new to Hive / Hadoop and learned that Hive is schema on read, which means you don't need to define a schema while writing data into HDFS. I have a question about this.

Once a large amount of data is loaded into HDFS, we run Hive queries to read it, and to read it we must already have a schema defined. So we design the schema, put it into the metastore, create tables, and so on. Later we write more data into the same table and database in HDFS. Aren't we now writing the data against a schema? Does it still count as schema on read, or is it schema on write as well? Please add anything I'm missing here.

Notes:
1. The first time, I didn't use HiveQL to store the data; I just copied the files into HDFS.
2. From now on, I'm loading data into HDFS using HiveQL.

P.S. I couldn't find an answer to this on Google, Stack Overflow, or Quora. Please help :)

Bank of America pOlF30 Apr 3, 2019

OP, any idea how to set up connection pooling?

Bank of America 7snowy Apr 3, 2019

Not sure, but here's my two cents. With Hive (schema on read), you can write whatever you want, and nothing is checked against the schema at write time. Only when you try to read the data do you find out that what went in doesn't fit the schema. Does this help?
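To make that concrete, here is a toy Python simulation of the idea, not Hive itself. The names `SCHEMA`, `write_raw`, and `read_with_schema` are made up for illustration. It assumes a comma-delimited text layout and mimics the usual text-table behaviour where a field that can't be parsed as the declared type comes back as NULL (here `None`) instead of raising an error.

```python
import csv
import io

# Hypothetical table definition: (column name, type converter) pairs.
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def write_raw(storage, text):
    """Write path: append the text as-is, with no schema check at all."""
    storage.write(text)


def read_with_schema(storage, schema):
    """Read path: apply the schema only now; unparseable fields become None."""
    storage.seek(0)
    rows = []
    for fields in csv.reader(storage):
        row = {}
        for i, (column, convert) in enumerate(schema):
            try:
                row[column] = convert(fields[i])
            except (ValueError, IndexError):
                # Wrong type or missing field: surfaces only at read time.
                row[column] = None
        rows.append(row)
    return rows


storage = io.StringIO()                         # stand-in for a file in HDFS
write_raw(storage, "1,alice,10.5\n")            # matches the schema
write_raw(storage, "oops,bob,not_a_number\n")   # wrong types, still accepted
write_raw(storage, "3,carol\n")                 # missing column, still accepted

rows = read_with_schema(storage, SCHEMA)
for row in rows:
    print(row)
```

All three writes succeed. The mismatches only show up as `None` values when the rows are read back through the schema.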

Amazon abguru Apr 3, 2019

It cannot be called schema on write, since HDFS is just file-based storage and you can write whatever you want to it. And yes, writing data into HDFS using HiveQL is accepted industry practice, as long as you don't break the existing schema.
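A rough Python sketch of the same point, using a local temp directory as a stand-in for a table's HDFS directory. The function names `insert_via_engine` and `copy_into_storage` and the file names are hypothetical. The query engine can format rows according to the table definition, but the storage layer underneath accepts any file, so nothing is enforced on write.

```python
import tempfile
from pathlib import Path

SCHEMA = (int, str)  # hypothetical column types for a two-column table


def insert_via_engine(table_dir, row):
    """Engine path: format the values per the table definition, then append."""
    typed = [convert(value) for convert, value in zip(SCHEMA, row)]
    with open(table_dir / "engine_output.csv", "a") as f:
        f.write(",".join(map(str, typed)) + "\n")


def copy_into_storage(table_dir, filename, raw):
    """Direct path: the storage layer takes any content, no schema involved."""
    (table_dir / filename).write_text(raw)


with tempfile.TemporaryDirectory() as tmp:
    table_dir = Path(tmp)  # stand-in for the table's directory in HDFS
    insert_via_engine(table_dir, ("1", "alice"))
    copy_into_storage(table_dir, "manual_copy.csv", "garbage with no columns\n")
    # Both files now sit side by side in the same table directory.
    stored = {p.name: p.read_text() for p in sorted(table_dir.iterdir())}

print(stored)
```

Both write paths land files in the same directory. Whether each file makes sense is decided later, when a query reads it through the schema.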