How's the data infrastructure at your company?

No need to go into specifics, just at a high level. Mainly interested in Facebook, Google and Pinterest, but people at any other internet company with large amounts of data are free to chime in.

Say you want to read some data and gain insights from it. Do you have a central platform where teams can dump their data for everyone in the company to query and process, or are teams responsible for building their own data platforms? Once the data is in the platform, how is processing done? Are people writing Spark apps to read and process the data, or do you have a web UI for that? Is the web UI a drag-and-drop tool where you create a DAG of operations, and the processing is done in the background by translating the DAG to Spark or Hadoop or something?

I'm asking because I'm interning on a data-heavy team this summer and want to learn how other internet companies approach the problem of making big data available and easy to query and understand. TC: $7725/mo

Salesforce BretTaylor Jul 4, 2018

Additional questions: Do you build your own platform from open source projects, build on top of cloud-specific services (e.g. BigQuery or Redshift), or just buy software from vendors like Databricks? Why?

Amazon thr0waw4y OP Jul 4, 2018

I'd imagine the companies I mentioned above build their own. At least Amazon does.

Salesforce BretTaylor Jul 4, 2018

Amazon owns AWS, so I’m guessing retail and everything else can just use AWS services for free?

Hitachi Vantara vataran Jul 4, 2018

Curious as well.