Jul 4, 2018 5 Comments

No need to go into specifics, just a high-level view. I'm mainly interested in Facebook, Google and Pinterest, but people at any other internet companies with large amounts of data are free to chime in. Say you want to read some data and gain insights from it. Do you have a central platform where teams can dump their data for everyone in the company to query and process, or are teams responsible for building their own data platforms? Once the data is in the platform, how is processing done? Are people writing Spark apps to read and process the data, or do you have a web UI for that? Is the web UI a drag-and-drop tool where you create a DAG of operations, and the processing is done in the background by translating the DAG to Spark or Hadoop or something? I'd like to learn more because I'm interning on a data-heavy team this summer and am curious how other internet companies approach the problem of making big data available and easy to query and understand.

  • Salesforce BretTaylor
    Additional questions: Do you build your own platform from open source projects, build on top of cloud-specific services (e.g. BigQuery or Redshift), or just buy software from vendors like Databricks? Why?
    • Salesforce BretTaylor
      Amazon owns AWS, so I’m guessing retail and everything else can just use AWS services for free?
    • Amazon UnsungZero
      If only
  • Hitachi Vantara / Other vataran
