Tech Industry Sep 29, 2021
Facebook T.Anderson

ML/data stack at Google

Googlers, I left Google a couple of years back and am considering coming back as a SWE/ML IC. Can you folks give me a sense of what the data/ML stack looks like these days?

+ Data pipeline: Is it still primarily MapReduce/Flume? Does MapReduce still support Java/Python? Is there any traction for SQL interfaces, something like BigQuery (GCP)/Dremel/HiveQL?
+ ML pipeline: I presume it is primarily based on TFX. Do they support trees? (Traditionally, trees have not been popular at Google.)

Anything else I should be aware of?

Obligatory TC: 900K (with appreciation), but let's keep the discussion focused on the question, not a silly TC number. #google

Facebook real-eng Sep 29, 2021

Just curious, how would you compare FB now to Google when you left?

Google owqJ68 Sep 29, 2021

I'm just starting to run a relatively new (under 2 years old, I think, or pretty close) ML pipeline at G. It's a reflex pipeline running on Flume. There's a lot of Python in the harness code. Happy to check more details, but I don't have a huge ML background and am not sure how to tell.