I have been building data pipelines, both batch and near-real-time, for most of my 10-year career. I really enjoy building solid production data pipelines at scale. It's an involved engineering effort, the details of which I won't list here for brevity. However, some people view us as just 'plumbers' while the data science folks are the 'magicians'. I've seen entire teams of data scientists (with no engineering team members) get restructured after the company realized they'd only been doing POCs and projecting ROI for years, with nothing productionised. Could someone who has built data teams and hired both data engineers and data scientists comment on this, please?
Been wanting someone to make this post for a while.
DE is extremely underrated, especially given the large-scale production pipelines/platforms/infrastructure at some of the bigger tech companies.
That sounds too bad. By big tech, do you mean that the likes of Google underrate the folks who built their huge pipelines too?
Any SWE worth his salt can do the work DEs do, and the same goes for most DS work, which is just using pandas or ggplot to analyze data and make pretty visualizations in Jupyter notebooks. So I don't think they're underrated; they're just a subset of what SWEs can do.
This^. I know a lot of DEs who could easily slot into SWE... and a lot who couldn't.
Any decent DE is a SWE/programmer, not a drag-and-drop ETL tool user.
DEs get nowhere near the respect, comp, or mobility opportunities that SWEs do. They get requests to build logging specs, pipelines, data sets, cuts, or reports to support data science. DE is a very narrow skill at FB; you have a lot more freedom as a DE at Lyft or Airbnb.
Data science is a meme, and data engineering has been around forever.
A lot of data scientists are just analysts who did a course in 🐍 and TensorFlow/Keras and now use pandas and NumPy (and Keras) to do some simple stuff. Non-drag-and-drop DE is definitely harder and more interesting.
DE is good for system architecture experience.
Since people in the comments mentioned drag and drop: is there a company working on automating reports and the like? I mean, just as we have CI/CD pipelines for code, why not something similar for data? Sorry if this sounds stupid; I'm not a DE.
Looking at these replies, I'm struck by how much people focus on technical skills alone! As if DS were only about pandas/NumPy! These are just tools, and the work goes well beyond them. I'm a DE... yes, I only use SQL + Python, but I spend maybe 25% of my time using them. The other 75% goes to other stuff like meetings, brainstorming, design, etc. Design/architecture is as important as coding. I wonder how everyone judges the complexity of DE/DS work just by looking at the technology used!
I think it's time to throw in some jargon here to show what goes into DE; it's definitely not just the tools and their nuances. If your company's DEs don't deal with most of these, that already explains why they're held in low regard: schema design, self-describing/compressed formats, schema evolution, data drift, quality/canary checks, working around cloud object stores' limitations for big data (if you use one), data skew, partitioning, distributed processing, distributed storage...
Well said. The problem is that it isn't well marketed... people just look at the technology (SQL and Python) to judge the complexity.
I'm not sure this is true across the board in the industry. It depends greatly on the company. I worked at an adtech firm at one point where they greatly respected data engineers, and briefly at a bank where data scientists clearly called the shots. At the bank, it was obvious that none of the data scientists knew anything about putting systems into production. I wonder what will happen to them in a few years...