I have been building data pipelines, both batch and near real time, for most of my 10-year career. I really enjoy building solid production data pipelines at scale. It is an involved engineering effort whose details I won't list here for brevity. However, some view us as just 'plumbers' while the data science folks are the 'magicians'. I've seen entire teams of data scientists (with no engineering team members) get restructured after it became clear they had only been doing POCs and projecting ROI for years, with nothing productionised. Can someone who has built data teams and hired both data engineers and data scientists comment on the topic, please?
Been wanting someone to post this OP for a while
DE is extremely underrated, especially with some of the large production pipelines/platforms/infrastructure at scale at some of the bigger tech companies
That sounds too bad. By big tech, do you mean the likes of Google underrate the folks who built their huge pipelines too?
Any SWE worth his salt can do the work DEs do, same for most DS work which is just using pandas or ggplot for analyzing data and making pretty visualizations in Jupyter notebooks, so I don't think they are underrated as they are just a subset of what SWEs can do.
This^. I know a lot of DEs who could easily slot into SWE... and a lot that couldn't.
Any decent DE is a SWE / programmer, not your drag and drop ETL tool user
DEs get nowhere near the respect, comp or mobility opportunities that SWEs do. They get requests to build logging specs, pipelines and data sets, cuts or reports to support data science. Since DE is a very narrow skill at FB, you have a lot more freedom as a DE at Lyft or Airbnb.
How competent does one need to be to survive as a DE at FB? I mean, is it alright not to have a good command of programming?
Just advanced SQL and very rudimentary python
Data science is a meme and data engineering has been around since forever.
A lot of data scientists are just analysts who did a course in 🐍 and TensorFlow/Keras and now use pandas and numpy (and Keras) to do some simple stuff. Non drag and drop DE is definitely harder and more interesting
DE is good for system architecture experience.
Since people in the comments mentioned drag and drop, is there a company working on automating reports and stuff? I mean, just like we have CI/CD pipelines, why not something similar for data? Sorry if this sounds stupid, I am not a DE.
Looking at these replies, I wonder how much ppl think of technical skills alone!!! As if DS is only about pandas/numpy!! These are just tools, and what they do is beyond them. I am a DE... yes, I only use SQL + Python, but I use them probably 25% of the time. The other 75% goes to other stuff like meetings, brainstorming, design etc. Design/architecting is as important as coding. I wonder how everyone concludes the complexity of DE/DS work just by looking at the technology used!!
I think it's time to throw in some jargon here to signify what goes into DE. It's definitely not just the tools and their nuances. If your company's DE doesn't deal with most of these, that already explains why they are treated poorly: schema design, self-describing/compression formats, schema evolution, data drift, quality/canary checks, working around cloud object stores' limitations for big data if you use one, data skew, partitioning, distributed processing, distributed storage...
Well said, problem is that it is not well marketed ... ppl just look at technology (SQL and python) to judge the complexity.
I'm not sure this is true across the board in the industry. It depends greatly on the company. I worked at an adtech firm at one point where they greatly respected data engineers, and briefly at a bank where it was clear that data scientists called the shots. In the case of the latter, it was clear that none of the data scientists knew anything about putting systems into production. I wonder what will happen to them in a few years...