Career prospects for Data Engineering?

Hotwire Victor333
Aug 23 26 Comments

I find data engineering pretty interesting and fun. You don't need to write complex distributed services or web apps; mostly you make use of existing data infrastructure and systems to build pipelines. However, it seems to be a relatively new field, and the market pay may be lower than for backend engineers or other software engineering roles - Data Scientist, Machine Learning Engineer, etc. Is data engineering looked down upon in comparison to other SWE roles? What are the career prospects?

TOP 26 Comments
  • This comment was deleted by original commenter.

    • Hotwire Victor333
      OP
      Nah, it's much more than babysitting. It's usually ETL - Extract, Transform, and Load. Each of these phases can be quite complex and, if done right, can save millions of dollars. Once the pipeline is built, you can sit back and relax - this can be the beauty or the ugliness of it, depending on your perspective.
      Aug 23
    • New / R&D LangeSohne
      No...don't group DS and ML engineers/researchers in the same category. The former is basically in the same category as data engineers, which is the point I've been trying to make throughout this thread. There is increasingly no distinction between DS and DE, and the combined role is lower than regular SWE in value contribution, total compensation and career trajectory.

      Most "data scientists" are business analysts who moved on from slinging Excel to using Python and SQL. As the industry matures, they're being expected to know how to stand up the RDBMS or NoSQL warehouse, query it, visualize the results, optimize with indices, etc.

      But data scientists aren't telling leadership WHAT to improve on or HOW. They're taking results from a database, presenting it to leadership and/or the ML team, and then letting the ML team do all the sophisticated work.

      They fall lower in the TC and career growth hierarchy because the research teams identify what needs to be improved and how to do it, the engineering teams implement and optimize the improvements, and the data science teams clean and present all the data so the research teams can do their work.
      Aug 23
    • Hotwire Victor333
      OP
      What do you mean by a regular SWE? Web apps in Spring? DE, when done right, requires functional programming and knowledge of distributed systems. What does a typical SWE do? OOP to make mobile apps? APIs? With unsupervised deep learning the MLE role seems redundant. So, what's your point? DE seems to be the only intelligent role, doing something that requires human intelligence and can't just be automated.
      Aug 23
    • New / R&D LangeSohne
      I'm talking about who is usually considered more of a cost center and who is considered closer to value generation. This has nothing to do with intelligence, which is completely orthogonal.

      As an aside, however, it's wildly inaccurate to say that Machine Learning engineers are redundant because unsupervised learning exists. Unsupervised and supervised learning don't both apply to all the same types of problems. And even if they did, you'd want engineers who specialize in improving the unsupervised model and training performance.

      You also might be surprised how much more important it is to most large tech companies to have someone who can design and implement robust mobile APIs. The way you're talking about both of these roles doesn't seem well-informed, to put it frankly.
      Aug 23
    • Hotwire Victor333
      OP
      Hm... maybe because I haven't worked much in the two roles. But the most complex systems very often tend to be distributed systems, and data engineering rightfully comes into the picture only when working with data at scale. From my interactions and experience with the other two roles, apart from occasional algorithmic complexity, folks in those roles seem at most to be parallelizing or distributing their Java application at a larger scale. But, as I said, fault-tolerant distributed systems are the starting point for real DE. Any kid can pick up a Java or Eclipse manual and make apps.
      Aug 23
  • SolarWinds AnEngineer
    In some places a DE just works with SQL and some Python and is closer to the Data Science team. At other places (such as my current employer) the Data Engineers are really distributed systems engineers and the position is for the most senior of SWE (with more junior folks working in front and back end of the web apps).

    It's really not a well-defined title.
    Aug 23 1
    • Hotwire Victor333
      OP
      Yeah, the role can vary. Concurrent processing on distributed systems at scale with robust fault tolerance is perhaps the juice of it.
      Aug 23
  • LinkedIn / Data kVBR87
    From an outsider's view, I feel it is becoming a high-demand role. At least 1-2 years ago, I feel people were deluded by the DS bubble, and every company, regardless of how bad its data pipeline was, thought it needed a data scientist. But more and more are realizing that before they can even start thinking about hiring DS or Analytics, they need to consolidate their data pipeline. The pay is hard to say, though.
    Aug 23 6
    • LinkedIn / Data kVBR87
      That’s very true. The growth potential is a bit uncertain or even on the negative end.
      Aug 23
    • New / R&D LangeSohne
      This is actually why most data engineers are just getting combined with data scientists.
      Aug 23
    • Hotwire Victor333
      OP
      So is it a good entry point to get into Data Science?
      Aug 23
    • New / R&D LangeSohne
      Yes, but only in the sense that it is increasingly a job function of the singular role that data science is moving towards.

      "Data Science" is a marketing term. Companies with a real need for data-driven product development don't really hire scientists, mathematicians or statisticians to be data scientists. They hire them to be researchers producing novel work and models.

      Data scientists are sold on a jazzy role where they'll do cool data...stuff, but they end up mostly working with pandas, matplotlib and Tableau to visualize trends and clean data for use by actual research teams. Over time they've also been given responsibility for making sure the data pipeline doesn't shit the bed.

      It's basically a specialization of software engineer which mostly works with Python and SQL. It's not really surprising that backend engineers tend to earn more in total comp from FAANG.
      Aug 23
    • New Stunt
      Lange’s comment reminded me of this pic. It’s scary how accurate it actually is.
      Aug 23
  • Salesforce sdes
    I work in data engineering. The real skill is to build a platform that scales and is highly metadata-driven. For example, I have built platforms with Java, Python, Hive, Oozie, Spark, and Postgres. We have designed each component in such a way that it can be generalized for all pipelines. Once the platform is built, it's all about writing ETL scripts and workflows (Oozie, Airflow, etc.).
    Aug 23 0
  • Snapchat jduegwozhf
    Snap stopped hiring data engineers and requires Data Scientists to build pipelines using platforms that Data Platform SWEs build.
    Aug 23 4
    • Hotwire Victor333
      OP
      Why? Did they let data engineers go, or ask them to automate their jobs by building the platform?
      Aug 23
    • New / R&D LangeSohne
      They probably did it because the cost of the two distinct roles didn't justify the amount of work each was doing individually.

      Frankly, this is happening more and more often. A lot of new data scientists think they'll be primarily working on statistics and modeling, but most of their time is usually spent working on data pipeline stuff and cleaning/normalizing data for others.
      Aug 23
    • Snapchat jduegwozhf
      I think they eventually left or got converted to SWE or Data Scientist, if they qualified.
      Aug 23
    • Hotwire Victor333
      OP
      So, is it like data engineers becoming data scientists or the other way around?
      Aug 23
  • Humana edjuh
    Data Engineering isn’t new. Data has needed to be integrated for as long as data has been created. There are just a lot of cool new technologies that are overkill for the regular use cases, which haven’t changed much.
    Aug 23 0
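
For readers unfamiliar with the terms used in the thread, here is a minimal sketch of what a "metadata-driven" ETL (Extract, Transform, Load) pipeline might look like. It is only an illustration under stated assumptions, not any commenter's actual platform: the pipeline spec, the table and column names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical pipeline metadata: every name here is illustrative.
# The generic extract/transform/load code below never changes;
# adding a pipeline means adding another spec like this one.
PIPELINES = [
    {
        "name": "orders_daily",
        "target_table": "orders",
        "columns": {"order_id": int, "amount": float},  # column -> cast function
        "filter": lambda row: row["amount"] > 0,         # keep only valid rows
    },
]


def extract(source_text):
    """Extract: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows, spec):
    """Transform: cast columns as the metadata says, then apply the row filter."""
    typed = [
        {col: cast(row[col]) for col, cast in spec["columns"].items()}
        for row in rows
    ]
    return [row for row in typed if spec["filter"](row)]


def load(conn, rows, spec):
    """Load: create the target table if needed and insert the rows."""
    cols = list(spec["columns"])
    table = spec["target_table"]
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(row[c] for c in cols) for row in rows],
    )
    conn.commit()
    return len(rows)


def run_pipeline(conn, spec, source_text):
    """Run one pipeline end to end; returns the number of rows loaded."""
    return load(conn, transform(extract(source_text), spec), spec)
```

A usage example, again with made-up data:

```python
conn = sqlite3.connect(":memory:")
source = "order_id,amount\n1,10.5\n2,-3\n3,7\n"
loaded = run_pipeline(conn, PIPELINES[0], source)
```

The point of the design is that the per-pipeline logic lives in the spec rather than in the code, which is roughly what "generalized for all pipelines" suggests. A production platform would add scheduling, retries, and distributed execution, none of which this sketch attempts.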
