When they say "building data pipelines", does that mean taking the metadata from websites, like how many people clicked on the site or how many people stayed on the website for X number of minutes, and storing that data in a database? And then Spark is used to manage the large amounts of data that's being gathered, Airflow is used to control the timing of when that data is collected, and cloud services like AWS/Azure/Google Cloud are where you store it? How do you get that metadata? Isn't that stuff already being stored when the website is created, like in a database already? Because when they create the website they have to store people's info somewhere to have accounts, right?
Try googling "data pipelines" and reading about it from any company that has a tech blog. Or Youtubing it if you prefer video.
Even within the field of data engineering there are many roles, but you did get some of them correct. Being a data engineer means you could be doing anything from:
- Real-time data collection and processing to insert into DBs, and creating data lakes/warehouses
- Setting up data streams to connect producers and consumers
- Writing transformations and other computations on the data
- Setting up ETL jobs, staging and production tables, and managing clusters for your data lake
- Collecting and vending metrics, and monitoring those metrics

As for the metadata stuff, it is not usually stored when the website is created. User clickstreams are not automatically inserted into a database, because the raw events would not be very useful without adequate processing first. A data engineer or other backend eng usually has to define DB schemas for the data, do data cleanup and sanitizing, maybe transformations, etc., so that it actually has significant value to whoever is querying and using it. They have to set up low-latency methods (data pipelines) of getting data to other people around the company. They have to set up monitoring on their pipelines to make sure data volume and quality are as expected. And they usually have to make sure the people consuming their data can use it well and are not running into problems with data quality or processing. (A sketch of that cleanup step is below.)
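To make the cleanup/sanitizing step concrete, here is a minimal Python sketch, assuming a made-up raw clickstream event; the field names (user_id, page_url, event_ts) and the output schema are hypothetical, not any particular company's format:

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical raw clickstream event shape as it might arrive from a collector.
REQUIRED_FIELDS = {"user_id", "page_url", "event_ts"}

def clean_event(raw: dict) -> dict | None:
    """Validate and normalize one raw click event; return None to drop it."""
    if not REQUIRED_FIELDS.issubset(raw):
        return None  # incomplete event, drop it
    url = urlparse(raw["page_url"])
    if not url.netloc:
        return None  # malformed URL, drop it
    try:
        # Assume the collector sends epoch milliseconds.
        ts = datetime.fromtimestamp(int(raw["event_ts"]) / 1000, tz=timezone.utc)
    except (ValueError, OSError):
        return None  # unparseable timestamp, drop it
    # Normalized record matching a (hypothetical) warehouse table schema.
    return {
        "user_id": str(raw["user_id"]),
        "domain": url.netloc,
        "path": url.path or "/",
        "event_time": ts.isoformat(),
    }

if __name__ == "__main__":
    raw = {"user_id": 42, "page_url": "https://example.com/pricing",
           "event_ts": "1700000000000"}
    print(clean_event(raw))
```

In a real pipeline a function like this would run inside a Spark job or stream processor over millions of events, with the dropped-event rate itself becoming one of the monitored data-quality metrics mentioned above.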
This. Also, if you’re dealing with big data, building the pipeline is more challenging because you have to tune various settings (e.g. memory) if you use distributed computing platforms.
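For example, on Spark a lot of that tuning happens through session config. A minimal sketch; the values below are placeholders that depend entirely on your cluster size and workload, not recommendations:

```python
from pyspark.sql import SparkSession

# Illustrative memory/parallelism tuning for a Spark job; every number here
# is a placeholder you would adjust per cluster and workload.
spark = (
    SparkSession.builder
    .appName("clickstream-etl")
    .config("spark.executor.memory", "8g")          # per-executor heap
    .config("spark.executor.cores", "4")            # cores per executor
    .config("spark.sql.shuffle.partitions", "400")  # shuffle parallelism
    .config("spark.memory.fraction", "0.6")         # execution/storage split
    .getOrCreate()
)
```

Getting these wrong is how you end up with out-of-memory executors or a job that shuffles forever, which is a big part of why "big data" pipelines are harder than the single-machine version.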
It’s not just metadata. Usually all of the company’s data is pipelined to a data warehouse or data lake. These stores/environments are optimized for analytics, unlike the transactional databases that services typically use. The service is using that transactional database to “run the website,” as you were saying in your post. These analytics stores are used to generate dashboards and to do statistical analysis for the business, e.g. how well is my A/B test variant performing. Maybe for other things too, like machine learning.
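To make the A/B test example concrete, here is a toy version using DuckDB as a stand-in for the warehouse; the tables (experiment_assignments, events) and the 'checkout_completed' event name are made up for illustration:

```python
import duckdb  # stand-in for a real warehouse engine in this sketch

con = duckdb.connect()
# Hypothetical tables: who saw which variant, and what events they fired.
con.execute("""
    CREATE TABLE experiment_assignments AS
    SELECT * FROM (VALUES (1, 'control'), (2, 'control'),
                          (3, 'treatment'), (4, 'treatment'))
        AS t(user_id, variant)
""")
con.execute("""
    CREATE TABLE events AS
    SELECT * FROM (VALUES (2, 'checkout_completed'),
                          (3, 'checkout_completed'),
                          (4, 'checkout_completed'))
        AS t(user_id, event_name)
""")

# The kind of wide aggregate an analytics store is optimized for,
# and that a transactional DB serving live traffic is not.
print(con.execute("""
    SELECT a.variant,
           COUNT(DISTINCT a.user_id) AS users,
           COUNT(DISTINCT e.user_id) AS converted,
           1.0 * COUNT(DISTINCT e.user_id)
               / COUNT(DISTINCT a.user_id) AS conversion_rate
    FROM experiment_assignments a
    LEFT JOIN events e
      ON e.user_id = a.user_id AND e.event_name = 'checkout_completed'
    GROUP BY a.variant
    ORDER BY a.variant
""").fetchall())
```

The same query pattern against the production OLTP database would compete with live traffic, which is exactly why the data gets pipelined out to a separate analytics store first.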
Your post sounds like someone threw a buzzword at you and now you have to explain it to someone else. Here is what you need to read up on:
- 3-tier web apps
- OAuth 2.0 workflow and identity providers
- Ad pixels

That reading will give you an idea of how website data is handled. (A toy pixel endpoint is sketched below.)
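On ad pixels specifically, since that answers OP's "how do you get that metadata" question: a tracking pixel is just a tiny image request whose real job is to log the hit. A minimal server-side sketch with Flask; the endpoint path, query parameters, and logged fields are all made up:

```python
import logging
from flask import Flask, request, Response

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

# Smallest valid transparent 1x1 GIF, returned so the <img> tag loads.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!"
    b"\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

@app.route("/pixel.gif")
def pixel():
    # This log line is the actual product: it becomes a raw clickstream
    # event that a pipeline later cleans and loads into the warehouse.
    logging.info(
        "pixel hit: page=%s referrer=%s ua=%s",
        request.args.get("page"),
        request.referrer,
        request.user_agent.string,
    )
    return Response(PIXEL_GIF, mimetype="image/gif")

# Embedded on a page as (hypothetical host):
# <img src="https://tracker.example.com/pixel.gif?page=/pricing"
#      width="1" height="1">
```

So no, this stuff is not "already in the database" from building the site; someone has to deliberately instrument the pages and then build the pipeline that turns these logs into usable tables.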
Not a question for Blind, but can't blame you, you're from KPMG.
Confirmed boomer
How condescending you both are!!