I've been working as a data engineer for the past 5 years and have never used data structures in my projects. Many interviews have data structures as the first round of screening. Does that really make sense? Are data structures really required for a data engineer? Edit 1: We have many distributed processing frameworks like Spark, which process data across different nodes in parallel. I assume that using legacy Python data structures in the code pushes all the data onto a single driver. Yes, we do use lists, tuples, and dictionaries in Python wherever required, to store minimal information. We never use legacy Python code to read, write, or process data. It's pretty outdated, right? #data #dataanalytics
Do you mean Leetcode type questions?
Not the LeetCode part, but there are lots of data-structure-specific questions, including calculating time and space complexity.
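For anyone wondering what those complexity questions tend to look like, here is a minimal sketch. The function names and the duplicate-ID scenario are made up for illustration, not taken from any specific interview. Both functions answer the same question, but trade time for space differently.

```python
def has_duplicates_quadratic(ids):
    """O(n^2) time, O(1) extra space: compare every pair of elements."""
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if ids[i] == ids[j]:
                return True
    return False


def has_duplicates_linear(ids):
    """O(n) average time, O(n) extra space: remember seen values in a set."""
    seen = set()
    for value in ids:
        if value in seen:  # average O(1) hash lookup
            return True
        seen.add(value)
    return False
```

The usual follow-up is to explain why the set version is faster (hash lookups instead of rescanning the list) and what it costs in extra memory.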
Data engineering interviews vary a lot. They're a bit hard to predict, so I generally do most medium LeetCode questions to be prepared. Sometimes I’ve been asked hard LC as well. Sometimes I’ve just been asked SQL.
It was never useful earlier, but now data engineering uses software engineering principles and concepts. OOP is there in Python and Scala. So it makes sense for data structures to be something DEs need to know.
That's interesting. I thought data engineers wrote code? In any case, data structures are not that big a deal. Watch https://www.youtube.com/watch?v=RBSGKlAvoiM&ab_channel=freeCodeCamp.org up to the 2-hour mark. I would also recommend some OOP, because you can create "custom data structures" with it. But the standard data structures are the building blocks.
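To make the "custom data structures" point concrete, here is a rough sketch of a class built on top of a plain list. `BatchBuffer` is a hypothetical name invented for this example, not something from a library. It collects records and hands back a full batch once a size limit is reached.

```python
class BatchBuffer:
    """Hypothetical custom structure: a list that releases items in fixed-size batches."""

    def __init__(self, batch_size):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self._items = []  # the standard building block underneath

    def add(self, item):
        """Add one item; return a full batch when the buffer fills up, else None."""
        self._items.append(item)
        if len(self._items) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        """Return whatever is buffered and reset the buffer."""
        batch, self._items = self._items, []
        return batch

    def __len__(self):
        return len(self._items)
```

With `batch_size=2`, the first `add` returns `None` and the second returns both items as a list, leaving the buffer empty.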
You write code, but it's usually with libraries that abstract the details of Python away, and it's very heavy on SQL.
Ask relevant questions, not hypothetical BS, if you want good candidates and not textbook new grads.
I use data structures constantly as a data engineer. If you’re touching libraries to pull APIs then you’re using data structures too... some API extractions have to be built from scratch so it’s helpful to understand...
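As a small illustration of the API point, here is a sketch that flattens a nested JSON-style response into a single flat row using only dicts and recursion. The `flatten` helper and the sample `response` are invented for this example and don't come from any particular API.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys; lists are left as-is."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up payload standing in for a decoded API response
response = {
    "id": 7,
    "user": {"name": "Ada", "address": {"city": "London"}},
    "tags": ["a", "b"],
}
row = flatten(response)
```

Here `row` ends up with the keys `id`, `user.name`, `user.address.city`, and `tags`, which is roughly the shape you need before loading into a table.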
This makes sense. Thanks, xbSI00.
From a reliability perspective, why are you coding up data structures yourself?
How come you never used any Data Structure? Not even Arrays, HashMaps and the other basic ones?
There are some data engineers who just use SQL and Spark, where those structures are built in internally.
Copying below from my edited question: We have many distributed processing frameworks like Spark, which process data across different nodes in parallel. I assume that using legacy Python data structures in the code pushes all the data onto a single driver. Yes, we do use lists, tuples, and dictionaries in Python wherever required, to store very minimal information. We never use legacy Python code to read, write, or process data. It's pretty outdated, right?
I have the same experience
I have been a data engineer for almost 4 years, and I've never needed anything more than a hash map, a list, and strings.
They both incorporate the word “Data”. Definitely important
But why?
Because of the words