I am a senior Data Engineer. I recently got rejected at Apple’s onsite because I could not solve two leetcode problems. The other interviewers were impressed by my Big Data skills (at least judging from their body language), as that’s my day-to-day job. So for FAANG Data Engineering, is Leetcode a must no matter how good you are at coding for data pipeline problems using Spark/Airflow/Kafka/EMR/Lambda etc.? Asking because I have a 14-hr/day job which I enjoy a lot, as I like solving meaningful data analytics problems and building reusable frameworks for ETL, but are those skills of lesser value than leetcode? #dataengineering #onsite #Data #Hadoop
@OP: Can you share how to prep for Big Data skills, for a junior-level / new-grad role? Any books/blogs/projects to follow to learn how serious big data technologies come together?
Read chapters 4-11 of this book: http://shop.oreilly.com/product/0636920032175.do and understand every word.
Awesome!! Thanks for the advice. I heard about this book somewhere else too, and I’m currently reading it.
Hi Sparkbhai. Glad you started this thread. I always wonder how and why data structures and algos would play a crucial role in hiring decisions for a Data Engineer. Any explanation of why they are important and how they are useful in a Data Engineer’s life would be highly appreciated. Right now I’m just not convinced to put in the extra effort to study them, because I don’t have an explanation.
Well, the hiring manager was one of the most impressive gentlemen I have ever met, and I asked him this same question. His answer was: the LC test is done because those are ‘engineering’ problems. In reality, the sense I got is that the hiring manager’s manager is an ‘almost’ recent CS graduate, the group still suffers from his lack of understanding of what DE is, and this ‘Boss’ guy throws LC mediums without any context. To be honest, there were three or four guys who understood the importance of automation, the right Apache tool for the right job, and building scalable, fault-tolerant distributed systems, and I clicked with one of them as we had both built similar frameworks. But to sum it up: it felt like this group is suffering from the Boss’s inferiority complex of “I am in no way lesser than a Software Engineer.” That brings me to the important question: is it the same LC whiteboard for every FAANG Data Engineering group, or do some groups actually hire for a pure DE and only DE skill set?
Well, it sounds like you’ve already made up your mind that teams who ask algorithm and data structure questions have an “inferiority complex”. You should be aware that companies that ask these kinds of coding questions are NOT looking for a “pure DE” skill set. They want smart, well-rounded engineers. If you intend to stay only within your domain for the rest of your career, then you should apply somewhere else.
Following this
Data engineers write code, right? Why is leetcode not relevant for them?
Yes, as a DE I do write code, but the code has nothing to do with an O(n) trick of running multiple for loops, and everything to do with creating a framework where various DE tools, ranging from Apache Spark to Docker, fit together to accomplish what is, IMHO, the most important aspect of DE: providing reliable, low-latency, meaningful, and easily accessible DATA! I do not reinvent the wheel by reimplementing Spark/Project Tungsten’s sort-merge join. However, I believe FAANG Data Engineers are doing that, and we mere mortals have to look at open-source tools for what they can achieve using LC-level algorithms. So the question is: might the false-positive argument for LC whiteboarding of SWEs be relevant here too for ‘all’ FAANG Data Engineer positions, or are there certain groups who don’t want to test LC? Any answer will help.
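For anyone following along who hasn’t looked under the hood: the core idea of a sort-merge join (one of the strategies Spark picks for large equi-joins) is conceptually simple, which is kind of the point being made above. A minimal illustrative sketch in plain Python, with made-up toy data — not Spark’s actual implementation, which adds spilling to disk, whole-stage code generation, etc.:

```python
def sort_merge_join(left, right, key):
    """Equi-join two lists of dicts: sort both sides on the key,
    then advance two cursors, emitting matching row pairs."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # walk the full run of equal keys on the right so that
            # duplicate keys produce a proper cross product
            j2 = j
            while j2 < len(right) and right[j2][key] == lk:
                out.append({**left[i], **right[j2]})
                j2 += 1
            i += 1
    return out

# toy data for illustration
users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [{"id": 2, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 5}]
joined = sort_merge_join(users, orders, "id")  # two rows, both for id 2
```

The sorting dominates the cost (O(n log n) per side); the merge pass itself is linear, which is why engines favor this strategy when both sides are too large to broadcast.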
You still want to write efficient code that scales, though. I’ll agree LC may be overkill, but I wouldn’t say it’s completely irrelevant. I think a good balance of LC and direct data engineering questions would be ideal.
I do data engineering, and I have to write complex algorithms to make it work at the scale I am working at. I wouldn’t expect anything less at other big companies.
@7331er: Thanks. Could you please give a little more detail on why you have to write complex algos, given that the data can be structured/semi-structured and streaming/at rest, and we do have different tools/APIs available for them? Just want to understand what use cases I am missing in my daily job, so I can set expectations accordingly.
Business use cases. Think about graph use cases in Google Search, Uber map analysis, or the Facebook network. Your everyday tools are not gonna cut it at that scale. You need to implement sophisticated graph algos, or at the very least store the graphs in memory, so you need some good algo/DS knowledge.
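To make the graph point concrete, here’s a minimal sketch of the kind of thing being described: a social graph held in memory as an adjacency list, plus a BFS to compute degrees of separation. Everything here (names, data, function name) is made up for illustration — real systems shard the graph and use far more compact representations:

```python
from collections import deque

def bfs_distance(adj, start, target):
    """Shortest hop count between two nodes in an unweighted graph
    stored as an adjacency list (dict of node -> neighbor list).
    Returns -1 if the target is unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return -1

# toy "social network": who is connected to whom
friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
hops = bfs_distance(friends, "alice", "dave")  # 3: alice-bob-carol-dave
```

This is exactly the LC-style material (queues, visited sets, traversal order) that shows up once your pipeline has to answer graph questions your off-the-shelf tools don’t cover.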
That's the grind. Just accept it.
What leetcode difficulty were they?
TL;DR: I am not the best person to ask because 1) NDA, and 2) I only got the time/chance to practice around 5-6 LC medium String/Array questions, for exactly 2 days at 2 hours each day.

When the really nice hiring manager contacted me, I told him I was not in the job market and had never even tried LC. He told me to practice some mediums before the phone screen. As expected, I bombed the LC medium on the phone :-) However, he arranged another phone screen, this one focused more on optimizing Spark jobs and big data problems, and as expected I aced it :-)

So I got the onsite, and the whole time I kept telling the nice guy that with my workload LC practice was impossible. He kind of hinted that they just wanted to see if I could write basic code, and that the stress was more on in-depth Spark/YARN etc. In reality, the nice guy’s boss started to throw LC at the first chance :-) and then came another interviewer, who looked like ‘that resident a-hole’ found in every team, who threw another problem which I could have solved, but he interrupted and stopped me to preach his genius solution. All the other rounds, which went from Apache tools to data problems to pipeline design, were really enjoyable, but in the end LC always wins :-(
@OP, sorry for bothering you again, but how do you prepare for Big Data problems? What kinds of problems are asked? Like, how to take a data problem and design a pipeline for it with all the tools involved? Is there any good resource specifically for solving these kinds of interview questions?