Need some advice on preparing for a senior DS interview. Do people have suggested sites for practicing data manipulation in Python? Do I need to LeetCode? I feel that LeetCode is for SDE roles but would like to hear more. What about preparation for case studies? The recruiter's email included McKinsey case practice, but is that really applicable to DS roles?
I've done 7 interviews over the last 2 months, at mid to senior levels. Here are my recommendations:
1- Python: Get comfortable with pandas and numpy (e.g. how to select all rows that contain a null value in any column, how to compute the interquartile range of a list, etc.) and do some easy LeetCode questions on lists, strings, matrices and math.
2- SQL: LeetCode has some easy/medium/hard questions, though not too many. Solve them and you'll be fine. Try to get comfortable with window functions and self-join problems. (Also, LeetCode's MySQL is old and doesn't support window functions, so practice those using the Mode Analytics tutorials.)
3- Probability? It's good to know, but no one asked me probability puzzles other than Clover Health. Know some probability distributions though; I got asked to describe distributions other than the normal.
4- Stats: Know about A/B testing: how to design an experiment and what the key decisions are. This is quite popular. For example, you might be asked to design an experiment: talk about how you'd split observational units into control/treatment, the metrics you'd track, the effects you might observe, what could go wrong, and finally how you'd choose the sample size and how long you'd run the experiment.
5- Product questions: Google some product management questions related to metrics design. Think of metrics applicable to the company you're applying to and how you'd measure them. Are they long-term or short-term? If long-term, how would you track them in a short time? Could you come up with short-term proxy metrics that reflect the long-term metric you're interested in? I recommend getting the product questions book from "datamasked". The bundle is expensive, but if you email the author maybe he'll sell you the book alone. It really helped in my case.
6- Machine learning algorithms: Logistic regression, linear regression and random forests, plus the popular metrics and when to use each.
Imbalanced data: How to deal with it and what decisions to make?
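To make the pandas/numpy examples from point 1 concrete, here is a minimal sketch of both. The DataFrame, its column names, and the `iqr` helper are made up for illustration; interviewers may well expect a different quantile convention, so treat the interpolation choice as an assumption.

```python
import numpy as np
import pandas as pd

# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "user": ["a", "b", "c", "d"],
    "age": [31.0, np.nan, 45.0, 28.0],
    "city": ["NYC", "SF", None, "LA"],
})

# Select every row that has a null in at least one column.
rows_with_null = df[df.isna().any(axis=1)]


def iqr(values):
    """Interquartile range (Q3 - Q1), using numpy's default linear interpolation."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1


print(rows_with_null)                   # the rows for users "b" and "c"
print(iqr([1, 2, 3, 4, 5, 6, 7, 8]))    # 3.5
```

The key idiom is `df.isna().any(axis=1)`, which builds a per-row boolean mask; swapping `any` for `all` would select only rows that are entirely null.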
This is awesome, I’m noting them down, thank you!
The only thing you need to know is the assumptions of linear regression. Every interview asks that question, so don't forget what homoscedasticity is....
Only?
80% of the time
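On the homoscedasticity point above: it means the residuals of the regression have constant variance. Here is a rough sketch of how one might eyeball it with plain numpy. The function name `residual_variance_ratio` and the simulated data are my own assumptions, and this is a crude split-sample comparison, not a formal test such as Breusch-Pagan.

```python
import numpy as np


def residual_variance_ratio(x, y):
    """Fit y = a + b*x by least squares, then divide the residual variance
    in the upper half of fitted values by that in the lower half.
    A ratio far from 1 hints at heteroscedasticity (rough check only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # intercept + slope design matrix
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares fit
    fitted = X @ coef
    resid = y - fitted
    order = np.argsort(fitted)                     # sort observations by fitted value
    half = len(x) // 2
    low = resid[order[:half]]
    high = resid[order[-half:]]
    return high.var() / low.var()


# Simulated data, invented for illustration.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 400)
y_const = 2 + 3 * x + rng.normal(0, 1, size=x.size)      # constant noise
y_grow = 2 + 3 * x + rng.normal(0, 1, size=x.size) * x   # noise grows with x

print(residual_variance_ratio(x, y_const))  # should land near 1
print(residual_variance_ratio(x, y_grow))   # should be well above 1
```

In an interview it's usually enough to say you'd plot residuals against fitted values and look for a funnel shape; the ratio here is just that idea boiled down to a number.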