I have the following four bullet points on my resume, but I feel like they somewhat understate the work I did. I'm going to describe what I actually did for each one, and I'd like advice on whether my wording is effective and how I can make it better. Btw, I'm not done with the internship yet; I'm still planning parts of it out.

• Built a machine learning model to predict the cost of warranty claims on machines (value statement TBD)

I'm going to use a boosting algorithm or maybe Random Forest for regression. I'll probably mostly use AutoML and maybe sklearn, idk. I think this bullet point is fine once I add a value statement.

• Utilized PySpark to combine and preprocess over 100 million rows of claims data in the enterprise data lake

I wasn't just given a training set; I had to build it from multiple tables and databases. Some of these tables were in the EDL on Databricks and some were in on-prem databases like Oracle SQL and a few others. The EDL tables had my outputs (warranty claim costs), as well as other tables that I needed in order to filter out certain irrelevant costs. I wrote a spark.sql query (not sure if this counts as PySpark) in a Databricks notebook to combine these EDL tables and filter out unnecessary rows. As part of this big query, one of my tables also required a window function with partitioning. I then used some PySpark functions to validate the data and make sure the rows were unique, so that the data is in the right shape when I do joins later.

• Wrote complex SQL queries to normalize and merge several databases in order to build claims training dataset

This bullet point refers to the on-prem databases that contain the information I need for my input data. I wrote SQL queries within each of these database tools to get the information I needed. I then exported this data into the Databricks notebook I was working in and converted it to pandas dataframes.
I also converted my warranty dataset from the previous bullet point into a pandas dataframe, since it was small enough. Finally, I joined all of these pandas dataframes together in the notebook.

No bullet point as of right now, not sure if needed

With this dataset, I'm now using functions like df.describe() and the df profiling_report to get a better understanding of the overall dataset, outliers, and so on. I also need to fill in zeroes for all of the costs with null values, since those represent machines that don't have claims. I would write something like "Performed EDA on training set in Pandas", but I'm not sure whether adding another bullet point for this step is just fluff. After this is where I build my model. I only listed the model as the first bullet point because I think it stands out, considering I don't have much other ML experience on my resume.

• Developed an automated web crawler in Python that validates SharePoint documents and the contents within them

This is another project. For example, if a SharePoint site with 4 folders were passed in, each of which had documents in different formats (Word, PowerPoint, Excel), my web crawler would have to check whether the link to a Word document is broken but ALSO whether the links inside the Word document are broken. The script should run automatically, like every night or so, and every time a link is broken it should notify someone by email or log it in a database. I haven't done this project yet, but I'm definitely going to use BeautifulSoup and probably other libraries like python-docx, etc.

I know this is an absurd amount of writing and it will probably get skimmed, but any help making these bullet points sound better is appreciated. I feel like I didn't really cover the fact that I started from scratch and went through the entire data science process.

TC: 30/hr
YOE: 1
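For the web crawler bullet, the link-checking idea can be sketched roughly as below. This is only a minimal sketch under assumptions: it uses the standard library's html.parser in place of BeautifulSoup to stay dependency-free, the names `extract_links`, `is_reachable`, and `find_broken_links` are made up for illustration, and it does not cover SharePoint authentication, links inside Word documents, scheduling, or notifications.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) links found in an HTML page (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html)
    absolute = [urljoin(base_url, href) for href in parser.links]
    # Skip mailto:, javascript:, etc., which cannot be checked over HTTP.
    return [u for u in absolute if urlparse(u).scheme in ("http", "https")]


def is_reachable(url, timeout=10.0):
    """True if a HEAD request succeeds; urlopen raises on 4xx/5xx responses."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        return False


def find_broken_links(urls, check=is_reachable):
    """Return the URLs for which the check fails, in their original order."""
    return [u for u in urls if not check(u)]
```

Passing `check` as a parameter makes it possible to try the logic without any network access, for example `find_broken_links(links, check=lambda u: False)` reports every link as broken.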
If you put an entire essay into your post, at least make sure to include the TC and YOE or GTFO.
Oops. Didn't know that applied to interns too.