Hello, I work in the Salesforce ecosystem, dealing with large data sets, and have a new passion for Python. Are there any modules/libraries you'd recommend taking a look at? Many thanks!
Pandas and NumPy
Pandas is amazing and has great support. You should also look into using Parquet to store and read your datasets. If you’re playing around, you can take a look at Databricks: its free tier lets you set up a notebook with PySpark already configured.
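To give a rough idea of what that looks like, here's a minimal sketch. The column names and values are made up to resemble a Salesforce export (they're assumptions, not a real schema), and `save_and_reload` is just a hypothetical helper name. Writing Parquet from pandas needs either pyarrow or fastparquet installed.

```python
import io

import pandas as pd

# Made-up sample data standing in for a Salesforce export (hypothetical columns)
csv_text = """AccountId,StageName,Amount
001A,Closed Won,1200.0
001A,Prospecting,300.0
001B,Closed Won,4500.0
"""

# Load the CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Total amount per account
totals = df.groupby("AccountId")["Amount"].sum()


def save_and_reload(frame, path):
    """Write a DataFrame to Parquet and read it back.

    Requires pyarrow or fastparquet to be installed.
    """
    frame.to_parquet(path, index=False)
    return pd.read_parquet(path)
```

Something like `save_and_reload(df, "opportunities.parquet")` would then round-trip the data through a Parquet file, which tends to be smaller and faster to read than CSV for larger exports.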
Pandas is great; PySpark if you're dealing with massive data sets