I don't know much about Spark/PySpark. How can I get some hands-on experience with it without spending too much money (if that's even possible)? #data #dataanalytics #datascience #dataengineering
Add some CSV files to blob storage and transform them in a notebook with PySpark. It shouldn't cost much if you stick to small files (though Spark is obviously overkill at that size).
Watch the Spark Summit and Data + AI Summit talks on YouTube.
GCP/AWS big data courses, DataCamp Spark courses.
Single-node Spark is freely available and costs $0 to try. I recommend PySpark to start with unless you're already familiar with Scala. You can install it with `pip3 install pyspark` and then run `pyspark` (it needs a Java runtime on the machine). For a first-timer, I recommend playing around with some datasets from Kaggle.
Databricks training is bad for your purpose. It is designed for certification / vendors. I suggest you get a course (Udemy) or a book (Manning or O'Reilly). Make sure you understand the basics of DataFrames, functional programming, mutation, the DAG, schemas, the Parquet file format and compression, and the SparkContext. You can start on your local machine and build in Jupyter. If you have a Windows machine, Google a guide to set up Hadoop, Java, and Spark correctly.
Terrible advice. They'll never have to set up those environments. Databricks employees also author the Udemy courses and the Manning/O'Reilly books, so staying away from Databricks training seems like a weird thing to suggest.
...says Databricks.
Open an AWS account, open AWS Glue, create an endpoint and a notebook, and start writing some basic ETL jobs in PySpark. That's the best way to learn; nothing replaces hands-on experience. For the fundamentals, read the Apache Spark docs and study concepts like lazy evaluation, shuffling, the collect method, partitioning, etc. Not sure if Spark works on Google Colab. If it does, that's also a good option for learning.
Databricks Academy has free courses on Spark/PySpark. Very useful and recommended!
Install PySpark on your laptop (there are YouTube walkthroughs). Try out the code examples on sparkbyexamples.com.
Watch YouTube videos and volunteer
Databricks trainings
Is there a link?
www.google.com