Any good reference materials for building a data pipeline, or a platform to support data analysis, aggregation, etc. (the stuff a data engineer or data platform engineer would typically do) using the Apache stack? Any GitHub projects/tech blogs/videos would be a good reference o
Sqoop for extracts, Spark for ETL and aggregations.
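To make the "extract, then transform and aggregate" split concrete without needing a Hadoop or Spark install, here is a minimal plain-Python sketch of the same shape. It is a stand-in, not Sqoop or Spark: an in-memory `sqlite3` database plays the source RDBMS that Sqoop would pull from, and the `orders` table, its columns, and the helper names (`extract`, `transform`, `aggregate`, `build_demo_source`) are all hypothetical. In PySpark the aggregation step would look roughly like `df.filter(F.col("amount") > 0).groupBy("region").agg(F.sum("amount"))`.

```python
import sqlite3
from collections import defaultdict


def extract(conn):
    """Pull raw rows from the source database (the role Sqoop plays, at much larger scale)."""
    return conn.execute("SELECT order_id, region, amount FROM orders").fetchall()


def transform(rows):
    """Normalize region names and drop rows with missing or non-positive amounts."""
    cleaned = []
    for order_id, region, amount in rows:
        if amount is None or amount <= 0:
            continue  # filter out bad records, like a Spark .filter()
        cleaned.append((order_id, region.strip().lower(), float(amount)))
    return cleaned


def aggregate(rows):
    """Total amount per region, equivalent to GROUP BY region / SUM(amount)."""
    totals = defaultdict(float)
    for _, region, amount in rows:
        totals[region] += amount
    return dict(sorted(totals.items()))


def build_demo_source():
    """Hypothetical source table standing in for a production RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU ", 10.0), (2, "us", 5.5), (3, "eu", 2.5), (4, "US", -1.0), (5, "apac", None)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_source()
    print(aggregate(transform(extract(conn))))  # {'eu': 12.5, 'us': 5.5}
```

Keeping each stage a separate function mirrors how these jobs are usually split in practice: the extract can be swapped for a Sqoop import and the transform/aggregate steps for a Spark job without changing the overall flow.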
Yes, many. When I’m more sober, happy to help.
sup homie? feeling better now?
Now just painfully hungover on our final company retreat day... Apache Spark examples are everywhere, especially where we (Databricks) are involved. There are a variety of other Apache projects that might also be of interest, such as Airflow, NiFi, or various databases. How's your SQL, Python, and/or Scala proficiency?
Argo
Can someone post a link to an open source project, maybe??
Following