kanchantewary / learn-pyspark

Apache Spark learning notes and examples using Python 3

learning pyspark

Books

a) Spark Definitive Guide

b) High Performance Spark

c) Learning Spark

d) PySpark Cookbook

Certification

Databricks Certified Developer. Note: Databricks says this certification will no longer be available after 31 Oct 2019.
Databricks Certified Associate. According to the portal, this is coming soon.

Git repositories, books

Spark Internals by Jerry Lead
gitbook by Jacek Laskowski
another gitbook
Advice on certification
Tutorials by Mahmoud Parsian
Talks by Daniel Abadi

RDD

See RDD notes
See A primer on Lambda

Dataframes

See Dataframe notes

Spark Internals, architecture, tuning

See architecture

Spark SQL

See spark-sql

Spark Streaming

See spark-streaming

GraphX

Machine Learning

Machine Learning - Feature Engineering

Scala

courses

Python

Other resources

Sequence file
HDFS
External Spark packages

TPC-DS Benchmarking

Blogs: http://blog.madhukaraphatak.com/

http://www.cs.sfu.ca/CourseCentral/732/ggbaker/content/spark.html

IBM Cloud resources

https://console.bluemix.net/docs/services/AnalyticsforApacheSpark/using_spark-submit.html#running-a-spark-application-using-the-spark-submit-sh-script

https://developer.ibm.com/clouddataservices/docs/analytics-engine/get-started/

Questions/Comments

View my LinkedIn Profile

Please email me at: kanchan.tewary@gmail.com

About

License:MIT License


Languages

Python 87.8%, Shell 11.4%, Dockerfile 0.3%, Roff 0.3%, HTML 0.2%