vigneshSs-07 / Complete-AtoZ-Pyspark

This repo explains PySpark modules in Python, with practical, hands-on examples for working with big data.

Spark_Pyspark

Apache Spark using Python

  1. https://github.com/dgadiraju/itversity-books/tree/master/Data%20Engineering%20Bootcamp/46%20Apache%20Spark%20using%20Python

  2. https://github.com/dgadiraju/itversity-books/tree/master/starterkits/spark/python

  3. A quick introduction to the Spark API https://lnkd.in/g8Y3tdhX

  4. Overview of Spark - RDD, accumulators, broadcast variable https://lnkd.in/g7fepuFF

  5. Spark SQL, Datasets, and DataFrames: https://lnkd.in/g3iZp7zk

  6. PySpark - Processing data with Spark in Python https://lnkd.in/gBnh6PAi

  7. Processing data with SQL on the command line https://lnkd.in/ggnxDaUu

  8. Cluster Overview https://lnkd.in/guCQnJnv

  9. Packaging and deploying applications https://lnkd.in/gUZpi2P9

  10. Customize Spark via its configuration system https://lnkd.in/gZh8Vkmv

  11. Monitoring - Track the behavior of your applications https://lnkd.in/grpGKFuP

  12. Best practices to optimize performance and memory use https://lnkd.in/gTRYBDQu

Languages

Jupyter Notebook 100.0%, Python 0.0%