This repository contains PySpark examples.
This document is a step-by-step guide to installing Spark on Windows.
This example explains various methods of data wrangling on PySpark Dataframes. [Source Code]
- Load the data [iris.csv]
- Display Dataframe's Columns
- Count of Dataframe's Columns
- Count of Dataframe's rows
- Rename Columns
- Cell Selection
- Column Selection
- Python-Style Query
- Dropping Columns
- Dropping Rows (on a condition)
- Performing SQL Queries
- Aggregate Methods
- Sorting
- Describe the Data
- Missing Values: Replace with Mean
- Conversion: Spark Dataframes --> Pandas Dataframes
- Conversion: Pandas Dataframes --> Spark Dataframes
This example explains various methods of data wrangling on PySpark RDDs. [Source Code]
- Dataframes and RDDs
- Conversion: RDD --> Dataframe
- Conversion: Dataframe --> RDD
- Load the Data [captains_ODI.csv]
This document is a step-by-step guide to integrating PySpark with Hive on Azure HDInsight.