leomaurodesenv / pyspark-study

PySpark study - some challenges and their solutions

PySpark Study

This repository contains simple code for studying Spark.

  • PySpark is the Python API for Spark.

Challenges

  1. Read a CSV file and convert its columns to rows, simulating a columnar table.
  2. Remove duplicates based on id, keeping the row with the latest update_date.
  3. Map each column's data type based on a JSON configuration file.
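
To make the three challenges concrete, here is a minimal, library-free sketch of the logic behind each one. It is not the code in project.py and deliberately avoids Spark so it runs without a cluster or JVM. The sample CSV, the JSON type names (`integer`, `string`, `date`), and every function name are assumptions made for illustration only. In PySpark, the same steps are often expressed with a `stack` expression in `selectExpr`, a `row_number()` window partitioned by id, and `Column.cast`.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical sample data; the repository's real CSV and JSON files are not shown here.
CSV_TEXT = """id,name,age,update_date
1,Ana,30,2021-01-01
1,Ana Maria,31,2021-06-01
2,Bruno,25,2021-03-15
"""
TYPES_JSON = '{"id": "integer", "name": "string", "age": "integer", "update_date": "date"}'

# Converters for the (assumed) type names used in the JSON configuration.
CASTERS = {
    "integer": int,
    "string": str,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}


def read_rows(text):
    """Parse CSV text into a list of dicts; every value is still a string."""
    return list(csv.DictReader(io.StringIO(text)))


def columns_to_rows(rows, key="id"):
    """Challenge 1: emit one (id, column, value) record per non-key cell."""
    return [(r[key], col, val) for r in rows for col, val in r.items() if col != key]


def drop_duplicates(rows, key="id", order_by="update_date"):
    """Challenge 2: keep, for each id, the row with the latest update_date."""
    latest = {}
    for r in rows:
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if r[key] not in latest or r[order_by] > latest[r[key]][order_by]:
            latest[r[key]] = r
    return list(latest.values())


def apply_types(rows, type_config):
    """Challenge 3: cast each column using the type named in the JSON config."""
    return [{col: CASTERS[type_config[col]](val) for col, val in r.items()} for r in rows]


rows = read_rows(CSV_TEXT)
long_format = columns_to_rows(rows)
deduplicated = drop_duplicates(rows)
typed = apply_types(deduplicated, json.loads(TYPES_JSON))
```

Deduplicating before casting keeps the date comparison on plain ISO strings; if the dates used another format, casting first and comparing `date` objects would be the safer order.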

Coding

Surprise yourself by viewing project.py.

#-- run
$ python project.py

Also look ~

License: MIT License


Languages

Language:Python 100.0%