This repository contains a simple coding exercise for studying Spark.
- PySpark is the Python API for Spark.
- Read a CSV file and convert columns to rows, simulating a columnar table.
- Remove duplicates based on `id` and the last update (`update_date`).
- Map the data types based on a JSON configuration file.
Surprise yourself by viewing project.py.
- Install Spark, PySpark, and their dependencies.
#-- run
$ python project.py
- License MIT
- Created by Leonardo Mauro ~ leomaurodesenv