To get started quickly, run this in your terminal:
python3 python-pyspark-framework/jobs/sales.py
To enable Delta Lake, first install delta-spark via pip3, then in json/sales.json change
{"config":{"deltalake":false}}
to
{"config":{"deltalake":true}}
I've set this to false by default so that you can install Delta Lake first, should you want to use it.
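As a minimal sketch of how a job could act on that flag, the helper below parses the JSON and picks an output format. Only the JSON structure comes from this README; the chooseFormat name and the delta/parquet fallback are assumptions for illustration.

```python
import json

def chooseFormat(config_text: str) -> str:
    """Return "delta" when config.deltalake is true, otherwise "parquet".

    Hypothetical helper: the JSON shape matches json/sales.json, but the
    parquet fallback is an assumption, not this framework's actual behaviour.
    """
    config = json.loads(config_text)
    return "delta" if config.get("config", {}).get("deltalake") else "parquet"

print(chooseFormat('{"config":{"deltalake":false}}'))  # parquet
print(chooseFormat('{"config":{"deltalake":true}}'))   # delta
```

Flipping the single boolean in json/sales.json is then enough to switch the write path without touching the job code.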
See jobs/sales.py for an example of how I call exportResult() from inside transformData():
def transformData(spark: SparkSession, df: DataFrame) -> DataFrame:
    exportResult(spark, [ (df, "table-name") ])
You don't need to call the function exactly as I do; you can call the class method directly:
class_pyspark.Sparkclass(config={"export":"/tmp/delta"}).exportDf((dataframe, "table-name"))
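Both calls use the same (dataframe, "table-name") tuple pattern: each tuple pairs a DataFrame with the table name it should be exported under. A minimal sketch of that pairing is below; buildExportPath is a hypothetical helper that only illustrates how table names could map to paths under the "export" directory from the config, and None stands in for real DataFrames.

```python
import os

def buildExportPath(export_dir: str, table_name: str) -> str:
    # Join the configured export directory with the table name
    # to get the destination path for one export tuple.
    return os.path.join(export_dir, table_name)

# Each tuple is (DataFrame, "table-name"); None is a stand-in here.
exports = [(None, "sales"), (None, "customers")]
for df, name in exports:
    print(buildExportPath("/tmp/delta", name))
# /tmp/delta/sales
# /tmp/delta/customers
```

Passing a list of tuples lets one call export several DataFrames to differently named tables in a single pass.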
Requirements: Apache Spark (https://spark.apache.org) and Python, plus the pyspark and delta-spark packages:
sudo pip3 install pyspark
sudo pip3 install delta-spark
For any questions, feedback, requests, or suggestions, email me at datyrlab@gmail.com