jwittbold's repositories
gdelt-gkg-databricks
ETL Pipeline to ingest and transform GDELT GKG 2.0 records
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Language: Python, License: Apache-2.0
Language: Python
airflow-log-analyzer
A simple Python script that analyzes Airflow log files and returns the errors found in them.
Language: Python
hadoop_streaming_mapreduce
Basic Hadoop Streaming MapReduce project
Language: Python
hdinsight-spark-miniproject
Deploying HDInsight Spark Cluster on Azure
Language: Jupyter Notebook
mini_pipeline
Python ETL script.
Language: Python
riskybank
A simple mock ATM/Banking program
Language: Python, License: BSD-2-Clause
spring_capital
ETL pipeline for stock data using Spark on Azure
Language: Python
autoinc-hdfs-spark
Refactoring a MapReduce project to utilize Spark on HDFS.
Language: Python
azure-data-factory-ELT
Working with Azure Data Factory ELT
dsc_intro
Exercises to accompany the free Springboard introductory data science "taster" course.
euro_cup_2016_mini_project
SQL queries for euro_cup_2016 mini project
gdelt
Exploring GDELT
Language: Jupyter Notebook
Language: Python
spark-optimization
Optimizing a Spark SQL query
Language: Jupyter Notebook
SQL_optimization_mini_project
Optimization of six SQL queries