James Mwangi's repositories
weather_data_pipeline
This is a PySpark-based data pipeline that fetches weather data for a few cities, performs some basic processing and transformation on the data, and then writes the processed data to a Google Cloud Storage bucket and a BigQuery table. The data is then viewed in a Looker dashboard.
event-driven-microservices
This project demonstrates an event-driven microservices architecture using Apache Kafka for event streaming and webhook integration with external services
python_etl
Using python-sql to build an ETL job between MySQL and PostgreSQL, with the Windows scheduler automating the job.
python-kafka_distributed_task_queue
A simple implementation of a distributed task queue
python_tel_chatbot
A Python Telegram chatbot using the Telegram API
awesome-opensource-data-engineering
An Awesome List of Open-Source Data Engineering Projects
coingecko-streamapp
A streaming app and a dashboard for visualizing cryptocurrency data fetched from the CoinGecko API. The streaming app retrieves real-time cryptocurrency information using Spark Streaming and stores it in a PostgreSQL database.
data-diff
Compare tables within or across databases
dataquest_DE_learningpath
Code from my data engineering learning path by Dataquest
dta_warehouse_example
Using MySQL and Talend Open Studio to perform ETL
dta_warehouse_hive
A data warehouse implementation in Hive.
mentalhealth_analysis-data-pipeline
An end-to-end data pipeline for mental health analysis
mysql_gcp
Using Airflow to extract data from MySQL, transform it, and load it into BigQuery
podcasts_pipeline
Building a four-step data pipeline using Airflow to download podcast episodes.
Prefect-PostgreSQL-Sensors
The prefect_postgres_sensors package provides Prefect sensors for monitoring changes or conditions within a PostgreSQL database.
pyspark_optimization
Using the cache/persist methods to optimize PySpark, and PySpark/SQL to query a MySQL database
python_tweepy
Using Python and Tweepy to follow back friends on Twitter. This task uses the Windows scheduler to follow back every 5 minutes.
tweepy_airflow
An Airflow DAG that shows Twitter trending hashtags every 20 minutes
MapReduce
MapReduce techniques in Hadoop: joins, job counters, inputs/outputs
ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
python_flask
Python Flask basics
R_examples
R data science exercises and examples
scrape_selenium
Twitter automation with Selenium
shell_
First attempt at Windows task scheduling
soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
speeddating_R
Supervised learning in R
weatherbot
A weather bot using the weather map API and the Telegram API