Joshua Omolewa's repositories
Stock_streaming_pipeline_project
Built a real-time streaming pipeline to extract stock data using Apache NiFi, Debezium, Kafka, and Spark Streaming. Loaded the transformed data into a Glue database and created real-time dashboards using Power BI and Tableau with Athena. The pipeline is orchestrated using Airflow.
Retailstore_ETL_pipeline_project
Built a data pipeline for a retail store using AWS services. It collects data from the store's transactional (OLTP) database in Snowflake, transforms the raw data with Apache Spark (ETL process) to meet business requirements, and enables data analysts to create data visualizations using Superset. Airflow is used to orchestrate the pipeline.
edmonton_weather_aws_serverless_project
This is a serverless AWS data engineering project to track Edmonton weather in near real time using services like Kinesis Data Firehose, S3, AWS Lambda, AWS Glue, Athena, and IAM.
Job_API_ETL_datapipeline_project
Building an ETL pipeline using AWS services that extracts data from a job API, transforms it to meet business requirements, and loads it into an S3 bucket.
30-Days-Of-Python
The 30 Days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. This challenge may take more than 100 days; follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
awesome-interview-questions
:octocat: A curated awesome list of lists of interview questions. Feel free to contribute! :mortar_board:
ci-cd-project-1
Practicing CI/CD using GitHub Actions
container-images
Docker images for Debezium. Please log issues in our JIRA at https://issues.redhat.com/projects/DBZ/summary
Covid-19-analysis
COVID-19 Canada data analysis
data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Data-Engineering-learning
My data engineering practice
data-engineering-practice
Data Engineering Practice Problems
debezium-examples
Examples for running Debezium (Configuration, Docker Compose files etc.)
devops-exercises
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
devops-resources
DevOps resources - Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP
docker_ETL_pipeline_project
ETL project that uses a Docker container running a Python script to extract CSV data, transform it by combining the files into a single file, and load the result into an output folder. The output CSV file remains available even after the container is shut down.
flink
Apache Flink
Git_practice
I use this repo to practice my Git skills
markdown-here
Google Chrome, Firefox, and Thunderbird extension that lets you write email in Markdown and render it before sending.
Miscellaneous
Includes notes on Apache Spark, Spark for Physics, Jupyter notebook examples for Spark, Oracle and other DB systems.
spark-syntax
This is a repo documenting the best practices in PySpark.
sqlfluff
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
system-design-notebook
Learn System Design step by step
tech-interview-handbook
💯 Curated coding interview preparation materials for busy software engineers
Toronto_Climate_API_ETL_project
Built an ETL pipeline that extracts climate data from an API, transforms it by combining all the extracted data into a single file, and loads that file into an output folder.
trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
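
Both docker_ETL_pipeline_project and Toronto_Climate_API_ETL_project describe a transform step that combines several extracted files into a single file written to an output folder. The following is only a minimal sketch of what such a step might look like, not the code from either repository. The function name `combine_csv_files`, the folder layout, and the assumption that every input file shares the same header row are all hypothetical.

```python
import csv
from pathlib import Path


def combine_csv_files(input_dir: Path, output_file: Path) -> int:
    """Merge every *.csv file in input_dir into output_file.

    Assumes (hypothetically) that all input files share one header row.
    Returns the number of data rows written, excluding the header.
    """
    output_file.parent.mkdir(parents=True, exist_ok=True)
    rows_written = 0
    header = None
    with output_file.open("w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(input_dir.glob("*.csv")):
            # Never read the output file back in if it lives in input_dir.
            if path.resolve() == output_file.resolve():
                continue
            with path.open(newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    # First non-empty file defines the header of the output.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path.name} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as `combine_csv_files(Path("input"), Path("output/combined.csv"))` would write the merged file and return the row count. In a containerized setup, the output folder would presumably be a mounted volume so that the file outlives the container, but that detail is an assumption here as well.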