Yusuf Ganiyu (airscholar)

Company: @Orbit-Inc

Location: England, United Kingdom

Home Page: datamasterylab.com

Twitter: @yusufOganiyu

Yusuf Ganiyu's repositories

e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

RedditDataEngineering

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

Language: Python | Stargazers: 29 | Issues: 0 | Issues: 0

changecapture-e2e

This project shows how to capture changes from a PostgreSQL database and stream them into Kafka.

Language: Python | Stargazers: 19 | Issues: 0 | Issues: 0

FlinkCommerce

This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.

Language: Java | Stargazers: 19 | Issues: 2 | Issues: 0

RealtimeStreamingEngineering

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using a TCP/IP socket, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, production to a Kafka topic, and connection to Elasticsearch.

Language: Python | Stargazers: 15 | Issues: 0 | Issues: 0

realtime-voting-data-engineering

This repository contains the code for a real-time election voting system built with Python, Kafka, Spark Streaming, Postgres, and Streamlit. Docker Compose is used to spin up the required services in Docker containers.

modern-data-eng-dbt-databricks-azure

In this project, we set up an end-to-end data engineering project using Apache Spark, Azure Databricks, and Data Build Tool (DBT), with Azure as our cloud provider.

Stargazers: 12 | Issues: 0 | Issues: 0

SparkingFlow

This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala, and Java as examples.

Language: Java | Stargazers: 12 | Issues: 0 | Issues: 0

FootballDataEngineering

An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Factory, Azure Synapse and Tableau.

Language: Python | Stargazers: 10 | Issues: 1 | Issues: 0

Kubernetes-For-DataEngineering

This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering environment using Kubernetes and Apache Airflow

Language: Python | Stargazers: 10 | Issues: 0 | Issues: 0

ApacheFlink-SalesAnalytics

This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.

Language: Java | Stargazers: 7 | Issues: 0 | Issues: 0

cicd_for_data_engineering

This project showcases how to integrate DevOps practices, specifically Continuous Integration (CI) and Continuous Deployment (CD), with modern data engineering, using Terraform and Azure as the case study.

Language: HCL | Stargazers: 7 | Issues: 0 | Issues: 0

Japan-visa-data-engineering

This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark clusters are set up within a Docker container on Azure.

Language: HTML | Stargazers: 7 | Issues: 2 | Issues: 0

YoutubeAnalytics

An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlDB. The processed analytics data is then sent to Telegram for real-time notifications.

Language: HTML | Stargazers: 5 | Issues: 0 | Issues: 0

AlphaTeam

Complex Network Analysis Using Machine Learning

Language: HTML | Stargazers: 4 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 3 | Issues: 0 | Issues: 0

EMR-for-data-engineers

This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.

Language: Python | Stargazers: 3 | Issues: 0 | Issues: 0

Face-Anonymizer

This repository contains different algorithms and methods to anonymize faces in images by blurring or pixelating them using OpenCV and MTCNN in Python

Language: Jupyter Notebook | License: MIT | Stargazers: 3 | Issues: 1 | Issues: 0
Language: Python | Stargazers: 2 | Issues: 0 | Issues: 0
Language: TypeScript | Stargazers: 2 | Issues: 0 | Issues: 0

dbt-bigquery-crash-course

A deep dive into the powerful combination of DBT and BigQuery, the game-changers in modern data engineering.

Stargazers: 2 | Issues: 0 | Issues: 0

AI-Workout-Manager

AI Workout Manager is a pose detection application that uses Artificial Intelligence to track body movements and specific parts of the body, and to generate performance metrics.

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0
Language: Jupyter Notebook | Stargazers: 1 | Issues: 0 | Issues: 0
Stargazers: 0 | Issues: 0 | Issues: 0

docs.nestjs.com

The official documentation https://docs.nestjs.com 📕

Language: TypeScript | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

llama-gpt

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

nest

A progressive Node.js framework for building efficient, scalable, and enterprise-grade server-side applications with TypeScript/JavaScript 🚀

Language: TypeScript | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

sqlc

Generate type-safe code from SQL

Language: Go | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

typeorm

ORM for TypeScript and JavaScript (ES7, ES6, ES5). Supports MySQL, PostgreSQL, MariaDB, SQLite, MS SQL Server, Oracle, SAP Hana, WebSQL databases. Works in NodeJS, Browser, Ionic, Cordova and Electron platforms.

Language: TypeScript | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0