Cey's repositories
e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API and publish it to Kafka; Spark then processes the stream before writing to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
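The fetch → Kafka → Spark → Cassandra flow can be sketched with a minimal, dependency-free simulation. This is not code from the repo: the function names are illustrative, a list stands in for the Kafka topic, and a dict stands in for the Cassandra table.

```python
import json

def fetch_from_api():
    # Stand-in for the Airflow-scheduled API fetch; returns fake records.
    return [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

def publish(topic, records):
    # Stand-in for a Kafka producer: serialize each record onto the topic.
    for record in records:
        topic.append(json.dumps(record))

def process(topic):
    # Stand-in for the Spark job: deserialize and transform each message.
    rows = []
    for message in topic:
        record = json.loads(message)
        record["name_upper"] = record["name"].upper()
        rows.append(record)
    return rows

def write_to_store(store, rows):
    # Stand-in for the Cassandra sink, keyed by primary key "id".
    for row in rows:
        store[row["id"]] = row

topic, store = [], {}
publish(topic, fetch_from_api())
write_to_store(store, process(topic))
print(store[1]["name_upper"])  # → ALICE
```

In the real pipeline each stage runs as a separate containerized service; the value of the shape above is that each hop only depends on the serialized messages, so stages can scale independently.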
RedditDataPipeline
Data engineering with the Reddit API, Airflow, Hive, Postgres, MinIO, NiFi, Trino, Tableau, and Superset
Udacity-Data-Pipeline-with-Airflow
Udacity Data Engineering Nanodegree Program: Data Pipeline with Airflow project, using MinIO and PostgreSQL.
real-time-data-pipeline-kafka-mongo-elasticsearch-pyspark
A real-time data pipeline project using Kafka, MongoDB, Elasticsearch, and PySpark. Streams raw data from Kafka, enriches it with sentiment analysis using Hugging Face models, stores results in MongoDB, and visualizes data in Elasticsearch with Kibana. Scalable solution for real-time data analytics and machine learning.
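The enrichment step described here, applied per record as a Spark UDF or foreachBatch might, can be sketched without the real model. The word-list heuristic below is a deliberate placeholder for the Hugging Face sentiment model, and all names are illustrative rather than taken from the repo.

```python
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "terrible"}

def analyze_sentiment(text):
    # Placeholder for the Hugging Face model: a toy word-list heuristic
    # that returns the same label shape a real classifier would.
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"

def enrich(record):
    # Per-record enrichment: attach a sentiment field before the
    # document is written to MongoDB / indexed into Elasticsearch.
    return {**record, "sentiment": analyze_sentiment(record["text"])}

docs = [enrich(r) for r in [
    {"id": "a", "text": "I love this"},
    {"id": "b", "text": "terrible experience"},
]]
print(docs[0]["sentiment"])  # → POSITIVE
```

Keeping enrichment as a pure record-in, record-out function is what makes it easy to swap the heuristic for a real transformer pipeline inside the PySpark job.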
elk-stack-mastery
A comprehensive project focusing on setting up and configuring the Elastic Stack (Elasticsearch, Logstash, and Kibana) for efficient log management and analytics. This project includes Elasticsearch configurations, Logstash pipelines, and Kibana visualizations, with detailed step-by-step documentation.
e2e-otp-pipeline
End-to-end OTP pipeline project using Docker, Airflow, Kafka, KafkaUI, Cassandra, MongoDB, EmailOperator, SlackWebhookOperator, and DiscordWebhookOperator
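The core of any OTP pipeline is issuing a short-lived random code and verifying it later. A minimal sketch, using an in-memory dict as a stand-in for the Cassandra/MongoDB storage and assuming a 5-minute TTL (all names here are illustrative, not from the repo):

```python
import hmac
import secrets
import time

def generate_otp(n_digits=6):
    # Cryptographically secure random numeric code.
    return "".join(str(secrets.randbelow(10)) for _ in range(n_digits))

class OtpStore:
    """In-memory stand-in for the pipeline's OTP storage layer."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.entries = {}  # user -> (otp, expiry timestamp)

    def issue(self, user):
        # In the real pipeline this is where EmailOperator /
        # SlackWebhookOperator would deliver the code to the user.
        otp = generate_otp()
        self.entries[user] = (otp, time.time() + self.ttl)
        return otp

    def verify(self, user, otp):
        stored, expires = self.entries.get(user, (None, 0.0))
        if stored is None or time.time() >= expires:
            return False
        # Constant-time comparison to avoid timing side channels.
        return hmac.compare_digest(stored, otp)

store = OtpStore()
code = store.issue("alice@example.com")
print(store.verify("alice@example.com", code))  # → True
```

The TTL check and constant-time comparison are the two details worth carrying into any real implementation, whatever the delivery channel.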