Cey's repositories
e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API and publish it to Kafka; Spark then processes the stream before writing to Cassandra. The pipeline, built with Python and coordinated by Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
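The fetch → Kafka → Spark → Cassandra flow can be sketched with a minimal, dependency-free simulation. This is not code from the repo: the function names are illustrative, a list stands in for the Kafka topic, and a dict stands in for the Cassandra table.

```python
import json

def fetch_from_api():
    # Stand-in for the Airflow-scheduled API fetch; returns fake records.
    return [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

def publish(topic, records):
    # Stand-in for a Kafka producer: serialize each record onto the topic.
    for record in records:
        topic.append(json.dumps(record))

def process(topic):
    # Stand-in for the Spark job: deserialize and transform each message.
    rows = []
    for message in topic:
        record = json.loads(message)
        record["name_upper"] = record["name"].upper()
        rows.append(record)
    return rows

def write_to_store(store, rows):
    # Stand-in for the Cassandra sink, keyed by primary key "id".
    for row in rows:
        store[row["id"]] = row

topic, store = [], {}
publish(topic, fetch_from_api())
write_to_store(store, process(topic))
print(store[1]["name_upper"])  # → ALICE
```

In the real pipeline each stage runs as a separate containerized service; the value of the shape above is that each hop only depends on the serialized messages, so stages can scale independently.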
RedditDataPipeline
Data engineering with the Reddit API, Airflow, Hive, Postgres, MinIO, NiFi, Trino, Tableau, and Superset
Udacity-Data-Pipeline-with-Airflow
Udacity Data Engineering Nanodegree Program: Data Pipeline with Airflow project, using MinIO and PostgreSQL.
real-time-data-pipeline-kafka-mongo-elasticsearch-pyspark
A real-time data pipeline project using Kafka, MongoDB, Elasticsearch, and PySpark. Streams raw data from Kafka, enriches it with sentiment analysis using Hugging Face models, stores results in MongoDB, and visualizes data in Elasticsearch with Kibana. Scalable solution for real-time data analytics and machine learning.
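The enrichment step described here, applied per record as a Spark UDF or foreachBatch might, can be sketched without the real model. The word-list heuristic below is a deliberate placeholder for the Hugging Face sentiment model, and all names are illustrative rather than taken from the repo.

```python
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "terrible"}

def analyze_sentiment(text):
    # Placeholder for the Hugging Face model: a toy word-list heuristic
    # that returns the same label shape a real classifier would.
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"

def enrich(record):
    # Per-record enrichment: attach a sentiment field before the
    # document is written to MongoDB / indexed into Elasticsearch.
    return {**record, "sentiment": analyze_sentiment(record["text"])}

docs = [enrich(r) for r in [
    {"id": "a", "text": "I love this"},
    {"id": "b", "text": "terrible experience"},
]]
print(docs[0]["sentiment"])  # → POSITIVE
```

Keeping enrichment as a pure record-in, record-out function is what makes it easy to swap the heuristic for a real transformer pipeline inside the PySpark job.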
elk-stack-mastery
A comprehensive project focusing on setting up and configuring the Elastic Stack (Elasticsearch, Logstash, and Kibana) for efficient log management and analytics. This project includes Elasticsearch configurations, Logstash pipelines, and Kibana visualizations, with detailed step-by-step documentation.
e2e-otp-pipeline
End-to-end OTP pipeline project using Docker, Airflow, Kafka, KafkaUI, Cassandra, MongoDB, EmailOperator, SlackWebhookOperator, and DiscordWebhookOperator
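The core of any OTP pipeline is issuing a short-lived random code and verifying it later. A minimal sketch, using an in-memory dict as a stand-in for the Cassandra/MongoDB storage and assuming a 5-minute TTL (all names here are illustrative, not from the repo):

```python
import hmac
import secrets
import time

def generate_otp(n_digits=6):
    # Cryptographically secure random numeric code.
    return "".join(str(secrets.randbelow(10)) for _ in range(n_digits))

class OtpStore:
    """In-memory stand-in for the pipeline's OTP storage layer."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.entries = {}  # user -> (otp, expiry timestamp)

    def issue(self, user):
        # In the real pipeline this is where EmailOperator /
        # SlackWebhookOperator would deliver the code to the user.
        otp = generate_otp()
        self.entries[user] = (otp, time.time() + self.ttl)
        return otp

    def verify(self, user, otp):
        stored, expires = self.entries.get(user, (None, 0.0))
        if stored is None or time.time() >= expires:
            return False
        # Constant-time comparison to avoid timing side channels.
        return hmac.compare_digest(stored, otp)

store = OtpStore()
code = store.issue("alice@example.com")
print(store.verify("alice@example.com", code))  # → True
```

The TTL check and constant-time comparison are the two details worth carrying into any real implementation, whatever the delivery channel.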