Evan Saju Mathew's repositories
Data-Analysis-Projects
This repository hosts multiple data analysis projects, showcasing a variety of real-time and batch processing pipelines. Each project highlights different tools and technologies, offering comprehensive solutions for data streaming, storage, and visualization.
Reddit_ETL_DE
This project demonstrates a complete extract, transform, and load (ETL) pipeline that moves Reddit data into an Amazon Redshift data warehouse. The pipeline uses Apache Airflow, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift. The project uses Docker and is orchestrated with Apache Airflow.
SQL-50-Leetcode-Problems
The SQL 50 collection on LeetCode offers a diverse set of problems aimed at evaluating and improving your SQL skills. It covers a broad spectrum of concepts, including fundamental queries, subqueries, joins, aggregations, window functions, and other advanced techniques.
Apache-Kafka-Kraft-and-Apache-Druid
Integrates Apache Kafka (KRaft mode) with Apache Druid for real-time streaming and high-performance analytics.
e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
ETL-University-Course-Extraction-Using-Spark-Snowflake
This project automates the extraction of university course details (e.g., schedules, professors, course codes) from text files using regex patterns and a spaCy NLP model, processes them with PySpark, and loads the structured data into Snowflake for easy querying. The entire pipeline is containerized with Docker.
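As a rough illustration of the regex step only, the sketch below pulls a course code, title, professor, and schedule out of one line of text. The line format, the pattern, and the name `parse_course_line` are assumptions made for this example and are not taken from the repository; the spaCy, PySpark, and Snowflake stages are not shown.

```python
import re
from typing import Optional

# Hypothetical line format: "<CODE> - <TITLE> | <PROFESSOR> | <SCHEDULE>".
# The real text files in the repository may look different.
COURSE_PATTERN = re.compile(
    r"(?P<code>[A-Z]{2,4}\s?\d{3})\s*[-:]\s*"
    r"(?P<title>.+?)\s*\|\s*"
    r"(?P<professor>(?:Dr|Prof)\.\s+[A-Za-z .'-]+?)\s*\|\s*"
    r"(?P<schedule>.+)$"
)


def parse_course_line(line: str) -> Optional[dict]:
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = COURSE_PATTERN.search(line.strip())
    if match is None:
        return None
    # Strip stray whitespace around each captured field.
    return {name: value.strip() for name, value in match.groupdict().items()}


sample = "CS 101 - Intro to Programming | Dr. Jane Doe | Mon/Wed 10:00-11:30"
record = parse_course_line(sample)
print(record)
```

Named groups keep each extracted field labeled, which makes it straightforward to turn the matches into rows for a later processing step.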
euro-2024-kafka-pinot-pipeline
This project implements a real-time data pipeline for EURO 2024 football data, utilizing Apache Kafka for streaming, Apache Pinot for fast querying, and Apache Superset for data visualization. The pipeline extracts data from a JSON-based API and orchestrates the workflow using Apache Airflow.
netflix_sql_data_analysis
This project explores the Netflix dataset using SQL to answer complex analytical questions. It involves data cleansing, aggregation, ranking, and advanced SQL techniques to uncover insights such as top-performing directors by genre, content diversity by country, yearly content trends, and more.
Northwind-Traders
SQL-powered analysis of sales, employee performance, and customer behavior using PostgreSQL window functions. This project uncovers key business insights to optimize decision-making.
PAT-GFG-DA-1-evanptc-gmail.com
This repository contains code for the Data Analysis / Business Analyst Set A by GeeksforGeeks PAT.
PAT-GFG-DA-2-evanptc-gmail.com
This repository contains code for the Data Analysis / Business Analyst Set B by GeeksforGeeks PAT.
py
Repository of sample Python programs for learning Python.