divithraju

Software Engineer-Data Engineer's repositories

divith-raju-Immigration-Data-Engineering

A capstone project that covers several aspects of data engineering: data exploration, cleaning, modeling, pipelining, and processing.

Language: Jupyter Notebook

divith-raju-OpenMetadata

Open standard for metadata. A single place to discover, collaborate, and get your data right.

Language: Python

divith-aju-Hadoop-Pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

Language: Python
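
The description does not specify the log format or the PySpark code used. As a minimal sketch of the parse-and-aggregate step such a pipeline performs, the pure-Python example below assumes a hypothetical space-separated log format (timestamp, user ID, event type, latency in ms, order amount) and computes per-event counts, average latency, total sales, and active users. The names `parse_line`, `summarize`, and `SAMPLE_LOGS` are illustrative only; in the project itself this logic would run as PySpark transformations over log files stored in Hadoop.

```python
from collections import defaultdict

# Hypothetical log format (an assumption, not taken from the project):
# "<ISO timestamp> <user_id> <event_type> <latency_ms> <amount>"
SAMPLE_LOGS = [
    "2024-01-01T10:00:00 u1 view 120 0.00",
    "2024-01-01T10:00:05 u1 purchase 340 59.99",
    "2024-01-01T10:01:00 u2 view 80 0.00",
    "2024-01-01T10:02:00 u2 purchase 300 20.01",
    "malformed line",
]


def parse_line(line):
    """Split one log line into a record dict, or return None if malformed."""
    parts = line.split()
    if len(parts) != 5:
        return None
    ts, user_id, event, latency, amount = parts
    try:
        return {
            "ts": ts,
            "user_id": user_id,
            "event": event,
            "latency_ms": int(latency),
            "amount": float(amount),
        }
    except ValueError:
        return None


def summarize(lines):
    """Aggregate event counts, mean latency per event, sales, and active users."""
    counts = defaultdict(int)
    latency_sum = defaultdict(int)
    total_sales = 0.0
    users = set()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip malformed lines instead of failing the whole batch
        counts[rec["event"]] += 1
        latency_sum[rec["event"]] += rec["latency_ms"]
        users.add(rec["user_id"])
        if rec["event"] == "purchase":
            total_sales += rec["amount"]
    avg_latency = {event: latency_sum[event] / counts[event] for event in counts}
    return {
        "event_counts": dict(counts),
        "avg_latency_ms": avg_latency,
        "total_sales": round(total_sales, 2),
        "active_users": len(users),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOGS))
```

In PySpark the same aggregation would typically be expressed as a `groupBy` on the event column followed by `agg`, letting Spark distribute the work across the cluster.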

divith-raju-Building-Big-Data-Infrastucture-NoSQL-And-SQL

Big Data Platform on MongoDB Atlas and Heroku PostgreSQL

Language: Python · License: MIT

divith-raju-Hadoop-Connectors-Master

Language: Java

divith-raju-postgreSQL

Implementing PostgreSQL best practices for data engineers

License: MIT

Divithraju

Config files for my GitHub profile.

awesome-spark

A curated list of awesome Apache Spark packages and resources.

Language: Python · License: CC0-1.0

datacompy

Pandas, Polars, and Spark DataFrame comparison for humans and more!

Language: Python · License: Apache-2.0

divith-raju-Advanced-SQL-Blog

divith-raju-Big-Data-Blog

divith-raju-big-data-projects

divith-raju-big-data-tools

Language: Python

divith-raju-Customer-Sales-ETL-Pipeline

This ETL project demonstrates the development of a scalable data pipeline for customer sales analysis. It covers every essential step, from data extraction through transformation to loading into a database, with Apache Airflow orchestrating the workflow.

Language: Python
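
The extract, transform, and load steps can be sketched in plain Python. This is a minimal sketch under stated assumptions: the CSV layout and sample rows are hypothetical, the standard-library `sqlite3` module stands in for the project's target database, and Airflow is not used here; in the project each function would presumably run as its own Airflow task.

```python
import csv
import io
import sqlite3

# Hypothetical raw sales extract (an assumption; the real source is not described).
RAW_CSV = """order_id,customer_id,amount,order_date
1,C1,100.50,2024-01-03
2,C2,,2024-01-04
3,C1,49.50,2024-01-05
4,C3,75.00,2024-01-05
"""


def extract(raw_text):
    """Read CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with a missing amount and total the sales per customer."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue  # simple data-quality rule: ignore incomplete orders
        customer = row["customer_id"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(records, conn):
    """Write aggregated totals to customer_sales; reruns replace existing rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer_id TEXT PRIMARY KEY, total_amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_sales VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM customer_sales ORDER BY customer_id").fetchall())
```

Keying the table on `customer_id` and using `INSERT OR REPLACE` makes the load step safe to rerun, which matters when a scheduler such as Airflow retries a failed task.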

divith-raju-Data-Mining

This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behavior. The goal is to analyze customer data and group customers into clusters that support targeted marketing strategies and better customer relationship management.

Language: Python
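
The clustering step can be illustrated with a minimal K-Means (Lloyd's algorithm) implementation. This sketch assumes two hypothetical features per customer (annual spend in hundreds and number of orders) and initializes centroids to the first `k` points so the result is reproducible; the project may instead use a library implementation such as scikit-learn's `KMeans` with different features.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels).

    Centroids start at the first k points (deterministic, for illustration;
    k-means++ initialization is the usual choice in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lbl in zip(points, new_labels) if lbl == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical customers: (annual spend in hundreds, orders per year).
    customers = [(10, 2), (12, 3), (8, 1), (80, 20), (85, 22), (75, 18)]
    centroids, labels = kmeans(customers, k=2)
    print(centroids, labels)
```

On real data the features should normally be standardized first, because K-Means uses Euclidean distance and a feature with a larger scale would otherwise dominate the clustering.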

divith-raju-ETL-Airflow-Project

This ETL pipeline project is a practical demonstration of my skills in data engineering and automation using Python and Apache Airflow. By integrating MySQL for data storage and leveraging Airflow for task orchestration, the project simulates a scalable and modular ETL solution often required in enterprise data workflows.

Language: Python
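
Airflow orchestrates an ETL job as a directed acyclic graph (DAG) of tasks, each running only after its upstream tasks finish. The sketch below imitates that dependency ordering with the standard-library `graphlib.TopologicalSorter` (Python 3.9+) rather than Airflow itself; the task names, the sample data, and the in-memory stand-in for the MySQL load are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task runs after every task in its value set.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_mysql": {"transform"},
}


def extract_orders(ctx):
    ctx["orders"] = [("C1", 20.0), ("C2", 35.0), ("C1", 5.0)]


def extract_customers(ctx):
    ctx["customers"] = {"C1": "Alice", "C2": "Bob"}


def transform(ctx):
    """Total order amounts per customer and attach customer names."""
    totals = {}
    for cust_id, amount in ctx["orders"]:
        totals[cust_id] = totals.get(cust_id, 0.0) + amount
    ctx["rows"] = sorted((ctx["customers"][c], t) for c, t in totals.items())


def load_mysql(ctx):
    # Stand-in for an INSERT into MySQL: just record the rows that would be written.
    ctx["loaded"] = list(ctx["rows"])


TASKS = {
    "extract_orders": extract_orders,
    "extract_customers": extract_customers,
    "transform": transform,
    "load_mysql": load_mysql,
}


def run_pipeline(tasks, dependencies):
    """Run task callables in dependency order, sharing one context dict."""
    context = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](context)
    return order, context


if __name__ == "__main__":
    order, ctx = run_pipeline(TASKS, DEPENDENCIES)
    print(order, ctx["loaded"])
```

In Airflow the same edges would be declared between operators with the `>>` operator, and the scheduler adds what this sketch omits: scheduling, retries, and passing data between tasks through XComs or external storage.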

divith-raju-Hadoop-3.3.6-setup-on-Ubuntu

Language: Shell

divith-raju-Hive

divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

Language: Python
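
The model-training step can be illustrated with a small logistic-regression classifier fitted by full-batch gradient descent. This is a pure-Python sketch under stated assumptions: two hypothetical engineered features (months since last purchase and number of support tickets), a toy labeled dataset, and no train/test split. In a PySpark pipeline a comparable model would more likely come from `pyspark.ml.classification.LogisticRegression` than from hand-written code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by full-batch gradient descent on log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one customer: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_probability(w, b, x):
    """Predicted probability that a customer with features x will churn."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical features: [months since last purchase, support tickets].
    X = [[1, 0], [2, 1], [1, 1], [3, 0], [8, 4], [10, 3], [9, 5], [7, 4]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = churned
    w, b = train_logistic(X, y)
    print(churn_probability(w, b, [9, 4]), churn_probability(w, b, [1, 0]))
```

Customers whose predicted probability exceeds a chosen threshold (0.5 by default here) would be flagged as at risk of leaving.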

divith-raju-PySpark-Projects

Language: Python

divith-raju-Pyspark-work

Language: Python

divith-raju-Pyspark_Auto.Generate

Language: Python

divith-raju-Python

This repository highlights my ability to develop and integrate diverse Python solutions, ranging from API creation and data management to cloud service integration. Each project in this repository serves a specific purpose, demonstrating both fundamental concepts and practical applications that are essential in real-world software development.

Language: Python

divith-raju-Steaming-project-Spark-Kafka-Cassandra

divith-raju-Sweetviz-Package

License: NOASSERTION

divith-raju-Webapplication-Spark-memory-cal

The Spark Memory Configuration Calculator is designed to help data engineers and Spark developers quickly determine the optimal memory and core configurations for their Spark clusters. With this tool, you can avoid common pitfalls and ensure your cluster resources are used efficiently, leading to better performance and lower costs.

Language: Python
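
The calculator's own formulas are not described here. As one hedged illustration, the sketch below applies a widely used sizing heuristic for Spark on YARN: reserve 1 core and 1 GB per node for the OS and Hadoop daemons, use about 5 cores per executor, leave one executor slot for the ApplicationMaster, and subtract Spark's default executor memory overhead of max(384 MiB, 10% of executor memory). The function name and the cluster sizes in the usage example are hypothetical, and memory is treated in GiB throughout.

```python
import math


def spark_executor_config(nodes, cores_per_node, mem_per_node_gb, cores_per_executor=5):
    """Suggest executor count, cores, and heap size for a YARN cluster.

    Heuristic assumptions: 1 core + 1 GB per node reserved for OS/daemons,
    one executor slot left for the ApplicationMaster, and per-executor
    overhead of max(384 MiB, 10% of the executor heap).
    """
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_per_node_gb - 1
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("node too small for the requested cores per executor")
    total_executors = nodes * executors_per_node - 1  # one slot for the AM
    container_gb = usable_mem_gb / executors_per_node
    # Container = heap + max(0.375 GiB, 0.10 * heap); solve for the heap size.
    heap_gb = container_gb / 1.10
    if heap_gb * 0.10 < 0.375:
        heap_gb = container_gb - 0.375  # the 384 MiB floor applies instead
    return {
        "num_executors": total_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": math.floor(heap_gb),
    }


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, each with 16 cores and 64 GB of RAM.
    print(spark_executor_config(10, 16, 64))
```

For this example cluster the heuristic gives 3 executors per node, 29 executors in total, 5 cores each, and roughly 19 GB of heap per executor (the values that would be passed as `spark.executor.instances`, `spark.executor.cores`, and `spark.executor.memory`). Capping executors at about 5 cores is a rule of thumb for HDFS I/O throughput, not a hard limit.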

friendlyetl-PKG

Github-bot

pyspark-example-project

pyspark-examples

User_behavior_analytics-