Software Engineer-Data Engineer (divithraju)

divithraju

User data from GitHub https://github.com/divithraju

Company:Freelancer

Location:India

Home Page:https://linktr.ee/divithraju

GitHub:@divithraju

Software Engineer-Data Engineer's repositories

divith-raju-Immigration-Data-Engineering

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

Language:Jupyter Notebook | Stargazers:3 | Issues:1 | Issues:0

divith-raju-OpenMetadata

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language:Python | Stargazers:3 | Issues:1 | Issues:0

divith-aju-Hadoop-Pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

Language:Python | Stargazers:2 | Issues:1 | Issues:0

divith-raju-Building-Big-Data-Infrastucture-NoSQL-And-SQL

Big Data Platform on MongoDB Atlas and Heroku PostgreSQL

Language:Python | License:MIT | Stargazers:2 | Issues:1 | Issues:0

divith-raju-postgreSQL

Implementing PostgreSQL best practices for data engineers

License:MIT | Stargazers:2 | Issues:0 | Issues:0

Divithraju

Config files for my GitHub profile.

awesome-spark

A curated list of awesome Apache Spark packages and resources.

Language:Python | License:CC0-1.0 | Stargazers:1 | Issues:0 | Issues:0

datacompy

Pandas, Polars, and Spark DataFrame comparison for humans and more!

Language:Python | License:Apache-2.0 | Stargazers:1 | Issues:0 | Issues:0

divith-raju-big-data-projects

divith-raju-big-data-tools

Language:Python | Stargazers:1 | Issues:1 | Issues:0

divith-raju-Customer-Sales-ETL-Pipeline

This ETL project demonstrates the development of a scalable data pipeline for customer sales analysis. It covers all essential steps, from data extraction through transformation to loading into a database, using Apache Airflow.

Language:Python | Stargazers:1 | Issues:1 | Issues:0
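
The extract, transform, and load steps named in the description above can be sketched with the standard library alone. This is an illustration, not the repository's code: the CSV columns, the `customer_sales` table, and every function name are hypothetical, `sqlite3` stands in for whichever database the project targets, and plain function calls stand in for Airflow tasks.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the real project's source data is not shown here.
RAW_CSV = """customer_id,amount,order_date
101,19.99,2024-01-05
102,5.00,2024-01-05
101,30.01,2024-01-06
103,,2024-01-07
"""

def extract(raw_text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Drop rows with a missing amount and total the sales per customer."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        customer = int(row["customer_id"])
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return sorted((c, round(t, 2)) for c, t in totals.items())

def load(records, conn):
    """Write (customer_id, total) pairs into a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer_id INTEGER PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_sales VALUES (?, ?)", records)
    conn.commit()

def run_pipeline(raw_text=RAW_CSV):
    """Run the three steps in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw_text)), conn)
    return conn.execute(
        "SELECT customer_id, total FROM customer_sales ORDER BY customer_id"
    ).fetchall()
```

In an orchestrated setup each of the three functions would typically become its own task, so that a failed load can be retried without repeating the extraction.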

divith-raju-Data-Mining

This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.

Language:Python | Stargazers:1 | Issues:1 | Issues:0
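
K-Means clustering, as mentioned in the description above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The following is a minimal pure-Python sketch of that loop, not the repository's implementation: the customer features and the choice of the first `k` points as starting centroids are assumptions made to keep the example small and deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Deterministic start (an assumption): the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
                continue
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (orders per year, average order value).
customers = [(2, 20.0), (40, 25.0), (3, 22.0), (38, 30.0), (1, 18.0), (42, 27.0)]
centroids, labels = kmeans(customers, k=2)
```

Here the infrequent buyers and the frequent buyers end up in separate clusters. On real data the features would normally be scaled first, since an unscaled feature with a large range dominates the Euclidean distance.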

divith-raju-ETL-Airflow-Project

This ETL pipeline project is a practical demonstration of my skills in data engineering and automation using Python and Apache Airflow. By integrating MySQL for data storage and leveraging Airflow for task orchestration, the project simulates a scalable and modular ETL solution often required in enterprise data workflows.

Language:Python | Stargazers:1 | Issues:1 | Issues:0

divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

Language:Python | Stargazers:1 | Issues:1 | Issues:0

divith-raju-Python

This repository highlights my ability to develop and integrate diverse Python solutions, ranging from API creation and data management to cloud service integration. Each project in this repository serves a specific purpose, demonstrating both fundamental concepts and practical applications that are essential in real-world software development.

Language:Python | Stargazers:1 | Issues:1 | Issues:0

divith-raju-Webapplication-Spark-memory-cal

The Spark Memory Configuration Calculator is designed to help data engineers and Spark developers quickly determine the optimal memory and core configurations for their Spark clusters. With this tool, you can avoid common pitfalls and ensure your cluster resources are used efficiently, leading to better performance and lower costs.

Language:Python | Stargazers:1 | Issues:1 | Issues:0
Language:Python | License:MIT | Stargazers:1 | Issues:0 | Issues:0
Language:Python | Stargazers:1 | Issues:1 | Issues:0
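
The kind of calculation a Spark memory configuration calculator performs can be sketched as follows. This is not the repository's formula; it assumes a commonly quoted rule of thumb: reserve one core and 1 GB per node for the operating system, give each executor five cores, set one executor aside for the driver, and subtract roughly 10% (at least 384 MB) of each executor's memory share as overhead. The function name and its parameters are hypothetical.

```python
def suggest_executor_config(nodes, cores_per_node, mem_gb_per_node,
                            cores_per_executor=5):
    """Rule-of-thumb executor sizing; every constant here is an assumption."""
    usable_cores = cores_per_node - 1      # leave one core per node for the OS
    usable_mem_gb = mem_gb_per_node - 1    # leave 1 GB per node for the OS
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    # Set one executor slot aside for the driver / application master.
    total_executors = max(1, nodes * executors_per_node - 1)
    mem_per_executor_gb = usable_mem_gb / executors_per_node
    # Approximate off-heap overhead: 10% of the share, never below 384 MB.
    overhead_gb = max(0.384, 0.10 * mem_per_executor_gb)
    heap_gb = int(mem_per_executor_gb - overhead_gb)
    return {
        "spark.executor.instances": total_executors,
        "spark.executor.cores": cores_per_executor,
        "spark.executor.memory": f"{heap_gb}g",
    }

# Hypothetical cluster: 10 worker nodes, 16 cores and 64 GB of RAM each.
config = suggest_executor_config(nodes=10, cores_per_node=16, mem_gb_per_node=64)
```

For this hypothetical cluster the sketch suggests 29 executors with 5 cores and an 18 GB heap each. The values are a starting point for tuning rather than an optimum, since the right split depends on the workload and the cluster manager.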

pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

Language:Python | Stargazers:1 | Issues:0 | Issues:0

pyspark-examples

PySpark RDD, DataFrame, and Dataset examples in Python

Language:Python | Stargazers:1 | Issues:0 | Issues:0