kuberkumar07 / Data_Extraction

This repository showcases my data science expertise: extracting Excel data from URLs, cleaning it with pandas, and storing it in PostgreSQL using Python (requests, pandas, psycopg2, SQLAlchemy). It is intended to demonstrate professional data handling skills.

Project Title: Data Extraction, Cleaning, and Storage in PostgreSQL

Description: This repository provides a Python-based solution for extracting data from Excel files hosted online, cleaning it, and storing it in a PostgreSQL database. It uses pandas for data handling, requests to download the files, and psycopg2 with SQLAlchemy for database operations. It is aimed at data scientists and analysts who want to automate pipelines that move data from diverse sources into structured database tables.
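The extraction step can be sketched as follows. This is a minimal illustration, not the repository's code: `fetch_excel_bytes`, `read_all_sheets`, and `extract_excel` are hypothetical names, and the sketch assumes the workbook is an `.xlsx` file, which pandas can only read when the `openpyxl` package is installed.

```python
import io

import pandas as pd
import requests


def fetch_excel_bytes(url: str, timeout: float = 30.0, session=None) -> bytes:
    """Download the file at `url` and return its raw bytes (hypothetical helper)."""
    # Use an injected session if given (handy for retries or tests),
    # otherwise fall back to the module-level requests.get.
    getter = session.get if session is not None else requests.get
    response = getter(url, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of trying to parse an HTML error page.
    response.raise_for_status()
    return response.content


def read_all_sheets(content: bytes) -> dict:
    """Parse workbook bytes into a {sheet_name: DataFrame} dict."""
    # sheet_name=None tells pandas to load every sheet, not just the first.
    # Reading .xlsx requires the openpyxl package to be installed.
    return pd.read_excel(io.BytesIO(content), sheet_name=None)


def extract_excel(url: str) -> dict:
    """Download a workbook and return all of its sheets as DataFrames."""
    return read_all_sheets(fetch_excel_bytes(url))
```

Returning a dict keyed by sheet name keeps multi-sheet workbooks intact, so each sheet can later be cleaned and stored on its own.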

Key Features:

- Data Extraction: Retrieve Excel data from any accessible URL.
- Data Cleaning: Remove duplicates, handle missing values, and format numeric data.
- Database Integration: Store cleaned data in PostgreSQL databases.
- Scalability: Supports multiple Excel sheets, customizable table names, and dynamic data handling.

Technologies Used: Python, pandas, requests, psycopg2, SQLAlchemy, PostgreSQL.
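The Data Cleaning feature can be sketched with plain pandas. The exact rules used in this repository are not stated, so the choices here are assumptions: `clean_dataframe` is a hypothetical name, "handle missing values" is interpreted as dropping fully empty rows and columns (partial gaps stay NaN and become NULL in the database), and "format numeric data" as converting text like `"1,234.5"` to floats and rounding to `decimals` places.

```python
import pandas as pd


def clean_dataframe(df: pd.DataFrame, decimals: int = 2) -> pd.DataFrame:
    """Hypothetical cleaning pass: missing values, numeric formatting, duplicates."""
    cleaned = df.copy()
    # Normalise headers ("Region Name" -> "region_name") so they map cleanly
    # onto PostgreSQL column names.
    cleaned.columns = [str(c).strip().lower().replace(" ", "_") for c in cleaned.columns]

    # Missing values: drop rows and columns that are entirely empty; partially
    # missing cells stay NaN so they are written as NULL later.
    cleaned = cleaned.dropna(how="all").dropna(axis=1, how="all")

    # Numeric formatting: convert text columns such as "1,234.5" to floats,
    # but only when every non-missing value in the column parses as a number.
    for col in cleaned.columns:
        series = cleaned[col]
        if pd.api.types.is_numeric_dtype(series):
            continue
        as_text = series.astype(str).str.replace(",", "", regex=False).str.strip()
        candidate = pd.to_numeric(as_text, errors="coerce")
        if candidate[series.notna()].notna().all():
            cleaned[col] = candidate

    # DataFrame.round only touches numeric columns; text columns are unchanged.
    cleaned = cleaned.round(decimals)

    # Duplicates: compare rows after conversion so "1,000" and "1000" match.
    return cleaned.drop_duplicates().reset_index(drop=True)
```

Deduplicating last is a deliberate ordering: values that differ only in formatting are normalised first, so they are recognised as duplicates.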

Usage: Clone the repository, provide your Excel URL, and execute main.py to automatically clean and store data.
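The final stage of the flow that main.py automates, writing each sheet to its own table, might look like the sketch below. It uses pandas' `DataFrame.to_sql` with a SQLAlchemy engine; `table_name_for`, `store_sheets`, the `prefix` parameter, and the `if_exists="replace"` default are assumptions for illustration, not the repository's actual code.

```python
import re

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


def table_name_for(sheet_name: str, prefix: str = "") -> str:
    """Turn an Excel sheet name into a lowercase, SQL-safe table name."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", sheet_name).strip("_").lower()
    return f"{prefix}{slug or 'sheet'}"


def store_sheets(sheets: dict, engine: Engine, prefix: str = "",
                 if_exists: str = "replace") -> list:
    """Write each DataFrame to its own table and return the table names used."""
    written = []
    for sheet_name, frame in sheets.items():
        table = table_name_for(sheet_name, prefix)
        # index=False keeps the pandas RangeIndex out of the table.
        frame.to_sql(table, engine, if_exists=if_exists, index=False)
        written.append(table)
    return written
```

Against PostgreSQL you would build the engine with something like `create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")` (placeholder credentials; requires psycopg2), while the in-memory URL `sqlite://` is convenient for quick local checks. Note that `if_exists="replace"` drops and recreates an existing table; pass `"append"` to keep earlier rows.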

Languages

Python: 100.0%