simmieyungie / Data-Cleaning

This is a repository containing a wealth of Data Cleaning methodologies

Data-Cleaning

Overview

Are you looking to improve your data cleaning skills? This is a project designed to help you master Data Cleaning.

According to a poll, data science professionals say they spend 80% of their time on data cleaning. There is no one-size-fits-all approach to cleaning; however, practicing with as many datasets as you can find sets you in the right direction.

Of course, data comes in different formats. If you practice with as many of them as possible, you expose yourself to a wide range of manipulation techniques. Learning the available techniques gives you the ability to deal with any dataset.

Datasets and Scripts

Some of the datasets are Excel sheets containing both the cleaned version and the dirty version. The scripts to clean the data are available in Python and R. If you want to clean the datasets using other languages, feel free to do so.
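To give a rough idea of what a cleaning script can look like, here is a minimal Python sketch. It is not taken from this repository: the sample data, the column names (name, age, city) and the helper clean_rows are all hypothetical, and it uses only the standard library so it runs without extra installs. It trims whitespace, normalizes text casing, treats non-numeric ages as missing, and drops duplicate rows.

```python
import csv
import io

# Hypothetical "dirty" data, invented purely for illustration.
DIRTY_CSV = """name, age ,city
 Alice ,30,lagos
BOB,,Abuja
 Alice ,30,lagos
carol,forty,  Ibadan
"""


def clean_rows(raw_text):
    """Return a list of cleaned row dicts parsed from raw CSV text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned, seen = [], set()
    for row in reader:
        # Strip stray whitespace from both column names and values.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        # Normalize the casing of the text columns.
        row["name"] = row["name"].title()
        row["city"] = row["city"].title()
        # Convert age to int; blank or non-numeric values become missing (None).
        row["age"] = int(row["age"]) if row["age"].isdigit() else None
        # Skip rows that are exact duplicates of a row already kept.
        fingerprint = tuple(row.items())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    for cleaned_row in clean_rows(DIRTY_CSV):
        print(cleaned_row)
```

Real datasets usually need more than this (date parsing, outlier checks, consistent category labels), and a library such as pandas is often a better fit for larger files. The same steps can be written in R or any other language.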

Pull Request

We aim to populate this repository with as many cleaning projects as possible. If you have datasets you have previously cleaned, you're welcome to send a pull request, but make sure the code works and is well documented. The PR should include both the dataset and the script, and it will be merged once it has been properly reviewed. You will be added to the contributors list once your PR has been merged.

Contributors List

Languages

Jupyter Notebook 94.8%, R 2.9%, Python 2.3%