There are 13 repositories under the datacleaning topic.
OpenRefine is a free, open source power tool for working with messy data and improving it
Always know what to expect from your data.
A full pipeline AutoML tool for tabular data
A natural language processing project that performs sentiment analysis by classifying tweets as positive or negative with machine learning models, covering text mining, text analysis, data analysis, and data visualization
This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/
An open-source Python package for cleaning raw text data
Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.
Data and code for scraping and cleaning data on COVID-19 in India from https://www.mohfw.gov.in/ and https://www.covid19india.org/
Benchmark for bi-level optimization solvers
Deduplicates and cleans 通达信 (Tongdaxin) data and stores it in MongoDB for future research
This repo contains 4 different projects that build various machine learning models for Kaggle competitions. The work also includes exploratory data analysis, data cleaning, data visualization, data munging, feature selection, etc.
Predicts home prices in Bangalore. Built with Flutter, Flask, and Jupyter Notebook.
Distills large-scale web page text
This project aims to minimize police response time by detecting weapons in a live CCTV camera feed and alerting the police as soon as any weapon is detected. The project focuses primarily on guns. 🔫💣💻🎥
Work on a dataset of high-entropy alloys used to design materials for additive manufacturing. Covers data analysis and the construction of machine learning models, including neural networks and gradient boosting, to make predictions useful for inventing advanced materials.
Examples for Optimus, a data cleansing library for big data.
Excel-based projects
Spark-lean, an interactive PySpark-based Data Cleaning Library
⚒️ Data preprocessing is the process of transforming raw data into an understandable format. It is also an important step in data mining as we cannot work with raw data. The quality of the data should be checked before applying machine learning or data mining algorithms
All Kaggle datasets and the accompanying R code
A basic machine learning model, built in a Python Jupyter notebook, that classifies tweets into two categories: racist/sexist and non-racist/sexist.
This repository is for a data analytics project using SQL. The project analyzes video game sales and user and critic reviews to draw insights.
Extracting lyrics from the Genius API and conducting EDA and NLP analysis on Eminem's lyrics
Resume Screening using Machine Learning and Python
Empowering football analytics through Transfermarkt data crawling, robust database design, and advanced analytics, yielding valuable insights and accurate predictions
Building a Big Mart sales prediction model
Practical tasks for earning the Data Analyst Associate certification from DataCamp.
A package to aid with data cleaning using pandas.
This repository is a collection of all the solutions of tasks that were assigned to me during my Data Analytics Virtual Internship Experience Program at Quantium. 💻📚📊