There are 9 repositories under the data-preprocessing topic.
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Prepping tables for machine learning
Implementation/Tutorial of using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning
Machine Learning library for the web and Node.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. Leveraging OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
A day-by-day plan for this challenge. Covers both theoretical and practical aspects.
convtools is a Python library for declaratively defining conversions that process collections and perform complex aggregations and joins.
Jupyter Notebooks and Data Sets for Pandas Library
A simpler way of reading and augmenting image segmentation data into TensorFlow
Social Media Mining Toolkit (SMMT) main repository
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
SEGAN PyTorch implementation https://arxiv.org/abs/1703.09452
A repository, updated over time, of content and materials for learning machine learning and statistics with Python
sciblox - Easier Data Science and Machine Learning
A quantitative study of over 1.25 million tweets about ChatGPT, employing data scraping, data cleaning, EDA, topic modeling, and sentiment analysis.
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.
REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript
Resources of our survey paper "Enabling AI on Edges: Techniques, Applications and Challenges"
The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.
Movie Recommendation System: Project using R and Machine learning
Data preparation for data science projects.
GWAS summary statistics files QC tool
A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.
Demo of the capabilities of the Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing, and model tuning are performed on the dataset.
Data stream analytics: Implement online learning methods to address concept drift and model drift in dynamic data streams. Code for the paper entitled "A Multi-Stage Automated Online Network Data Stream Analytics Framework for IIoT Systems" published in IEEE Transactions on Industrial Informatics.