There are 9 repositories under the data-preprocessing topic.
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Prepping tables for machine learning
Implementation/Tutorial of using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning
Machine Learning library for the web and Node.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.
A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. It leverages OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.
A day-to-day plan for this challenge. Covers both theoretical and practical aspects.
convtools is a python library to declaratively define conversions for processing collections, doing complex aggregations and joins.
Jupyter Notebooks and Data Sets for Pandas Library
A simpler way of reading and augmenting image segmentation data into TensorFlow
Social Media Mining Toolkit (SMMT) main repository
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
SEGAN PyTorch implementation https://arxiv.org/abs/1703.09452
I will update this repository with statistics content and materials for learning machine learning with Python.
sciblox - Easier Data Science and Machine Learning
A quantitative study of over 1.25 million tweets about ChatGPT, employing data scraping, data cleaning, EDA, topic modeling, and sentiment analysis.
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
“Data science” is about as broad a term as they come. It may be easiest to describe it by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing for Machine Learning and Natural Language Processing Applications.
REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript
Resources of our survey paper "Enabling AI on Edges: Techniques, Applications and Challenges"
The aim is to build a job recommender system that takes your skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.
GWAS summary statistics files QC tool
Data preparation for data science projects.
A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.
Demo of the capabilities of Yandex's CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing, and model tuning are performed on the dataset.
Movie Recommendation System: Project using R and machine learning
Data stream analytics: Implement online learning methods to address concept drift and model drift in dynamic data streams. Code for the paper entitled "A Multi-Stage Automated Online Network Data Stream Analytics Framework for IIoT Systems" published in IEEE Transactions on Industrial Informatics.
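One of the entries above describes automating three routine data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables. The block below is a minimal standard-library Python sketch of those three steps. It is not that tool's code. The function names, the linear-interpolation fill (with edge gaps copied from the nearest known value), the 1.5 × IQR outlier rule, and one-hot encoding are illustrative assumptions chosen here.

```python
from statistics import quantiles
from typing import Optional


def interpolate_missing(values: list[Optional[float]]) -> list[float]:
    """Fill None entries by linear interpolation between the nearest known
    neighbours (assumed strategy); leading/trailing gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("cannot interpolate a series with no known values")
    filled: list[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:  # leading gap
            filled.append(float(values[right]))
        elif right is None:  # trailing gap
            filled.append(float(values[left]))
        else:  # interior gap: straight line between the two neighbours
            frac = (i - left) / (right - left)
            filled.append(values[left] + frac * (values[right] - values[left]))
    return filled


def remove_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed Tukey-fence rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def one_hot(labels: list[str]) -> tuple[list[str], list[list[int]]]:
    """Encode categorical labels as one-hot rows over a sorted vocabulary."""
    vocab = sorted(set(labels))
    index = {label: j for j, label in enumerate(vocab)}
    rows = [[1 if index[label] == j else 0 for j in range(len(vocab))] for label in labels]
    return vocab, rows


if __name__ == "__main__":
    print(interpolate_missing([1.0, None, 3.0, None]))  # [1.0, 2.0, 3.0, 3.0]
    print(remove_outliers([10, 11, 12, 13, 14, 100]))   # [10, 11, 12, 13, 14]
    print(one_hot(["red", "green", "red"]))            # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
```

A real tool would typically operate on whole tables (for example pandas DataFrames) and let the user pick the fill and outlier strategies per column. Plain lists keep this sketch dependency-free.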