There are 11 repositories under the data-preprocessing topic.
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Machine learning with dataframes
Open source project for data preparation for GenAI applications
Implementation/Tutorial of using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning
Machine Learning library for the web and Node.
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It can also run data cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.
A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. It leverages OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Jupyter Notebooks and Data Sets for Pandas Library
A day-to-day plan for this challenge, covering both theoretical and practical aspects.
A simpler way of reading and augmenting image segmentation data into TensorFlow
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
Social Media Mining Toolkit (SMMT) main repository
SEGAN pytorch implementation https://arxiv.org/abs/1703.09452
Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions (an alternative to map-reduce and join-groupby).
I will keep updating this repository with statistics content and materials for learning machine learning with Python.
A quantitative study of over 1.25 million tweets about ChatGPT, employing data scraping, data cleaning, EDA, topic modeling, and sentiment analysis.
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
sciblox - Easier Data Science and Machine Learning
Accelerating AI Training and Inference from Storage Perspective (Must-read Papers on Storage for AI)
XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis
“Data science” is just about as broad a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
This project focuses on data preprocessing and epilepsy seizure prediction using the CHB-MIT EEG dataset. It includes steps like data cleansing, feature extraction, and handling imbalanced datasets, aimed at improving the accuracy of seizure prediction.
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing for Machine Learning and Natural Language Processing applications.
ManaTTS is the largest open Persian speech dataset with 114+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
Movie Recommendation System: a project using R and machine learning
GWAS summary statistics files QC tool
REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript
The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.
A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.
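The three preparation tasks named in the last entry (missing value interpolation, outlier removal, and encoding categorical variables) can be illustrated with a minimal, standard-library-only Python sketch. This is not that utility's code: the function names are hypothetical, and the specific choices of linear interpolation, Tukey's IQR fences with k = 1.5, and one-hot encoding are assumptions about common defaults.

```python
from statistics import quantiles
from typing import Optional


def interpolate_missing(values: list[Optional[float]]) -> list[Optional[float]]:
    """Linearly interpolate interior None gaps (hypothetical helper).

    Leading and trailing None values are left as-is because there is no
    known value on both sides to interpolate between.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk each pair of consecutive known points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            result[i] = values[left] + frac * (values[right] - values[left])
    return result


def remove_outliers_iqr(values: list[float], k: float = 1.5) -> list[float]:
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences, assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)  # needs at least two data points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the surviving values.
    return [v for v in values if low <= v <= high]


def one_hot_encode(categories: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one 0/1 indicator row per input value."""
    vocab = sorted(set(categories))
    index = {c: i for i, c in enumerate(vocab)}
    rows = []
    for c in categories:
        row = [0] * len(vocab)
        row[index[c]] = 1
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    print(interpolate_missing([1.0, None, 3.0]))
    print(remove_outliers_iqr([10, 12, 11, 13, 12, 100]))
    print(one_hot_encode(["red", "green", "red"]))
```

The IQR rule is used here rather than a mean/standard-deviation cutoff because quartiles are less distorted by the very outliers being removed; a real tool may offer other strategies.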