There are 11 repositories under the data-preprocessing topic.
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Machine learning with dataframes
Implementation and tutorial for using Automated Machine Learning (AutoML) methods for static/batch and online/continual learning
Open-source project for data preparation, aimed at LLM application builders
Machine learning library for the web and Node.js.
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also allows users to run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.
A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. It uses OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for integration and performance.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
A day-to-day plan for this challenge. Covers both theoretical and practical aspects.
Jupyter notebooks and datasets for the pandas library
A simpler way of reading and augmenting image segmentation data into TensorFlow
Social Media Mining Toolkit (SMMT) main repository
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
SEGAN PyTorch implementation (https://arxiv.org/abs/1703.09452)
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions; it is an alternative to map-reduce and join-groupby.
Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"
I will update this repository with statistics content and materials for learning machine learning with Python.
A quantitative study of over 1.25 million tweets about ChatGPT, employing data scraping, data cleaning, EDA, topic modeling, and sentiment analysis.
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
sciblox - Easier Data Science and Machine Learning
“Data science” is about as broad a term as they come. It may be easiest to describe it by listing its more concrete components: data exploration and analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
A Python library for automated exploratory data analysis, automated data cleaning, and automated data preprocessing for machine learning and natural language processing applications.
XGBoost, LightGBM, LSTM, Linear Regression, Exploratory Data Analysis
Movie recommendation system: a project using R and machine learning
REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript
Quality-control (QC) tool for GWAS summary statistics files
A command-line utility program for automating the trivial, frequently occurring data preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables.
The aim is to build a job recommender system that takes your skills from LinkedIn and job listings from Indeed and returns the best available jobs for your skills.
This project focuses on data preprocessing and epilepsy seizure prediction using the CHB-MIT EEG dataset. It includes steps like data cleansing, feature extraction, and handling imbalanced datasets, aimed at improving the accuracy of seizure prediction.
Data stream analytics: Implement online learning methods to address concept drift and model drift in dynamic data streams. Code for the paper entitled "A Multi-Stage Automated Online Network Data Stream Analytics Framework for IIoT Systems" published in IEEE Transactions on Industrial Informatics.
Joblib-like interface for parallel GPU computations (e.g. data preprocessing)
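The command-line utility listed above automates three routine preparation tasks: missing value interpolation, outlier removal, and encoding categorical variables. As a rough illustration of what those tasks involve, here is a minimal pure-Python sketch. It is not that tool's implementation; the function names, the linear interpolation rule, and the 1.5 × IQR (Tukey fence) outlier criterion are assumptions chosen for illustration.

```python
from statistics import quantiles


def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest known
    neighbours; gaps at either end copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # leading gap: copy the first known value
            filled[i] = values[right]
        elif right is None:       # trailing gap: copy the last known value
            filled[i] = values[left]
        else:                     # interior gap: straight line between neighbours
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def remove_outliers(values, k=1.5):
    """Keep only values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def one_hot(categories):
    """Encode labels as one-hot rows; columns follow sorted label order."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


if __name__ == "__main__":
    print(interpolate_missing([1.0, None, 3.0, None]))  # [1.0, 2.0, 3.0, 3.0]
    print(remove_outliers([10, 12, 11, 13, 12, 100]))   # [10, 12, 11, 13, 12]
    print(one_hot(["red", "blue", "red"]))              # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
```

Real tools typically offer more options (other interpolation schemes, z-score or model-based outlier detection, ordinal or target encoding); the IQR fence is used here only because it needs no distributional assumptions and fits in a few lines.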