data-augmentation

There are 37 repositories under the data-augmentation topic.

snorkel-team / snorkel
A system for quickly generating training data with weak supervision
ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision
Language:Python 5921
NVIDIA / DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
audio-processing data-augmentation data-processing deep-learning fast-data-pipeline gpu gpu-tensorflow image-augmentation image-processing machine-learning mxnet neural-network paddle python pytorch
Language:C++ 5548
ZhaoJ9014 / face.evoLVe
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
pytorch face-recognition face-detection face-alignment face-landmark-detection model-training feature-extraction fine-tuning data-augmentation deep-learning computer-vision imbalanced-learning transfer-learning hard-negative-mining supervised-learning nus tencent convolutional-neural-network machine-learning artificial-intelligence
Language:Python 3562
QData / TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
adversarial-attacks adversarial-examples adversarial-machine-learning data-augmentation machine-learning natural-language-processing nlp security
Language:Python 3300
webdataset / webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
data-augmentation deep-learning pytorch webdataset webdataset-format
Language:Python 2873
TorchIO-project / torchio
Medical imaging processing for AI applications.
pytorch medical-image-computing deep-learning data-augmentation medical-images machine-learning python medical-image-processing medical-image-analysis medical-imaging-datasets medical-imaging-with-deep-learning augmentation
Language:Python 2312
iver56 / audiomentations
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
audio audio-data-augmentation audio-effects augmentation data-augmentation deep-learning dsp machine-learning music python sound sound-processing
Language:Python 2170
425776024 / nlpcda
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
nlp data-augmentation chinese-data-augmentation nlpcda chinese-eda
Language:Python 1871
visual-layer / fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.
data-curation dataset deep-learning image-duplicate-detection machine-learning novelty-detection object-detection outlier-detection python visual-search data-augmentation image-classification image image-classfication image-processing visualization-tools image-analysis visualization image-similarity
Language:Python 1788
jasonwei20 / eda_nlp
Data augmentation for NLP, presented at EMNLP 2019
nlp data-augmentation text-classification synonyms embeddings sentence classification rnn cnn swap position
Language:Python 1649
AgaMiko / data-augmentation-review
List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
data-augmentation data-synthesis data-generation generative-adversarial-network review survey style-transfer machine-learning augmentation-policies data-augmentations autoaugment graph-data-augmentation image-augmentation audio-augmentation nlp-augmentation
1642
yongzhuo / nlp_xiaojiang
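Natural language processing (NLP): Xiaojiang robot (a retrieval-based chit-chat chatbot), BERT sentence vectors and similarity (Sentence Similarity), XLNet sentence vectors and similarity (text xlnet embedding), text classification, entity extraction (NER, bert+bilstm+crf), data augmentation (text augment, data enhance), paraphrase and synonym generation, sentence main-part extraction (mainpart), Chinese short-text similarity, text feature engineering, and keras-http-service invocation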
bert chatbot chinese data-augmentation distance enhance feature nlp text-augment text-classification xlnet
Language:Python 1534
LirongWu / awesome-graph-self-supervised-learning
Code for TKDE paper "Self-supervised learning on graphs: Contrastive, generative, or predictive"
self-supervised-learning machine-learning unsupervised-learning graph-neural-networks pre-training data-augmentation pretext-task representation-learning transfer-learning deep-learning
1421
zhanlaoban / EDA_NLP_for_Chinese
An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese corpora. NLP data augmentation. Paper reading notes.
chinese chinese-data-augmentation data-augmentation easy-data-augmentation eda text-classification
Language:Python 1383
Tebmer / Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
alignment compression data-augmentation data-synthesis feedback instruction-following kd knowledge-distillation large-language-model llm multi-modal self-distillation self-training supervised-finetuning survey
1203
Paperspace / DataAugmentationForObjectDetection
Data Augmentation For Object Detection
data-augmentation imagine-augmentation object-detection bounding-box deep-learning opencv
Language:Jupyter Notebook 1152
iver56 / torch-audiomentations
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
audio audio-data-augmentation audio-effects augmentation data-augmentation deep-learning differentiable-data-augmentation dsp machine-learning music python pytorch sound sound-processing waveform
Language:Python 1102
quqxui / Awesome-LLM4IE-Papers
Awesome papers about generative Information Extraction (IE) using Large Language Models (LLMs)
cross-domain-learning data-augmentation event-arguments event-detection event-extraction few-shot-learning in-context-learning information-extraction knowledge-graph-construction large-language-models named-entity-recognition relation-extraction zero-shot-learning
1022
goru001 / inltk
Natural Language Toolkit for Indic Languages aims to provide out of the box support for various NLP tasks that an application developer might need
nlp deep-learning indic-languages pytorch data-augmentation sentence-similarity sentence-encoding word-embeddings sentence-embeddings
Language:Python 836
styfeng / DataAug4NLP
Collection of papers and resources for data augmentation for NLP.
acl2021 artificial-intelligence data-augmentation deep-learning machine-learning natural-language-processing survey survey-paper text-classification transformers
831
zhunzhong07 / Random-Erasing
Random Erasing Data Augmentation. Experiments on CIFAR10, CIFAR100 and Fashion-MNIST
data-augmentation image-classification object-detection person-re-identification pytorch aaai2020
Language:Python 734
Westlake-AI / openmixup
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
pytorch awesome-list awesome-mim awesome-mixup contrastive-learning data-augmentation image-classifcation imagenet masked-image-modeling mixup self-supervised-learning semi-supervised-learning vision-transformer deep-learning benchmark automix data-generation machine-learning
Language:Python 658
DemisEom / SpecAugment
An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain
data-augmentation python pytorch specaugment speech speech-recognition tensorflow
Language:Python 655
textflint / textflint
Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
model-robustness adversarial-samples text-transformations text-augmentation robustness-analysis data-augmentation transformation subpopulation attack
Language:Python 649
YuliangXiu / MobilePose
Light-weight Single Person Pose Estimator
pytorch deeppose deep-learning pose-estimation mobilenetv2 data-augmentation dataloader machine-learning resnet-18 mobile-device shufflenet shufflenetv2 shufflenet-v2 squeezenet real-time realtime heatmap dsntnn lightweight
Language:Jupyter Notebook 642
vanderschaarlab / synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
pytorch tabular-data privacy machine-learning generative-model data-augmentation fairness-ml synthetic-data
Language:Python 615
conradry / copy-paste-aug
Copy-paste augmentation for segmentation and detection tasks
instance-segmentation object-detection deep-learning data-augmentation copy-paste
Language:Jupyter Notebook 568
firmai / deltapy
DeltaPy - Tabular Data Augmentation (by @firmai)
data-augmentation tabular-data feature-extraction feature-engineering time-series augmentation data-science machine-learning finance
Language:Jupyter Notebook 555
arcelien / pba
Efficient Learning of Augmentation Policy Schedules
machine-learning data-science artificial-intelligence deep-learning python automl data-augmentation augmentation automated-machine-learning tensorflow convolutional-neural-networks image-classification
Language:Jupyter Notebook 506
MTG / DeepConvSep
Deep Convolutional Neural Networks for Musical Source Separation
signal-processing deep-learning source-separation theano convolutional-neural-networks sample-querying data-augmentation data-generation score-synthesis audio-synthesis
Language:Python 483
amanchadha / coursera-gan-specialization
Programming assignments and quizzes from all courses within the GANs specialization offered by deeplearning.ai
coursera gan gans generative-adversarial-network generative-model deeplearning-ai stylegan dcgan wgan pix2pix pytorch conditional-gan bias bias-detection biggan u-net data-augmentation artificial-intelligence deep-learning neural-network
Language:Jupyter Notebook 480
sparkfish / augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
augmentation-pipeline computer-vision crappification data-augmentation data-pipeline deep-neural-networks image-processing machine-learning synthetic-data synthetic-dataset-generation training-data
Language:Python 474
hongyi-zhang / mixup
Implementation of the mixup training method
mixup pytorch cifar data-augmentation gan
Language:Python 469
codebox / image_augmentor
Data augmentation tool for images
data-augmentation image-augmentor machine-learning
Language:Python 459
bethgelab / imagecorruptions
Python package to corrupt arbitrary images.
image-corruptions data-augmentation python pip3
Language:Python 453
denisyarats / drq
DrQ: Data regularized Q
rl reinforcement-learning deep-learning mujoco dm-control gym pixel sac soft-actor-crit pytorch python actor-critic control drq deep-reinforcement-learning data-augmentation off-policy model-free
Language:Jupyter Notebook 417