training-data

The following repositories are listed under the training-data topic.

snorkel-team / snorkel
A system for quickly generating training data with weak supervision
ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision
Language:Python 5762
diffgram / diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
annotation annotation-tool training-data video-annotation data-annotation kubernetes data-science data-analytics image-annotation machine-learning deep-learning data annotations datasets labeling datastore
Language:Python 1824
ydataai / ydata-synthetic
Synthetic data generators for tabular and time-series data
datageneration datagenerator deep-learning gan gan-architectures gans generative-adversarial-network machine-learning python3 pytorch synthetic-data tensorflow2 time-series timeseries training-data
Language:Jupyter Notebook 1379
NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
data-science distant-supervision natural-language-processing nlp-library nlp-machine-learning python spacy training-data weak-supervision
Language:Python 914
OvidijusParsiunas / myvision
Computer-vision-based ML training data generation tool
ml machine-learning computer-vision object-detection training-data annotation labelling annotation-tool coco vgg tensorflow yolo model vision image-annotation label labeling-tool tagging image ai
Language:JavaScript 573
alteryx / compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
machine-learning automl prediction-engineering prediction-problem data-science labeling-tool labeling ai training-data data-labeling
Language:Python 486
a-maliarov / amazoncaptcha
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
captcha captcha-solver amazon python3 pillow amazon-captcha amazon-scraper training-data amazoncaptcha data-extraction
Language:Python 437
Slava / label-tool
Web application for image labeling and segmentation
image-label image-labeling image-labeling-tool computer-vision machine-learning training-data segmentation labelme computer-vision-tools image-annotation boundingbox data-labeling sematic-segmentation
Language:JavaScript 344
sparkfish / augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
augmentation-pipeline computer-vision crappification data-augmentation data-pipeline deep-neural-networks image-processing machine-learning synthetic-data synthetic-dataset-generation training-data
Language:Python 320
d5555 / TagEditor
🏖TagEditor - Annotation tool for spaCy
annotation-tool spacy named-entities neuralcoref coreference-resolution text-annotation labeling-tool nlp annotation machine-learning data-science tagging-tool natural-language-processing neural-networks text-tagging spacy-visualizer training-data named-entity-recognition
180
Geocene / trainset
A lightweight web application for brushing labels onto time series data; useful for building training sets.
labeling-tool brushing machine-learning training-data labeling painting time-series-classification
Language:JavaScript 157
KennethEnevoldsen / augmenty
Augmenty is an augmentation library based on spaCy for augmenting texts.
augmentation spacy-extension spacy nlp nlproc natural-language-processing python text-classification training-data text-augmentation spacy-nlp
Language:Python 148
tzano / fountain
Natural Language Data Augmentation Tool for Conversational Systems
nlu data-generator chatbot training-data natural-language conversational-ai
Language:Python 117
enginBozkurt / carla-training-data
Generating training data from the Carla driving simulator in the KITTI dataset format
artificial-intelligence autonomous-driving autonomous-vehicles carla-simulator deep-learning kitti-dataset self-driving-car training-data
Language:Python 105
avinashsen707 / AUBOi5-D435-ROS-DOPE
Aubo i5 Dual Arm Collaborative Robot - RealSense D435 - 3D Object Pose Estimation - ROS
aubo-robot blender camera-node customdataset dataset deep-learning dope ndds object-detection pose-estimation ros ros-wrapper thesis training-data ubuntu weights
Language:C++ 95
rahul051296 / small-talk-rasa-stack
Collection of casual conversations that can be used with the Rasa Stack
rasa-nlu rasa-core smalltalk training-data conversational-ai dialogflow
Language:Python 85
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
cross-lingual datasets deep-learning information-retrieval machine-learning multilingual natural-language-processing neural-information-retrieval nlp training-data
42
hernanmd / COVID-19-train-audio
COVID-19 Coughs files for training AI models
covid-19 coronavirus covid19 wavelet-analysis cough-monitor audio-analysis training-data
Language:Python 42
megagonlabs / ruler
Data Programming by Demonstration (DPBD) for Document Classification
data-programming data-science training-data weak-supervision machine-learning data-labeling
Language:Jupyter Notebook 36
InstaPy / instapy-gender-classification
🔎 Classification helper for sex classification feature of InstaPy
instapy classification helper training-data
Language:Python 34
milangritta / Pragmatic-Guide-to-Geoparsing-Evaluation
Full resources supporting the publication "A Pragmatic Guide to Geoparsing Evaluation."
linguistics data geoparsing geoparser geocoding geocoder evaluation taxonomy location places geography analysis machine-learning training-data toponym-resolution toponyms toponymy named-entity-recognition google-cloud spacy-nlp
Language:Python 34
benbo / interactive-weak-supervision
Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling
weak-supervision data-programming data-labeling active-weak-supervision interactive-weak-supervision machine-learning training-data
Language:Python 30
rhammell / planesnet
Labeled training data for detection of aircraft in Planet satellite imagery
training-data satellite-imagery planet-imagery remote-sensing machine-learning
30
alexkalinins / hairnet-ai
Machine Learning project aimed at converting images into .obj 3D models by representing them as Blender hair-type particle systems.
machine-learning blender pytorch 3d-models training-data
Language:Python 24
ableinc / git2txt
Convert all files in a Git repository to .txt files. Useful for training LLMs on your codebase.
git llm machine-learning python3 training-data txt
Language:Python 22
ajsanjoaquin / Shapley_Valuation
PyTorch reimplementation of computing Shapley values via Truncated Monte Carlo sampling from "What is your data worth? Equitable Valuation of Data" by Amirata Ghorbani and James Zou [ICML 2019]
interpretable-deep-learning pytorch pytorch-implementation interpretable-machine-learning explainable-ai data-valuation training-data training-data-curation fairness-ml fairness-ai fairness deeplearning machine-learning shapley-value game-theory
Language:Python 21
trainingdata / AIAssistedImageVideoLabelling
AI Assisted Image and Video Training Data Labeling @ Scale
machine-learning machine-vision training-data labeling labeling-tool labelingtool image-classification image-segmentation computer-vision tensorflow deep-learning image-annotation computer-vision-annotation boundingbox bounding-boxes imagenet annotation-tool annotation-tool-offline supervised-learning
Language:HTML 21
abinashmeher999 / voice-data-extract
A command line interface to combine text information from subtitles with voice data in the video. Provides a convenient way to generate training data for speech-recognition purposes.
training-data speech-to-text speech-recognition
Language:Python 19
dterg / biomedical_corpora
Table compiling the list of biomedically-related corpora available for named entity recognition (and some also suitable for association detection). The first version was published as part of the paper: Dieter Galea, Ivan Laponogov, Kirill Veselkov; Exploiting and assessing multi-source data for supervised biomedical named entity recognition, Bioinformatics, bty152, https://doi.org/10.1093/bioinformatics/bty152. If you would like to add other (or your) corpora, please submit a pull request and I'll happily approve it.
biomedical corpora corpus corpus-linguistics training-data
18
MinhasKamal / AlphabetRecognizer
Simple Optical Character Recognizer (english-ocr-image-to-text-recognition-sample-trainig-alphabet-photo-data-database-dataset)
alphabet-recognizer data database english image-processing java machine-learning ocr sample template-matching text-recognition training-data writing
Language:Java 17
wakakalu / TransE
A simple implementation of TransE, the ML algorithm published in 2013
transe machine-learning training-data
Language:Python 12
bot-astro / gpt-3-training-data
A set of questions and answers used to train a ChatGPT model.
chatgpt training-data
11
stritti / thermal-solar-plant-dataset
Realtime Thermal Solar Plant Dataset for Machine Learning
dataset machine-learning examples iot training-data opendata smarthome research public-data
11
deepraj1729 / Track
Training images for training self-driving cars on Udacity Nanodegree Self-driving Car Simulator
self-driving-car reinforcement-learning deep-learning training-data image-processing udacity-self-driving-car udacity udacity-nanodegree
10
hou2zi0 / minimal-RTE__ner-training-data
Minimal customization of Quill.js Rich Text Editor for easy annotation of text snippets for NER model training with spaCy.
rte quill spacy annotated-texts annotation-tool ner training-data nlp
Language:JavaScript 10
MaaAssistantArknights / ArknightsTrainingData
Machine learning training data for Arknights
machine-learning machine-vision ocr training-data
Language:Python 9
