There are 30 repositories under the synthetic-data topic.
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Mimesis is a powerful Python library that empowers developers to generate massive amounts of synthetic data efficiently.
A procedural Blender pipeline for photorealistic training image generation
Synthetic Patient Population Simulator
The Declarative Data Generator
Synthetic data generators for tabular and time-series data
⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
Curated list of open source tooling for data-centric AI on unstructured data.
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
A multi-purpose LLM framework for RAG and data creation.
A curated list of awesome projects which use Machine Learning to generate synthetic content.
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
SynthDet - An end-to-end object detection pipeline using synthetic data
Official project website for the CVPR 2020 paper (Oral Presentation) "Cascaded deep monocular 3D human pose estimation with evolutionary training data"
[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"
This repository provides an easy-to-use labeling tool for state-of-the-art deep learning training. It supports auto-labeling.
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
Unity's privacy-preserving human-centric synthetic data generator
Random dataframe and database table generator
Code used to generate synthetic scenes and bounding box annotations for object detection. It was used to generate the data for the Cut, Paste and Learn paper.