There are 13 repositories under the synthetic-dataset-generation topic.
A framework for prompt tuning using Intent-based Prompt Calibration
Perception toolkit for sim2real training and validation in Unity
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
A curated list of awesome projects which use Machine Learning to generate synthetic content.
NVIDIA Deep learning Dataset Synthesizer (NDDS)
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
SynthDet - An end-to-end object detection pipeline using synthetic data
Random dataframe and database table generator
Unity's privacy-preserving human-centric synthetic data generator
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
NVIDIA Dataset Utilities (NVDU)
This is the dataset and code release of the OpenRooms Dataset. For more information, please refer to our webpage below. Thanks a lot for your interest in our research!
BEDLAM (CVPR 2023) render pipeline tools
Synthetic Occlusion Augmentation
Repository to identify Lego bricks automatically using only images
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
(SIGCOMM '22) Practical GAN-based Synthetic IP Header Trace Generation using NetShare
Dataset Diffusion: Diffusion-based Synthetic Data Generation for Pixel-Level Semantic Segmentation (NeurIPS2023)
Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium-sized datasets
Reference GitHub repository for the paper "Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data". We propose a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Leveraging these realistic synthetic DP images, we introduce a new recurrent convolutional network (RCN) architecture that can improve defocus deblurring results and is suitable for use with single-frame and multi-frame data captured by DP sensors.
Reference GitHub repository for the paper "Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning". We propose a single-image deblurring network that incorporates the two sub-aperture views into a multi-task framework. Specifically, we show that jointly learning to predict the two DP views from a single blurry input image improves the network’s ability to learn to deblur the image. Our experiments show this multi-task strategy achieves a +1 dB PSNR improvement over state-of-the-art defocus deblurring methods. In addition, our multi-task framework allows accurate DP-view synthesis (e.g., ~39 dB PSNR) from the single input image. These high-quality DP views can be used for other DP-based applications, such as reflection removal. As part of this effort, we have captured a new dataset of 7,059 high-quality images to support our training for the DP-view synthesis task.
replicAnt - generating annotated images of animals in complex environments with Unreal Engine
Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs
Improving the performance of motor imagery classification using a variational autoencoder and synthetic EEG signals