There are 13 repositories under the data-generation topic.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
A powerful, feature-rich, random test data generator.
List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources on such topics.
The Declarative Data Generator
Data generation and property-based testing for Elixir. 🔮
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
Generate strings that match a given regular expression
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
Deep Convolutional Neural Networks for Musical Source Separation
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated/synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
C++ Faker library for generating fake (but realistic) data.
Genalog is an open-source, cross-platform Python package for generating synthetic document images with custom degradations and text alignment capabilities.
A novel approach for synthesizing tabular data using pretrained large language models
Random dataframe and database table generator
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
[ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI
Whole-Body Nonlinear MPC for Realtime Humanoid Loco-Manipulation Planning and Control
BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.
The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for the purpose of testing and validation
📖 A curated list of resources dedicated to synthetic data
Mockingbird is a mock streaming data generator
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition"
Custom image data generator for TF Keras that supports the modern augmentation library albumentations
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
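Several entries above (the declarative data generator, the random dataframe and database table generator, and the DataHelix generator with its JSON profile) describe producing rows from a declarative description of fields. The following is a minimal sketch of that idea in plain Python. The profile format, the field kinds (`sequence`, `int`, `choice`, `string`), and the names `generate_value` and `generate_rows` are hypothetical. They do not mirror any listed project's schema or API, and the sketch does not model relationships between fields.

```python
import json
import random
import string

# Hypothetical profile format: each field names a "kind" plus kind-specific constraints.
PROFILE_JSON = """
{
  "fields": [
    {"name": "id", "kind": "sequence", "start": 1},
    {"name": "age", "kind": "int", "min": 18, "max": 65},
    {"name": "country", "kind": "choice", "values": ["DE", "FR", "US"]},
    {"name": "code", "kind": "string", "length": 6}
  ]
}
"""


def generate_value(field, row_index, rng):
    """Produce one value for a field according to its (hypothetical) kind."""
    kind = field["kind"]
    if kind == "sequence":
        return field.get("start", 0) + row_index
    if kind == "int":
        return rng.randint(field["min"], field["max"])  # bounds are inclusive
    if kind == "choice":
        return rng.choice(field["values"])
    if kind == "string":
        alphabet = string.ascii_uppercase + string.digits
        return "".join(rng.choice(alphabet) for _ in range(field["length"]))
    raise ValueError(f"unsupported field kind: {kind!r}")


def generate_rows(profile, n_rows, seed=None):
    """Generate n_rows dicts from a profile; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [
        {f["name"]: generate_value(f, i, rng) for f in profile["fields"]}
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    profile = json.loads(PROFILE_JSON)
    for row in generate_rows(profile, 3, seed=42):
        print(row)
```

Keeping the profile as data rather than code is what makes this style "declarative": the same `generate_rows` call serves any profile, and passing a seed turns a random fixture into a repeatable one.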
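One entry above generates strings that match a given regular expression. That project handles far more syntax than what follows, which is only a minimal sketch of the underlying idea. It assumes a small regex subset: literals, `[a-z]`-style classes, `\d`, `.`, and the `?`, `{n}`, and `{m,n}` quantifiers. It rejects anything else rather than risk emitting non-matching output. The name `generate_matching` is hypothetical.

```python
import random
import re
import string

# Characters this sketch refuses to treat as literals at a token position:
# groups, alternation, anchors, the * and + quantifiers, and stray or lazy quantifiers.
_UNSUPPORTED = set("()|*+^${}?")


def _parse_class(pattern, i):
    """Parse a [...] class body starting just after '['.

    Returns (list_of_chars, index_after_closing_bracket).
    """
    if i < len(pattern) and pattern[i] == "^":
        raise ValueError("negated classes are not supported")
    chars = []
    while i < len(pattern) and pattern[i] != "]":
        if pattern[i] == "\\":
            raise ValueError("escapes inside classes are not supported")
        # A range such as a-z; a '-' just before ']' is kept as a literal.
        if i + 2 < len(pattern) and pattern[i + 1] == "-" and pattern[i + 2] != "]":
            chars.extend(chr(c) for c in range(ord(pattern[i]), ord(pattern[i + 2]) + 1))
            i += 3
        else:
            chars.append(pattern[i])
            i += 1
    if i >= len(pattern):
        raise ValueError("unterminated character class")
    return chars, i + 1


def _tokenize(pattern):
    """Split the pattern into (choices, min_repeat, max_repeat) tuples."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in _UNSUPPORTED:
            raise ValueError(f"unsupported metacharacter: {ch!r}")
        if ch == "[":
            choices, i = _parse_class(pattern, i + 1)
        elif ch == "\\":
            nxt = pattern[i + 1]
            if nxt == "d":
                choices = list(string.digits)
            elif nxt.isalnum():
                # \w, \s, \b, backreferences, ... have special meanings not handled here.
                raise ValueError("unsupported escape: \\" + nxt)
            else:
                choices = [nxt]  # e.g. \. or \- stands for the literal character
            i += 2
        elif ch == ".":
            # '.' matches any non-newline character; draw from letters and digits.
            choices = list(string.ascii_letters + string.digits)
            i += 1
        else:
            choices = [ch]
            i += 1
        lo = hi = 1
        if i < len(pattern) and pattern[i] == "{":
            close = pattern.index("}", i)
            parts = pattern[i + 1:close].split(",")
            lo = int(parts[0])
            hi = int(parts[1]) if len(parts) == 2 else lo
            i = close + 1
        elif i < len(pattern) and pattern[i] == "?":
            lo, hi = 0, 1
            i += 1
        tokens.append((choices, lo, hi))
    return tokens


def generate_matching(pattern, rng=None):
    """Return a random string accepted by re.fullmatch(pattern, ...), for the supported subset."""
    if rng is None:
        rng = random.Random()
    out = []
    for choices, lo, hi in _tokenize(pattern):
        for _ in range(rng.randint(lo, hi)):
            out.append(rng.choice(choices))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    for pat in [r"[A-Z]{3}-\d{4}", r"colou?r", r"\d{3}\.\d{2}"]:
        s = generate_matching(pat, rng)
        print(pat, s, bool(re.fullmatch(pat, s)))
```

Each token becomes a set of candidate characters plus a repeat range, so generation is simply "pick a count, then pick characters". Checking the result with `re.fullmatch` is a cheap way to confirm the generator agrees with the real regex engine.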
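The Elixir entry above pairs data generation with property-based testing. As a language-neutral illustration (written in Python rather than Elixir, and not that library's API), the following sketch generates random integer lists and checks a property against each one. When it finds a failing input, it greedily shrinks it to a smaller counterexample. The names `random_int_list`, `shrink_list`, and `check_property` are hypothetical.

```python
import random


def random_int_list(rng, max_len=20, lo=-100, hi=100):
    """Generator: a list of random ints with a random length in [0, max_len]."""
    return [rng.randint(lo, hi) for _ in range(rng.randint(0, max_len))]


def shrink_list(xs, prop):
    """Greedily drop single elements while the property still fails (a simple shrinker)."""
    current = list(xs)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if not prop(candidate):
                current = candidate
                changed = True
                break
    return current


def check_property(prop, gen=random_int_list, runs=200, seed=0):
    """Run prop on `runs` generated inputs; return a shrunk counterexample or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = gen(rng)
        if not prop(xs):
            return shrink_list(xs, prop)
    return None


if __name__ == "__main__":
    # Holds for every list, so check_property returns None.
    print(check_property(lambda xs: list(reversed(list(reversed(xs)))) == xs))
    # False in general; shrinking reduces a failing input to two out-of-order ints.
    print(check_property(lambda xs: xs == sorted(xs)))
```

Shrinking is what makes property-based failures readable. A 20-element random list that is "not sorted" says little, while a two-element counterexample points straight at the broken assumption. Real libraries use richer, type-aware shrinkers than this element-dropping one.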
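Mockingbird is described above only as a mock streaming data generator. What follows is not its interface. It is a minimal, assumed illustration of the idea in Python: an endless generator of timestamped events whose inter-arrival gaps are exponentially distributed, so arrivals follow a Poisson process. The event fields (`seq`, `timestamp`, `sensor_id`, `reading`), the sensor names, and the function name `mock_event_stream` are hypothetical.

```python
import itertools
import random
from datetime import datetime, timedelta, timezone


def mock_event_stream(seed=None, start=None, mean_gap_s=0.5):
    """Yield an endless stream of synthetic sensor events.

    Gaps between events are drawn from an exponential distribution with the given
    mean, and timestamps are simulated rather than read from the wall clock.
    """
    rng = random.Random(seed)
    ts = start if start is not None else datetime(2024, 1, 1, tzinfo=timezone.utc)
    for seq in itertools.count():
        ts += timedelta(seconds=rng.expovariate(1.0 / mean_gap_s))
        yield {
            "seq": seq,
            "timestamp": ts.isoformat(),
            "sensor_id": rng.choice(["s1", "s2", "s3"]),
            "reading": round(rng.gauss(20.0, 2.5), 2),
        }


if __name__ == "__main__":
    # Take a finite slice of the infinite stream.
    for event in itertools.islice(mock_event_stream(seed=1), 5):
        print(event)
```

Using simulated timestamps instead of `sleep` keeps the stream fast and reproducible under a fixed seed. A consumer that needs real-time pacing could sleep for each gap before yielding.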