There is 1 repository under the data-synthesis topic.
List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
[CVPR 2023] Label-Free Liver Tumor Segmentation
Official code for Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking
Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs
Official implementation of the EMNLP 2022 paper "Generate, Discriminate, and Contrast: A Semi-Supervised Sentence Representation Learning Framework"
A data synthesizer for creating datasets of feet from a first-person perspective.
Apache NiFi Data Synthesizer
Code for "Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis"
Coursera - RNN Programming Assignment: In this project, we will construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wake word detection).
Blender Python package for extracting internal data from Blender scenes for 3D data generation.
The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world
Data synthesis for simulation of pen-based interaction
An environmental data synthesis pipeline for the Biodiversity Exploratories and other research consortia
Synthesize data in YOLO format from background and object images
This GitHub repository showcases my bachelor thesis, which explores the application and comparison of various deep generative models for synthetic image augmentation in the manufacturing domain.
Build machine learning image classifiers and summarize large image datasets from the Imaging FlowCytobot (IFCB)
Comprehensive reproduction of the paper "BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting" by Noa Dagan, MD, et al., assisted by Professor Yair Goldberg. This statistical project explores vaccination's multifaceted impact on infection rates, employing synthetic data, advanced matching, and sophisticated statistical analysis.
A Label-Free and Data-Free Synthesis Engine and Training Framework for Vascular Segmentation of sOCT Data with PyTorch.
SynthShapes is a Python package for generating synthetic shapes in 3D, tailored for augmenting biomedical imaging training datasets.
For this project, I aimed to perform sentiment analysis on IMDB movie reviews. My dataset consisted of over 36,000 reviews, each accompanied by movie ratings ranging from 0 to 10. The primary objective was to construct a machine learning model capable of categorizing reviews into three sentiment classes: negative, neutral, and positive.
Releases for 「Synthesizing Realistic Data for Table Recognition」
A repository for synthesizing and simulating MRI images
Website of the ready4 suite of tools for data synthesis and modelling in mental health
Repository for Slide Deck and Code Examples for talk at SDP Convening 2023