vanderschaarlab / synthetic-data-lab

A repository containing the materials required to complete the "AAAI Lab for Innovative Uses of Synthetic Data". This includes tutorials on how to use the library "Synthcity" for improving the fairness and privacy of a dataset as well as for augmenting a small dataset using some other similar datasets.

AAAI-23 Lab for Innovative Uses of Synthetic Data

Welcome to the AAAI-23 Lab for Innovative Uses of Synthetic Data! In this repository you will find materials required to complete the Lab.

The Lab will start on Wednesday, 8 February 2023, at 14:00 EST. It will be a four-hour hybrid event. Both in-person and online participants must register on the AAAI website to join the live session.
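
For online participants in other time zones, the start time can be converted with a few lines of Python. This is only a minimal sketch: it treats EST as the fixed UTC-05:00 offset given in the schedule, and the helper name `to_offset` and the choice of UTC and CET as target zones are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# EST is modelled as a fixed UTC-05:00 offset, as stated in the schedule.
EST = timezone(timedelta(hours=-5), name="EST")

lab_start = datetime(2023, 2, 8, 14, 0, tzinfo=EST)
lab_end = lab_start + timedelta(hours=4)  # four-hour session


def to_offset(moment, hours, label):
    """Convert an aware datetime to another fixed UTC offset (hypothetical helper)."""
    return moment.astimezone(timezone(timedelta(hours=hours), name=label))


start_utc = to_offset(lab_start, 0, "UTC")
start_cet = to_offset(lab_start, 1, "CET")

for moment in (lab_start, start_utc, start_cet, lab_end):
    print(moment.strftime("%A %d %B %Y, %H:%M %Z"))
```

Fixed offsets are used instead of named zones so that the sketch does not depend on a time zone database being installed.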

Before the lab

We will run the hands-on session using Google Colab, so there is no need to pre-install any library or download any dataset.

In-person participants should bring a fully charged laptop for the four-hour session. The event host has notified us that there won't be enough power sockets for everyone at the venue, and the available sockets will be allocated on a first-come, first-served basis.

The Lab is based on the open-source Python library synthcity. To make the most of the Lab, we recommend that participants explore the library beforehand. Here is a list of useful materials:

During the lab

Schedule

Please note that all times are reported in EST (UTC-05:00).

| Start | End | Session Title | Description |
| --- | --- | --- | --- |
| 2:00pm | 2:30pm | Opening and Intro | We go through the promise of synthetic data in empowering AI development and the associated challenges. |
| 2:30pm | 3:15pm | Data Modality | We demonstrate how synthcity can generate tabular data with diverse modalities, including static data, regular and irregular time series, data with censoring, multi-source data, and composite data. |
| 3:15pm | 3:30pm | Q&A | |
| 3:30pm | 4:00pm | Break | |
| 4:00pm | 4:30pm | Fairness | We show how synthetic data can promote ML fairness by (1) augmenting minority classes with conditional generation and (2) removing bias via causal generation. |
| 4:30pm | 5:00pm | Privacy | We introduce privacy-preserving synthetic data generators that facilitate the sharing of sensitive data. We will cover differential-privacy-based methods as well as methods that defend against specific threat models. |
| 5:00pm | 5:30pm | Transfer | We show how to alleviate data scarcity by augmenting a small dataset with information learned from other related datasets, in the style of transfer learning. |
| 5:30pm | 5:45pm | Q&A | |
| 5:45pm | 6:00pm | Further Engagement | We discuss ways of further engaging with the application and development of synthcity. |
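
The Fairness session describes augmenting minority classes with conditional generation. The sketch below is a toy illustration of that idea only. It does not use synthcity, and the per-class Gaussian model, the function names, and the example data are all assumptions made for this example. It fits an independent Gaussian to each feature within each class, then samples extra rows conditioned on the under-represented label until the classes are balanced.

```python
import random
from collections import Counter
from statistics import mean, stdev


def fit_class_gaussians(rows):
    """Estimate (mean, std) of every feature separately for each class label."""
    grouped = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    return {
        label: [(mean(column), stdev(column)) for column in zip(*group)]
        for label, group in grouped.items()
    }


def generate_for_class(params, label, count, rng):
    """Sample `count` synthetic rows conditioned on `label`."""
    return [
        ([rng.gauss(mu, sigma) for mu, sigma in params[label]], label)
        for _ in range(count)
    ]


def balance_classes(rows, rng):
    """Top up every class with synthetic rows until all match the largest class."""
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    params = fit_class_gaussians(rows)
    augmented = list(rows)
    for label, size in counts.items():
        augmented.extend(generate_for_class(params, label, target - size, rng))
    return augmented


rng = random.Random(0)
# Hypothetical imbalanced data: 40 majority rows (label 0), 8 minority rows (label 1).
real_rows = [([rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)], 0) for _ in range(40)]
real_rows += [([rng.gauss(3.0, 1.0), rng.gauss(1.0, 0.5)], 1) for _ in range(8)]

augmented_rows = balance_classes(real_rows, rng)
print(Counter(label for _, label in augmented_rows))
```

A real conditional generator would model dependencies between features rather than treating them as independent; the independent Gaussians here only keep the sketch short.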

The interactive tutorials are available here.
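
The Privacy session mentions differential-privacy-based methods. As a rough illustration of the underlying idea, and not of how synthcity's generators work, the sketch below releases a differentially private mean with the Laplace mechanism. The function names, the clipping bounds, and the example values are assumptions for this example, and the dataset size is assumed to be public.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-differentially-private mean of values clipped to [lower, upper]."""
    clipped = [min(max(value, lower), upper) for value in values]
    # Changing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
# Hypothetical sensitive attribute, e.g. patient ages.
ages = [34, 45, 29, 61, 52, 38, 47, 70, 25, 58]
true_mean = sum(ages) / len(ages)

for epsilon in (0.1, 1.0, 10.0):
    estimate = private_mean(ages, lower=0, upper=100, epsilon=epsilon, rng=rng)
    print(f"epsilon={epsilon}: private mean {estimate:.2f} (true mean {true_mean:.2f})")
```

A smaller epsilon gives a stronger privacy guarantee at the cost of a noisier answer. This floating-point version is meant for intuition only and is not hardened for production use.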

After the lab

Download and use the library, and join its development: raise issues and open pull requests.

If you've enjoyed the lab, why not star Synthcity on GitHub? Starring the repositories is the easiest way to help our community, as it raises awareness of the tools we're building.

Sign up to our synthetic data mailing list to stay up to date on news about the SyntheticData4ML community. We will post about upcoming tutorials, workshops, competitions, hackathons and more!

Join our Machine Learning Engagement sessions, "Inspiration Exchange", for discussions of our research projects and software, such as Synthcity. Sign up here.

License:MIT License

