Prompt. Generate Synthetic Data. Train & Align Models.
DataDreamer is a powerful open-source Python library for prompting, synthetic data generation, and training workflows. It is designed to be simple, extremely efficient, and research-grade.
Installation: pip3 install datadreamer.dev

| demo.py | Result of demo.py |
|---|---|
| See the full demo script | See the synthetic dataset and the trained model |

For more demonstrations and recipes see the Quick Tour page.
With DataDreamer you can:
- Create Prompting Workflows: Create and run multi-step, complex prompting workflows easily with major open-source or API-based LLMs.
- Generate Synthetic Datasets: Generate synthetic datasets for novel tasks or augment existing datasets with LLMs.
- Train Models: Align models. Fine-tune models. Instruction-tune models. Distill models. Train on existing data or synthetic data.
- ... learn more about what's possible in the Overview Guide
DataDreamer is:
- Simple: Simple and approachable to use with sensible defaults, yet powerful with support for bleeding-edge techniques.
- Research-Grade: Built for researchers, by researchers, but accessible to all. A focus on correctness, best practices, and reproducibility.
- Efficient: Aggressive caching and resumability built in. Support for techniques like quantization, parameter-efficient training (LoRA), and more.
- Reproducible: Workflows built with DataDreamer are easily shareable, reproducible, and extendable.
- Makes Sharing Easy: Publishing datasets and models is simple. Automatically generate data cards and model cards with metadata. Generate a list of any citations required.
- ... learn more about the motivation and design principles behind DataDreamer.
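To make the caching and resumability idea concrete, here is a minimal, self-contained Python sketch. It is not DataDreamer's API or implementation: `cached_step`, `fake_llm_generate`, and `run_workflow` are hypothetical names, and the JSON-on-disk cache layout is an assumption made purely for illustration. The sketch shows how a multi-step workflow can skip work that was already completed in an earlier run.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

# Counts how often the (expensive) generation step really runs.
calls = {"generate": 0}


def cached_step(cache_dir: Path, name: str, args: dict,
                compute: Callable[[], list]) -> list:
    """Run `compute` once per (name, args) pair and reuse the saved result."""
    # Hash the step name together with its arguments, so changing either
    # one produces a new cache entry instead of reusing a stale result.
    payload = json.dumps([name, args], sort_keys=True).encode()
    key = hashlib.sha256(payload).hexdigest()[:16]
    path = cache_dir / f"{key}.json"
    if path.exists():
        # Resume: the step finished in a previous run, so load it from disk.
        return json.loads(path.read_text())
    result = compute()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result


def fake_llm_generate(n: int) -> list:
    """Hypothetical stand-in for an expensive LLM call."""
    calls["generate"] += 1
    return [f"synthetic example {i}" for i in range(n)]


def run_workflow(cache_dir: Path) -> list:
    """A two-step workflow: generate rows, then post-process them."""
    rows = cached_step(cache_dir, "generate", {"n": 3},
                       lambda: fake_llm_generate(3))
    return cached_step(cache_dir, "uppercase", {"source": "generate", "n": 3},
                       lambda: [row.upper() for row in rows])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_workflow(Path(tmp))
        run_workflow(Path(tmp))  # second run is served from the cache
        print("generation ran", calls["generate"], "time(s)")
```

Running the workflow twice against the same directory triggers the stand-in generation step only once; the second run reads both steps back from disk. A real system would also need to handle partial results and cache invalidation when upstream steps change, which this sketch leaves out.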
coming soon...
Please reach out to us via email (ajayp@upenn.edu) or on Discord if you have any questions, comments, or feedback.
Copyright © 2024, Ajay Patel. Released under the MIT License.
Thank you to the maintainers at Hugging Face and LiteLLM for accepting contributions necessary for DataDreamer and providing upstream support.