genevera / AI-Self-Training-DPO-SDXL

Unofficial version!!! A Stable Diffusion model trained with AI Feedback-Based Self-Training Direct Preference Optimization.

AI Feedback-Based Self-Training Direct Preference Optimization

Dataset Details

- Num examples = 37,180
- Num epochs = 3

Compared to the Human Feedback Model

Our model tends to stay close to SDXL-Base while refining image details. The model released with the original paper shows better color and detail quality, more in line with human preferences. This reflects a characteristic of self-training: the model is optimized according to AI preferences while preserving the capabilities of the base model, whereas training on human preference data ties the output quality closely to the human preference dataset.
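For a side-by-side comparison with SDXL-Base, a minimal inference sketch using diffusers is shown below. The local UNet path is a placeholder for wherever the self-trained weights are stored, not an official checkpoint name.

```python
# Minimal sketch: load the self-trained UNet into the SDXL-Base pipeline.
# "./dpo-sdxl-unet" is a placeholder path, not an official checkpoint id.
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("./dpo-sdxl-unet", torch_dtype=torch.float16)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a close-up portrait of an old fisherman, golden hour, detailed skin"
generator = torch.Generator("cuda").manual_seed(0)
image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
image.save("dpo_sdxl_sample.png")
```

Running the same prompt and seed through the unmodified SDXL-Base pipeline makes the detail differences described above easy to inspect side by side.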

Acknowledgement

This work is based on the Diffusion Model Alignment Using Direct Preference Optimization (Diffusion-DPO) method.
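For reference, the core Diffusion-DPO objective can be sketched as below. This is an illustrative per-pair loss, not this repo's exact training code: it assumes the per-sample epsilon-prediction MSE errors have already been computed for the trainable and frozen reference UNets, and the beta value is only indicative.

```python
# Sketch of the Diffusion-DPO per-pair loss (illustrative, not the repo's training code).
# Inputs are per-sample epsilon-prediction MSE errors for the preferred ("w") and
# rejected ("l") images, under the trainable model and the frozen reference model.
import torch.nn.functional as F

def diffusion_dpo_loss(err_w_theta, err_l_theta, err_w_ref, err_l_ref, beta=5000.0):
    # How much the trainable model lowers the denoising error relative to the
    # reference model, on the preferred vs. the rejected image.
    diff_w = err_w_theta - err_w_ref
    diff_l = err_l_theta - err_l_ref
    # Preferred images should be denoised better than rejected ones.
    return -F.logsigmoid(-beta * (diff_w - diff_l)).mean()
```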
