Holistic Evaluation of Text-To-Image Models

Significant effort has recently been made in developing text-to-image generation models, which take textual prompts as input and generate images. As these models are widely used in real-world applications, there is an urgent need to comprehensively understand their capabilities and risks. However, existing evaluations primarily focus on image-text alignment and image quality. To address this limitation, we introduce a new benchmark, Holistic Evaluation of Text-To-Image Models (HEIM).

We identify 12 aspects that are important in real-world model deployment:

image-text alignment
image quality
aesthetics
originality
reasoning
knowledge
bias
toxicity
fairness
robustness
multilinguality
efficiency

By curating scenarios encompassing these aspects, we evaluate state-of-the-art text-to-image models using this benchmark. Unlike previous evaluations that focused on alignment and quality, HEIM significantly improves coverage by evaluating all models across all aspects. Our results reveal that no single model excels in all aspects, with different models demonstrating strengths in different aspects.
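To make the "no single model excels in all aspects" finding concrete, here is a minimal sketch of how one might check it from a per-aspect score table. The aspect names come from the list above. Everything else is an assumption made for illustration: the function names, the model names, the "higher is better" convention, and the placeholder scores, which are not HEIM results. It does not use the HEIM codebase or its API.

```python
# The 12 aspects are taken from the list above; everything else is illustrative.
ASPECTS = [
    "image-text alignment", "image quality", "aesthetics", "originality",
    "reasoning", "knowledge", "bias", "toxicity", "fairness", "robustness",
    "multilinguality", "efficiency",
]


def best_model_per_aspect(scores):
    """Return {aspect: top-scoring model}, assuming higher scores are better."""
    best = {}
    for aspect in ASPECTS:
        # Only consider models that were actually scored on this aspect.
        candidates = {m: s[aspect] for m, s in scores.items() if aspect in s}
        if candidates:
            best[aspect] = max(candidates, key=candidates.get)
    return best


def single_winner(scores):
    """Return the model that is best on every covered aspect, or None."""
    winners = set(best_model_per_aspect(scores).values())
    return winners.pop() if len(winners) == 1 else None


# Placeholder numbers for two hypothetical models; NOT HEIM results.
toy_scores = {
    "model_a": {"image quality": 0.8, "reasoning": 0.3},
    "model_b": {"image quality": 0.6, "reasoning": 0.5},
}

print(best_model_per_aspect(toy_scores))
print(single_winner(toy_scores))
```

In this toy table each model leads on a different aspect, so `single_winner` returns `None`. A real comparison would also need to handle ties and aspects where lower values are better (for example, toxicity rates), which this sketch ignores.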

This repository contains the code used to produce the results on the website and paper. To get started, refer to the documentation.

About

Holistic Evaluation of Text-to-Image Models (HEIM), a fork of HELM for evaluating text-to-image models (paper coming soon).

https://crfm.stanford.edu/heim

Apache License 2.0

Languages

Python 93.0%
JavaScript 4.9%
Jupyter Notebook 0.8%
HTML 0.8%
Shell 0.3%
CSS 0.2%
Dockerfile 0.0%