Can I use it to compare different text-image pairs?

Question

Duanener opened this issue a year ago · comments

Hi, thanks for sharing such great work. I have a question, though.
Suppose I only have a set of image-text pairs, [[prompt1, image1], [prompt2, image2], ..., [promptN, imageN]], where each prompt corresponds to exactly one image (one-to-one, not one-to-many). In this case, how should I use your model to rank these pairs by aesthetic and human preference score?

Blakey Wu · Answer 1 · Sun Sep 10 2023 20:08:26 GMT+0800 (China Standard Time)

In principle, this is not recommended: our training data pairs a single prompt with multiple images, so the training objective cannot guarantee that scores are comparable across images generated from different prompts.
In practice, though, ranking images from different prompts can still give meaningful results. Here is an example: https://tgxs002.github.io/hps_filter.github.io/
If you only want to filter a dataset, give it a try and see whether it works for you.
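
For the dataset-filtering case, a minimal sketch might look like the following. It assumes a hypothetical `score_pair(prompt, image_path)` callable that wraps whatever scoring call the model exposes and returns one float per pair; that name, `rank_pairs`, `filter_top`, and `keep_fraction` are illustrative and not part of the model's API. Since scores across different prompts are not guaranteed to be comparable, treat the ranking as a rough filter rather than a precise ordering.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]            # (prompt, image_path)
Scored = Tuple[float, str, str]   # (score, prompt, image_path)


def rank_pairs(
    pairs: Sequence[Pair],
    score_pair: Callable[[str, str], float],  # hypothetical scorer, supplied by the caller
) -> List[Scored]:
    """Score each (prompt, image) pair independently and sort best-first."""
    scored = [(score_pair(prompt, image), prompt, image) for prompt, image in pairs]
    # Sort on the score only; the sort is stable, so ties keep their input order.
    return sorted(scored, key=lambda item: item[0], reverse=True)


def filter_top(
    pairs: Sequence[Pair],
    score_pair: Callable[[str, str], float],
    keep_fraction: float = 0.5,  # assumed cutoff; tune it for your dataset
) -> List[Scored]:
    """Keep the highest-scoring fraction of the pairs (at least one if any exist)."""
    ranked = rank_pairs(pairs, score_pair)
    if not ranked:
        return []
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]
```

In use, `score_pair` would call the preference model once per pair, scoring each image against its own prompt. Inspecting a sample of the kept and dropped pairs by hand is a reasonable sanity check before trusting the cutoff.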

Duanener · Answer 2 · Sun Sep 10 2023 20:12:32 GMT+0800 (China Standard Time)

Thanks for your reply.