lukasHoel / text2room

Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV 2023).

Home Page: https://lukashoel.github.io/text-to-room/

Evaluation Code

fangchuan opened this issue · comments

Hi, thanks for your great work, I am inspired a lot by it.
You compare your work with other text-to-room generation methods on 2D metrics and a user study. Could you release the code for the 2D-metric evaluation, i.e., the CLIP score and Inception Score of the rendered images?

@lukasHoel Hi, I have run your code with a customized text prompt and evaluated the renderings of the generated room mesh with the CLIP score, but I only get scores in the range of 24–25. How did you obtain the numbers in Table 1 of your paper?
Specifically, I am using 'openai/clip-vit-base-patch16' to calculate the CLIP score.

Hi, sorry for the late response. We also used openai/clip-vit-base-patch16 and calculated the CLIP score against the same text prompt that was used for generating the scene. We report the score averaged over a set of images. Specifically, we use only images that show the scene from novel viewpoints, i.e., we calculate the CLIP score on all of these images:
<output_root>/output_rendering/rendering_noise_t.png
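
For anyone collecting these renderings programmatically, here is a minimal sketch. It assumes that `t` in the filename is a running frame index, so the files match the pattern rendering_noise_*.png, and that output_root is a placeholder for your actual output directory:

import glob
import os

# Placeholder: point this at your actual generation output directory.
output_root = "output/my_scene"

# Gather all novel-view renderings (assumes `t` is a frame index).
pattern = os.path.join(output_root, "output_rendering", "rendering_noise_*.png")
novel_view_renderings = sorted(glob.glob(pattern))
print(f"Found {len(novel_view_renderings)} novel-view renderings")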

I also attach a small script that we used to calculate the CLIP score on a folder of images:

import argparse
import os
import json
from tqdm.auto import tqdm
import torch
import numpy as np
from PIL import Image
from torchmetrics.multimodal import CLIPScore


def pil_to_torch(img, device, normalize=True):
    # Convert a PIL image (H, W, C) to a torch tensor of shape (C, H, W).
    img = torch.tensor(np.array(img), device=device).permute(2, 0, 1)
    if normalize:
        # Scale uint8 values from [0, 255] to floats in [0, 1].
        img = img / 255.0
    return img


def main(args):
    clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16").cuda()
    # Load all images; convert to RGB in case some PNGs carry an alpha channel.
    images = [
        Image.open(os.path.join(args.image_folder, f)).convert("RGB")
        for f in sorted(os.listdir(args.image_folder))
        if f.endswith((".png", ".jpg"))
    ]

    n_images = len(images)
    scores = torch.zeros(n_images, device=clip_score.device)

    pbar = tqdm(images, desc="Calc CLIP Score")
    for i, img in enumerate(pbar):
        # CLIPScore expects uint8 images with values in [0, 255], so keep normalize=False.
        img_torch = pil_to_torch(img, clip_score.device, normalize=False)
        score = clip_score(img_torch, args.prompt)
        scores[i] = score.detach()
        # Reset the metric so each image is scored independently.
        clip_score.reset()
        # Show the running mean over the images scored so far.
        pbar.set_postfix_str(f"{scores[:i + 1].mean().cpu().numpy().item():.2f}")

    out_dict = {
        "scores": [s.cpu().numpy().item() for s in scores],
        "mean": scores.mean().cpu().numpy().item(),
        "std": scores.std().cpu().numpy().item(),
    }

    with open(os.path.join(args.out_path, "clip_score.json"), "w") as f:
        json.dump(out_dict, f)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # GENERAL CONFIG
    parser.add_argument('--image_folder', required=True)
    parser.add_argument('--prompt', required=True)
    parser.add_argument('--out_path', required=False, default="")

    args = parser.parse_args()

    main(args)
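
Assuming the script above is saved as clip_score.py (the filename, folder, and prompt below are placeholders), it can be run like this:

python clip_score.py --image_folder <output_root>/output_rendering --prompt "a living room with a fireplace" --out_path .

With the default --out_path, the resulting clip_score.json is written to the current working directory.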

The same applies to the Inception Score; I attach a similar script here:

import argparse
import os
import json
import torch
import numpy as np
from PIL import Image
from torchmetrics.image.inception import InceptionScore


def pil_to_torch(img, device, normalize=True):
    # Convert a PIL image (H, W, C) to a torch tensor of shape (C, H, W).
    img = torch.tensor(np.array(img), device=device).permute(2, 0, 1)
    if normalize:
        # Scale uint8 values from [0, 255] to floats in [0, 1].
        img = img / 255.0
    return img


def main(args):
    inception_score = InceptionScore().cuda()
    # Load all images; convert to RGB in case some PNGs carry an alpha channel.
    images = [
        Image.open(os.path.join(args.image_folder, f)).convert("RGB")
        for f in sorted(os.listdir(args.image_folder))
        if f.endswith((".png", ".jpg"))
    ]
    # InceptionScore expects a uint8 batch of shape (N, 3, H, W) with values in
    # [0, 255], so keep normalize=False. All renderings must share the same resolution.
    images = torch.stack([pil_to_torch(i, inception_score.device, normalize=False) for i in images], dim=0)

    # update() accumulates the batch; compute() returns the mean and
    # standard deviation of the Inception Score over all images.
    inception_score.update(images)
    out = inception_score.compute()

    out_dict = {
        "mean": out[0].cpu().numpy().item(),
        "std": out[1].cpu().numpy().item(),
    }

    with open(os.path.join(args.out_path, "inception_score.json"), "w") as f:
        json.dump(out_dict, f)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # GENERAL CONFIG
    parser.add_argument('--image_folder', required=True)
    parser.add_argument('--out_path', required=False, default="")

    args = parser.parse_args()

    main(args)
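
Assuming the second script is saved as inception_score.py (filename and folder are placeholders):

python inception_score.py --image_folder <output_root>/output_rendering --out_path .

Note that the Inception Score needs no text prompt, so only the image folder is required.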