Reconstructing 3D Humans and Environments in TV Shows

This repository is for reconstructing 3D humans and environments in TV shows, as described in the paper "The One Where They Reconstructed 3D Humans and Environments in TV shows" (ECCV 2022).

You can find our project page at https://ethanweber.me/sitcoms3D/.

Getting the data

You can either go to this GDrive folder and download the files or run the following script (which relies on the gdown pip package). All unzipped folders should live in the data/ folder after running the script.

pip install gdown
python download_data.py

Our data uses a <sitcom>-<location> naming convention for the seven sitcom locations, which can be seen in the NeRF-W panoramic renderings below:

  1. TBBT-big_living_room
  2. Frasier-apartment
  3. ELR-apartment
  4. Friends-monica_apartment
  5. TAAHM-kitchen
  6. Seinfeld-jerry_living_room
  7. HIMYM-red_apartment

Panoramic NeRF-W renderings of the sitcom locations

Environments: sparse reconstruction and NeRF-W data

This section describes the COLMAP sparse reconstructions and the images used to train NeRF-W.

# sparse_reconstruction_and_nerf_data.zip
|- sparse_reconstruction_and_nerf_data/<sitcom>-<location>/
  |- cameras.json
  |- colmap/
  |- images/
  |- panoptic_classes.json
  |- segmentations/
  |- threejs.json
  • cameras.json is a processed version of the colmap/ sparse reconstruction and the threejs.json file. The keys include {"bbox", "point_cloud_transform", "scale_factor", "frames"}. The "frames" are processed camera poses (NeRF cameras), where NeRF cameras = (point_cloud_transform @ COLMAP cameras) / scale_factor. See notebooks/data_demo.ipynb for an explanation, and the loading sketch after this list.

  • The images/ folder contains all the images used to train NeRF-W (~100-200 images per location).

  • panoptic_classes.json and segmentations/ have been created with panoptic segmentation from detectron2. The panoptic classes are ordered and correspond to the pixel values inside the segmentations for the stuff and thing classes, respectively. We only use the "person" thing class in our work. However, we include all of the information to encourage future work on incorporating semantics into the scene + human reconstruction pipeline. For example, Semantic-NeRF could be used with this data.

  • threejs.json is a file that can be visualized with the online three.js editor at https://threejs.org/editor/. It shows the COLMAP sparse point cloud and the bounding box used to define the region where the NeRF-W field is valid. point_cloud_transform was created in this interface: we rotated and translated the point cloud in the three.js editor to obtain an axis-aligned bounding box (AABB), which allows for efficient ray near/far bounds sampling when training NeRF.

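To make the cameras.json conventions concrete, here is a minimal loading sketch. It assumes 4x4 camera-to-world matrices and applies the scale factor to the translation component only (so rotations stay orthonormal); the exact per-frame schema is covered in notebooks/data_demo.ipynb.

# Minimal sketch: reproduce a NeRF-space pose from a COLMAP pose via cameras.json.
# The 4x4 camera-to-world convention is an assumption; see notebooks/data_demo.ipynb.
import json
import numpy as np

with open("data/sparse_reconstruction_and_nerf_data/Friends-monica_apartment/cameras.json") as f:
    cameras = json.load(f)

transform = np.array(cameras["point_cloud_transform"])  # created in the three.js editor
scale_factor = cameras["scale_factor"]

def colmap_to_nerf(colmap_c2w):
    """NeRF camera = (point_cloud_transform @ COLMAP camera) / scale_factor.
    We apply the division to the translation so the rotation stays orthonormal."""
    pose = transform @ colmap_c2w
    pose[:3, 3] /= scale_factor
    return pose

The "bbox" key is the axis-aligned box described under threejs.json above; it bounds the region where the NeRF-W field is valid. Continuing the snippet, a standard slab-method sketch for turning such an AABB into per-ray near/far sampling bounds (how bbox is serialized is an assumption; check the demo notebook):

def ray_aabb_near_far(origin, direction, bbox_min, bbox_max):
    """Slab method: intersect a ray with an AABB to get (near, far) sampling bounds.
    Returns None if the ray misses the box. Assumes nonzero direction components;
    add a small epsilon in practice."""
    t0 = (np.asarray(bbox_min) - origin) / direction  # per-axis entry/exit parameters
    t1 = (np.asarray(bbox_max) - origin) / direction
    near = np.max(np.minimum(t0, t1))
    far = np.min(np.maximum(t0, t1))
    if near > far or far < 0:
        return None
    return max(near, 0.0), far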

Humans: SMPL parameters and human-pair data

Here we give an overview of the contents of each file relevant to human reconstruction; a short loading sketch follows the two layouts below.

human_data.zip
human_data/<sitcom>-<location>.json
# Contains the "openpose_keypoints" for all humans and the "smpl" parameters where they exist.
# The "smpl" parameters only exist when we could use our method ("calibrated multi-shot") to optimize across the shot change.
{
  "<image_name>": [
    { # human_idx_0 for this image_name
      "openpose_keypoints": ...,
      "smpl": {
        "camera_translation": ...,
        "betas": ...,
        "global_orient": ...,
        "body_pose": ...,
        "colmap_rescale": ...
      }
    },
    { # human_idx_1 for this image_name
      ...
    }
  ]
}

human_pairs.zip
human_pairs/<sitcom>-<location>.json
# The (image_name, human_idx) pairs for which humans were optimized together after solving the Hungarian matching problem.
# This is where our method ("calibrated multi-shot") was used to create the "smpl" parameters as described above.
[
  [image_name_a, human_idx_a, image_name_b, human_idx_b],
  ...
]
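
For orientation, a minimal sketch that joins the two files. It assumes the JSON layouts shown above and the data/ paths produced by download_data.py.

# Sketch: walk the optimized human pairs and read back their SMPL parameters.
# Path layout is an assumption based on download_data.py.
import json

location = "Friends-monica_apartment"  # any <sitcom>-<location>
with open(f"data/human_data/{location}.json") as f:
    human_data = json.load(f)
with open(f"data/human_pairs/{location}.json") as f:
    human_pairs = json.load(f)

for image_name_a, human_idx_a, image_name_b, human_idx_b in human_pairs:
    human_a = human_data[image_name_a][human_idx_a]
    human_b = human_data[image_name_b][human_idx_b]
    # These humans were optimized across the shot change, so "smpl" should exist.
    betas_a = human_a["smpl"]["betas"]  # SMPL shape coefficients
    body_pose_b = human_b["smpl"]["body_pose"]  # SMPL joint rotations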

2D DISK features

Registering new images into the same coordinate frame as our COLMAP reconstructions requires 2D DISK features to match against. The ZIP files are stored here with the filenames <sitcom>-<location>-disk.zip. These files are quite large; unzip them and put their contents in the data/sparse_reconstruction_and_nerf_data/<sitcom>-<location>/ folder. After this step, your folders should take the following form:

|- sparse_reconstruction_and_nerf_data/<sitcom>-<location>/
  |- cameras.json
  |- colmap/
  |- database.db            # added in this step
  |- h5/                    # added in this step
  |- images/
  |- masks/                 # added in this step
  |- panoptic_classes.json
  |- segmentations/
  |- threejs.json

Note that we only include Friends-monica_apartment-disk.zip because these files are on the order of ~30GB. Please contact us if you need DISK feature information for other sitcom locations.
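
For example, extracting one archive into place might look like the following (whether the archive already nests its contents under the location name is an assumption; adjust the target if needed):

import zipfile

location = "Friends-monica_apartment"
with zipfile.ZipFile(f"{location}-disk.zip") as zf:
    zf.extractall(f"data/sparse_reconstruction_and_nerf_data/{location}/")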

Demo with our data

We provide a demo of using our data in notebooks/data_demo.ipynb. To run this demo, you'll need to install the required packages in requirements.txt.

pip install -r requirements.txt
# now open notebooks/data_demo.ipynb to play with the data

Register new images to COLMAP sparse reconstructions

See REGISTER_NEW_IMAGES.md for details on how to register new images to our sparse reconstructions (i.e., to obtain new camera parameters for images in our sitcom rooms).

Qualitative user study

We used the codebase https://github.com/ethanweber/anno for our qualitative user study. The code requires data, setup, and webpage hosting, but it is quite general and can be used for many qualitative user study tasks. The basic idea behind the repo is to create HITs (human intelligence tasks) with questions, each composed of (1) a question, (2) a list of media (images, videos, etc.), and (3) possible answer choices; the user responds by picking one of the choices. We enforce response consistency by showing the same questions multiple times with different orderings of media/choices, and we only keep responses from annotators who answered these repeats sufficiently consistently.
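
For intuition only, a HIT question might be modeled roughly as below; the field names are hypothetical and not the anno codebase's actual schema.

# Hypothetical HIT question structure (illustrative only, not anno's real schema).
hit_question = {
    "question": "Which reconstruction looks more realistic?",
    "media": ["renders/method_a.mp4", "renders/method_b.mp4"],  # images, videos, etc.
    "choices": ["Left", "Right", "About the same"],
}
# Consistency check: serve the same question again with media/choices shuffled,
# and keep an annotator's responses only if the repeated answers agree.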
