# Reconstructing 3D Humans and Environments in TV Shows
This repository is for reconstructing 3D humans and environments in TV shows, as described in the ECCV 2022 paper "The One Where They Reconstructed 3D Humans and Environments in TV Shows".

You can find our project page at https://ethanweber.me/sitcoms3D/.

## Getting the data
You can either go to this GDrive folder and download the files manually or run the following script (which uses the `gdown` pip package). All unzipped folders should live in the `data/` folder after running the script.
```shell
pip install gdown
python download_data.py
```
Our data uses a naming convention of `<sitcom>-<location>` for seven sitcom locations, which can be seen in the NeRF-W panoramic images below:
- TBBT-big_living_room
- Frasier-apartment
- ELR-apartment
- Friends-monica_apartment
- TAAHM-kitchen
- Seinfeld-jerry_living_room
- HIMYM-red_apartment
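The `<sitcom>-<location>` keys above can be split programmatically. Here is a minimal sketch (a hypothetical helper, not part of the repository); it assumes the location part never needs the leading hyphen, since locations use underscores:

```python
# List copied from the README above.
SITCOM_LOCATIONS = [
    "TBBT-big_living_room",
    "Frasier-apartment",
    "ELR-apartment",
    "Friends-monica_apartment",
    "TAAHM-kitchen",
    "Seinfeld-jerry_living_room",
    "HIMYM-red_apartment",
]

def split_key(key):
    """Split a <sitcom>-<location> key at the first hyphen.

    Locations themselves use underscores, so splitting on the
    first hyphen is unambiguous for all seven keys above.
    """
    sitcom, location = key.split("-", 1)
    return sitcom, location
```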
## Environments: sparse reconstruction and NeRF-W data
This section describes the COLMAP sparse reconstructions and the images used to train NeRF-W.
```
# sparse_reconstruction_and_nerf_data.zip
|- sparse_reconstruction_and_nerf_data/<sitcom>-<location>/
   |- cameras.json
   |- colmap/
   |- images/
   |- panoptic_classes.json
   |- segmentations/
   |- threejs.json
```
- `cameras.json` is a processed version of the `colmap/` sparse reconstruction and the `threejs.json` file. The keys include `{"bbox", "point_cloud_transform", "scale_factor", "frames"}`. The `"frames"` are processed camera poses (NeRF cameras), where `NeRF cameras = (point_cloud_transform @ COLMAP cameras) / scale_factor`. See `notebooks/data_demo.ipynb` for an explanation.
- The `images/` folder contains all the images used to train NeRF-W (~100-200 images per location).
- `panoptic_classes.json` and `segmentations/` have been created with panoptic segmentation from detectron2. The `panoptic_classes` are ordered and correspond to the pixel values inside `segmentations` for the `stuff` and `thing` classes, respectively. We only use the `thing` class `person` in our work. However, we include all of this information to encourage future work on incorporating semantics into the scene + human reconstruction pipeline. For example, Semantic-NeRF could be used with this data.
- `threejs.json` is a file that can be visualized with the online three.js editor at https://threejs.org/editor/. It shows the COLMAP sparse point cloud and the bounding box used to define the region where the NeRF-W field is valid. The `point_cloud_transform` was created in this interface: we rotated and translated the point cloud in the three.js editor to obtain an axis-aligned bounding box (AABB), which allows for efficient ray near/far bounds sampling when training NeRF.
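The camera-pose convention `NeRF cameras = (point_cloud_transform @ COLMAP cameras) / scale_factor` can be sketched as a small helper. This is an illustrative sketch, not the repository's code: it assumes 4x4 homogeneous camera-to-world matrices and applies the scale only to the translation component (so rotations stay orthonormal); check `notebooks/data_demo.ipynb` for the exact convention.

```python
import numpy as np

def colmap_to_nerf(colmap_c2w, point_cloud_transform, scale_factor):
    """Apply: NeRF cameras = (point_cloud_transform @ COLMAP cameras) / scale_factor.

    colmap_c2w:            4x4 camera-to-world pose from the COLMAP reconstruction
    point_cloud_transform: 4x4 transform created in the three.js editor
    scale_factor:          scalar from cameras.json

    Assumption: the division by scale_factor is applied to the translation
    only, which keeps the rotation block orthonormal.
    """
    nerf_c2w = point_cloud_transform @ colmap_c2w  # new array; inputs untouched
    nerf_c2w[:3, 3] /= scale_factor
    return nerf_c2w
```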
## Humans: SMPL parameters and human-pair data
Here we give an overview of the contents in each of the files relevant for human reconstruction.
`human_data.zip`

```
# human_data/<sitcom>-<location>.json
# Contains the "openpose_keypoints" for all humans and the "smpl" parameters where they exist.
# The "smpl" parameters only exist when we could use our method ("calibrated multi-shot") to optimize across the shot change.
{
    "<image_name>": [
        { # human_idx_0 for this image_name
            "openpose_keypoints": ...,
            "smpl": {
                "camera_translation": ...,
                "betas": ...,
                "global_orient": ...,
                "body_pose": ...,
                "colmap_rescale": ...
            }
        },
        { # human_idx_1 for this image_name
            ...
        }
    ]
}
```
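Since not every detected human has `"smpl"` parameters, iterating only the optimized humans takes one extra check. A minimal sketch (the helper name is hypothetical; the JSON layout follows the schema shown above):

```python
import json

def humans_with_smpl(human_data_path):
    """Yield (image_name, human_idx, smpl_params) for each detection
    that has optimized "smpl" parameters; detections without them are
    skipped. Hypothetical helper, following the schema above."""
    with open(human_data_path) as f:
        data = json.load(f)
    for image_name, humans in data.items():
        for human_idx, human in enumerate(humans):
            if human.get("smpl"):  # absent for humans we could not optimize
                yield image_name, human_idx, human["smpl"]
```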
`human_pairs.zip`

```
# human_pairs/<sitcom>-<location>.json
# The (image_name, human_idx) pairs for which humans were optimized together after solving the Hungarian matching problem.
# This is where our method ("calibrated multi-shot") was used to create the "smpl" parameters described above.
[
    [image_name_a, human_idx_a, image_name_b, human_idx_b],
    ...
]
```
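Each pair entry indexes back into `human_data`. A hedged sketch of resolving pairs into their human records (hypothetical helper name; file schemas follow the layouts shown above):

```python
import json

def load_optimized_pairs(pairs_path, human_data_path):
    """Resolve each (image_name_a, human_idx_a, image_name_b, human_idx_b)
    entry from human_pairs into the corresponding human_data records.
    Hypothetical helper; schemas follow the README layouts above."""
    with open(pairs_path) as f:
        pairs = json.load(f)
    with open(human_data_path) as f:
        human_data = json.load(f)
    resolved = []
    for image_a, idx_a, image_b, idx_b in pairs:
        resolved.append(
            (human_data[image_a][idx_a], human_data[image_b][idx_b])
        )
    return resolved
```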
## 2D DISK features
Registering new images into the same coordinate frame as our COLMAP reconstructions requires 2D DISK features to match against. The ZIP files are stored here with the filenames `<sitcom>-<location>-disk.zip`. These files are quite large, but you can unzip them and put their contents in the `data/sparse_reconstruction_and_nerf_data/<sitcom>-<location>/` folder. Your folders should then take the following form:
```
|- sparse_reconstruction_and_nerf_data/<sitcom>-<location>/
   |- cameras.json
   |- colmap/
   |- database.db          # added in this step
   |- h5/                  # added in this step
   |- images/
   |- masks/               # added in this step
   |- panoptic_classes.json
   |- segmentations/
   |- threejs.json
```
Note that we only include `Friends-monica_apartment-disk.zip` because these files are on the order of ~30GB. Please contact us if you need DISK feature information for other sitcom locations.
## Demo with our data
We provide a demo of using our data in `notebooks/data_demo.ipynb`. To run this demo, you'll need to install the required packages in `requirements.txt`:
```shell
pip install -r requirements.txt
# now open notebooks/data_demo.ipynb to play with the data
```
## Register new images to COLMAP sparse reconstructions
See `REGISTER_NEW_IMAGES.md` for details on how to register new images to our sparse reconstructions (i.e., to obtain new camera parameters for images in our sitcom rooms).
## Qualitative user study
We used the codebase https://github.com/ethanweber/anno for our qualitative user study. The code requires data, setup, and webpage hosting, but it is quite general and can be used for many qualitative user study tasks. The basic idea behind the repo is to create HITs (human intelligence tasks), each composed of (1) a question, (2) a list of media (images, videos, etc.), and (3) possible choices. Given the question, the user responds with their answer choice. We enforce response quality by showing the same questions multiple times with different orderings of media/choices and only keeping responses where annotators performed sufficiently consistently.
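As an illustration of the HIT structure described above, a question record and a consistency-check copy could look like the following. The field names are hypothetical, not the actual anno schema:

```python
import random

# Hypothetical record for one HIT question; the real anno schema may differ.
hit_question = {
    "question": "Which reconstruction looks more realistic?",
    "media": ["render_a.mp4", "render_b.mp4"],
    "choices": ["Left", "Right", "About the same"],
}

def shuffled_copy(question, seed):
    """Create a repeat of a question with its media order shuffled.

    Showing the same question again with a different media ordering lets
    us detect inconsistent annotators and drop their responses.
    """
    rng = random.Random(seed)
    media = list(question["media"])
    rng.shuffle(media)
    return {**question, "media": media}
```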