jasonyzhang / RayDiffusion

Code for "Cameras as Rays"

How can I use my own data?

1van2ha0 opened this issue

Thanks for your great work!

I noticed that you train on the CO3Dv2 dataset to get your results. I am wondering how I can run the training code on my own dataset, e.g. one reconstructed with COLMAP?

Hope to hear from you soon.

You can process your own data using a script such as this one: https://github.com/jasonyzhang/RayDiffusion/blob/dev/preprocess_co3d.py

Here's an example of how to convert COLMAP cameras to the PyTorch3D NDC convention:

import os.path as osp

import pytorch3d.transforms
import torch

# Parse intrinsics from COLMAP's cameras.txt (3 header lines, then one
# camera per line). This assumes the SIMPLE_RADIAL camera model, whose
# parameters are focal length f, principal point (px, py), and radial
# distortion k.
intrinsics_file = osp.join(annotation_dir, "cameras.txt")
with open(intrinsics_file, "r") as fid:
    lines = fid.readlines()[3:]

intrinsics = {}
for d in lines:
    d = d.strip().split(" ")
    camera_id = d[0]
    w, h, f, px, py, k = map(float, d[2:])
    intrinsics[camera_id] = w, h, f, px, py, k
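
# For reference, a cameras.txt entry with the SIMPLE_RADIAL model has the
# columns CAMERA_ID MODEL WIDTH HEIGHT f px py k, e.g. (the numbers below
# are illustrative, not from a real reconstruction):
#   1 SIMPLE_RADIAL 1920 1080 1665.1 960.0 540.0 0.0234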

# Load camera extrinsics from images.txt.
extrinsics_file = osp.join(annotation_dir, "images.txt")
with open(extrinsics_file, "r") as fid:
    all_lines = fid.readlines()
    lines = all_lines[4::2]
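
# images.txt has a 4-line header, then two lines per image: a pose line
# (IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME) followed by a line of 2D
# keypoints, which is why we take every other line with all_lines[4::2].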

image_names = {}
cameras = {}

# COLMAP and PyTorch3D camera conventions differ by a flip of the x- and
# y-axes.
torch3d_T_colmap = torch.tensor([[-1, 0, 0], [0, -1, 0], [0, 0, 1]]).float()

for line in lines:
    d = line.strip().split(" ")
    image_id = d[0]
    camera_id = d[8]
    image_name = d[-1]

    # Quaternion (qw, qx, qy, qz) and translation (tx, ty, tz).
    d = list(map(float, d[1:8]))
    R = pytorch3d.transforms.quaternion_to_matrix(torch.tensor(d[:4]))
    t = torch.tensor(d[4:])
    # Flip axes into the PyTorch3D convention. PyTorch3D uses row-vector
    # (right-multiplied) rotation matrices, hence the transpose.
    R = (torch3d_T_colmap @ R).T
    T = torch3d_T_colmap @ t

    w, h, f, px, py, k = intrinsics[camera_id]
    # Convert intrinsics to NDC, normalizing by the shorter image side.
    scale = min(h, w)
    focal_length_ndc = [2 * f / scale, 2 * f / scale]
    principal_point_ndc = [-2 * px / scale + w / scale, -2 * py / scale + h / scale]
    cameras[image_id] = {
        "focal_length": focal_length_ndc,
        "principal_point": principal_point_ndc,
        "radial_distortion": [k],
        "image_size": [w, h],
        "R": R.tolist(),
        "T": T.tolist(),
    }
    image_names[image_id] = image_name

# Sort by numeric image id (the ids are strings, so a plain sort would put
# "10" before "2").
image_names = dict(sorted(image_names.items(), key=lambda kv: int(kv[0])))
cameras = dict(sorted(cameras.items(), key=lambda kv: int(kv[0])))
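
If you want to sanity-check the conversion, here is a minimal sketch (assuming the cameras dict built above) that loads the converted values into PyTorch3D's PerspectiveCameras, which interprets focal length and principal point in NDC by default:

from pytorch3d.renderer import PerspectiveCameras

# Build a batched PyTorch3D camera from the converted annotations.
ids = list(cameras.keys())
pt3d_cameras = PerspectiveCameras(
    focal_length=torch.tensor([cameras[i]["focal_length"] for i in ids]),
    principal_point=torch.tensor([cameras[i]["principal_point"] for i in ids]),
    R=torch.tensor([cameras[i]["R"] for i in ids]),
    T=torch.tensor([cameras[i]["T"] for i in ids]),
)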

I'm sorry for the confusion caused by my wording.
Actually, what I meant is: can I input my own multi-view images (instead of CO3D images) and obtain the camera pose for each image with your code, just like COLMAP handles multi-view images?

Oh sure! There is a Colab notebook linked in the README that makes it easy to run.