apple / ml-hypersim

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

camera intrinsic matrix

feiran-l opened this issue · comments

Hi! Thank you for this awesome dataset. I am trying to re-project the RGBD images to point clouds (in camera coordinates), and I have already converted the ray-to-optical-center distance map to depth as suggested here. But I cannot quite understand how to find the camera intrinsics described here.

I have found that metadata_camera_parameters.csv contains a key called camera_physical_focal_length, but it seems to be in millimeters, and we would need the pixel size to convert it to pixel units. Also, may I know how I should determine the principal point?

Thank you in advance!

This issue was due to my limited knowledge of OpenGL. I managed to extract the intrinsics from the OpenGL matrices by following this tutorial.

Hi! Great question.

  • Are you sure you want to re-project the depth_meters pixels into camera coordinates? You could just as easily start with the position images, where each pixel is a point in world coordinates. You could simply project all of those world-space points into camera-space using the known position and orientation of the camera (see the M_cam_from_world matrix in this notebook). This approach seems a bit easier than what you're proposing, because you only need to perform a single transformation.

  • If your application demands that you start from the depth_meters images, then recall that these images actually contain Euclidean distances in meters from the camera-space origin (as noted in the thread you linked to). We can construct a point cloud in camera-space by constructing a ray for each pixel that starts at the camera-space origin and has a length equal to its corresponding value in depth_meters. The direction of the ray at each pixel can be obtained using the M_cam_from_uv matrix in this notebook. If you're attempting to merge multiple images into a single point cloud, then you will need to convert the depth_meters values from meters into asset units, because all of our camera positions are specified in asset units.

  • metadata_camera_parameters.csv contains every conceivable camera parameter that we could export from the native scene assets. We exported as much as possible for the sake of completeness, but most users will only ever need the M_cam_from_uv matrix (to determine the camera-space ray corresponding to each pixel) or M_proj (to project points from camera-space into homogeneous clip space). In your case, I think you only need M_cam_from_uv, and that is only if you choose to start from the depth_meters images rather than the position images.
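The first approach above (starting from the position images) can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's actual code: the array shapes, the placeholder data, and the identity M_cam_from_world are assumptions standing in for a real loaded position image and the real 4x4 matrix.

```python
import numpy as np

# Placeholder inputs: in practice, load a Hypersim position image (H, W, 3)
# of world-space points and the 4x4 M_cam_from_world matrix from the notebook.
H, W = 768, 1024
position_world = np.random.rand(H, W, 3)  # stands in for a real position image
M_cam_from_world = np.eye(4)              # stands in for the real matrix

# Flatten to homogeneous points and apply the single transformation.
P_world = np.concatenate(
    [position_world.reshape(-1, 3), np.ones((H * W, 1))], axis=1)  # (N, 4)
P_cam = (M_cam_from_world @ P_world.T).T
points_cam = P_cam[:, :3]  # camera-space point cloud, (N, 3)
```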
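The second approach (starting from depth_meters) can be sketched similarly. Again a hedged illustration: the uv range and sign conventions used to build the grid are assumptions and should be checked against the Hypersim notebook, and the placeholder depth image and identity M_cam_from_uv stand in for real data.

```python
import numpy as np

# Placeholder inputs: in practice, load a Hypersim depth_meters image
# (Euclidean ray lengths in meters) and the 3x3 M_cam_from_uv matrix.
H, W = 768, 1024
depth_meters = np.full((H, W), 2.0)  # stands in for a real depth image
M_cam_from_uv = np.eye(3)            # stands in for the real matrix

# Pixel-center uv coordinates; the exact range and sign conventions here
# are assumptions and should be verified against the Hypersim notebook.
u = np.linspace(-1.0 + 1.0 / W, 1.0 - 1.0 / W, W)
v = np.linspace(1.0 - 1.0 / H, -1.0 + 1.0 / H, H)
uu, vv = np.meshgrid(u, v)
uv1 = np.stack([uu, vv, np.ones_like(uu)], axis=-1)  # (H, W, 3)

# Ray directions in camera space, normalized to unit length, then scaled
# by the Euclidean distance stored in depth_meters.
rays = uv1 @ M_cam_from_uv.T
rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
points_cam = rays * depth_meters[..., None]  # (H, W, 3) in camera space
```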

Hi! Thank you for the instructions. I think I can handle the intrinsics now.

Just one more question: could the pixel size of the image also be documented? I think it would be helpful to provide it, since the current depth_meters images are in meters while the intrinsics are in pixels, and some applications do require unifying the units.

I don't know what you mean exactly. I agree that the depth_meters images are specified in meters. But what do you mean when you say the intrinsics are in pixel units? I'm a graphics person, so I think of the intrinsics as a 4x4 matrix that maps points from camera space to homogeneous clip space. What would it mean for the pixels of a pinhole camera to have metric size? Do you mean you'd like to know the focal length in meters and the image size in meters?

The FOV is known for every image, and the image dimensions in pixels are also known. So you can set the focal length in meters to be whatever you want, and then just solve for the image size in meters that preserves the FOV.
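As a worked example of the suggestion above, with assumed values (a 1024-pixel-wide image and a 60-degree horizontal FOV, not taken from any particular Hypersim scene):

```python
import math

# Assumed inputs, for illustration only.
width_pixels = 1024
fov_x = math.radians(60.0)  # horizontal field of view

# Choose any metric focal length you like...
focal_length_mm = 35.0

# ...then solve for the metric image width that preserves the FOV:
#   tan(fov_x / 2) = (image_width / 2) / focal_length
image_width_mm = 2.0 * focal_length_mm * math.tan(fov_x / 2.0)
pixel_width_mm = image_width_mm / width_pixels
```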

I mean the physical width and height of each pixel. For example, for a full-frame camera, the sensor size is 36 mm × 24 mm. So if we use it to take an image of resolution (1920, 1080), then the pixel width and height would be 36/1920 = 0.01875 mm and 24/1080 ≈ 0.0222 mm.

And yes, the pixel size can also be calculated if we have the focal length in millimeters (say a 35 mm lens, for example) and the focal length in pixels (the one calculated from the FOV; 886.8 in most scenes of the Hypersim dataset). Then the pixel size should be 35 mm / 886.8.
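The arithmetic above can be checked directly. Note the 35 mm and 886.8-pixel values come from this message, not from verifying the dataset itself:

```python
# A metric focal length of 35 mm and a focal length of 886.8 pixels
# imply a pixel pitch of 35 / 886.8 mm.
focal_length_mm = 35.0
focal_length_pixels = 886.8
pixel_size_mm = focal_length_mm / focal_length_pixels  # roughly 0.0395 mm
```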

I have found that metadata_camera_parameters.csv has a column called camera_physical_focal_length. The values therein also look like they are defined in mm (i.e., 24 mm lenses, 35 mm lenses, etc.), but I cannot confirm it.

Ah, got it. According to this documentation, the camera_physical_focal_length parameter does appear to be specified in mm. But again, you can choose whatever focal length and image size you want, as long as they match the FOV.

I see. Thank you!