sniklaus / 3d-ken-burns

an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

Coordinate system of normal maps

waps101 opened this issue

In your synthetic dataset, could you confirm that the normal maps are in camera coordinates as opposed to world coordinates? When I compute a normal map from the depth maps using finite differences and the intrinsic camera parameters, I cannot get anything that looks close to the ground truth.

Also, is there quite heavy quantisation in the normal/depth maps? I think they are stored at 16-bit precision - is that right?

Thank you for bringing this up! Yes, the normal maps are in camera space and not world space. I am under the impression that world space normal maps would be meaningless without camera extrinsics, so I opted to extract/provide them in camera space.
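
For reference, if one did have the extrinsics, a single 3x3 camera-to-world rotation per frame would be enough to take the normals from camera space to world space. Below is a minimal sketch of that conversion; the rotation npyRotation is just a hypothetical identity placeholder, and npyNormal stands in for a loaded normal map.

import numpy

# hypothetical 3x3 camera-to-world rotation, an identity placeholder here
npyRotation = numpy.eye(3, dtype=numpy.float32)

# stand-in for an H x W x 3 camera-space normal map, facing the camera everywhere
npyNormal = numpy.zeros([256, 256, 3], numpy.float32)
npyNormal[:, :, 2] = -1.0

# rotate each per-pixel normal into world space and renormalize for safety
npyWorld = numpy.einsum('ij,hwj->hwi', npyRotation, npyNormal)
npyWorld /= numpy.linalg.norm(npyWorld, 2, 2, True)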

I did the same as you and wrote a script to approximate the normal from the provided ground truth depth. The following GIF shows the input image next to the ground truth normal and the normal approximated from the ground truth depth. Looks good to me, but please correct me if there are any issues with it that I am not aware of.

[normal.gif: input image | ground-truth normal | normal approximated from the ground-truth depth]

You can find the script below. The script inverts the Y axis of the normal map since the coordinate system of Unreal differs from the one I am using for the approximation. Furthermore, it sets the normal map for the sky (depth greater than 50k) to a well-defined value since one would otherwise get the normal of the sphere/box in which the virtual environment resides. As for approximating the normal from the depth, the script uses the cross product of vectors between neighboring points in 3D space. Note that the depth is smoothed with a Gaussian filter to reduce noise; this is a very simple approximation after all.

#!/usr/bin/env python
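# note: depending on the OpenCV build, reading the .exr files may require the OPENCV_IO_ENABLE_OPENEXR environment variable to be set before cv2 is imported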

import cv2
import json
import math
import moviepy
import moviepy.editor
import numpy

##########################################################

npyImages = []

for intSample in range(1, 20):
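	# load the color image, the ground-truth depth, and the ground-truth normal map of the current sample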
	npyImage = numpy.ascontiguousarray(cv2.imread(filename=str(intSample).zfill(5) + '-bl-image.png', flags=-1).astype(numpy.float32) * (1.0 / 255.0))
	npyDepth = numpy.ascontiguousarray(cv2.imread(filename=str(intSample).zfill(5) + '-bl-depth.exr', flags=-1)[:, :, None].astype(numpy.float32))
	npyNormal = numpy.ascontiguousarray(cv2.imread(filename=str(intSample).zfill(5) + '-bl-normal.exr', flags=-1).astype(numpy.float32))
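	# flip the Y axis to account for the Unreal coordinate system, assign a fixed normal of (0, 0, -1) to the sky (depth of 50k or more), and renormalize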
	npyNormal[:, :, 1:2] *= -1.0
	npyNormal[:, :, 0:1][npyDepth >= 50000.0] = 0.0
	npyNormal[:, :, 1:2][npyDepth >= 50000.0] = 0.0
	npyNormal[:, :, 2:3][npyDepth >= 50000.0] = -1.0
	npyNormal /= numpy.linalg.norm(npyNormal, 2, 2, True).repeat(3, 2)

	intWidth = npyImage.shape[1]
	intHeight = npyImage.shape[0]
	fltFov = json.loads(open(str(intSample).zfill(5) + '-meta.json', 'r').read())['fltFov']
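	# focal length in pixels from the field of view: fltFocal = 0.5 * size / tan(fov / 2)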
	fltFocal = 0.5 * max(intWidth, intHeight) * math.tan(math.radians(90.0) - (0.5 * math.radians(fltFov)))

	npyPinholeX = numpy.linspace((-0.5 * intWidth) + 0.5, (0.5 * intWidth) - 0.5, intWidth).reshape(1, intWidth).repeat(intHeight, 0).astype(numpy.float32)[:, :, None] * (1.0 / fltFocal)
	npyPinholeY = numpy.linspace((-0.5 * intHeight) + 0.5, (0.5 * intHeight) - 0.5, intHeight).reshape(intHeight, 1).repeat(intWidth, 1).astype(numpy.float32)[:, :, None] * (1.0 / fltFocal)
	npyPinholeZ = numpy.ones([intHeight, intWidth, 1], numpy.float32)
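	# backproject each pixel into camera space by scaling the pinhole ray (x / f, y / f, 1) with the smoothed depth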
	npyPoints = cv2.GaussianBlur(src=npyDepth, ksize=(3, 3), sigmaX=0.0, sigmaY=0.0)[:, :, None]
	npyPoints = numpy.concatenate([npyPinholeX * npyPoints, npyPinholeY * npyPoints, npyPinholeZ * npyPoints], 2)
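	# finite differences between neighboring 3D points along both image axes; their cross product approximates the surface normal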
	npyDiffX = numpy.pad(npyPoints, [(0, 0), (1, 0), (0, 0)], 'constant')
	npyDiffX = npyDiffX[:, 1:, :] - npyDiffX[:, :-1, :]
	npyDiffY = numpy.pad(npyPoints, [(1, 0), (0, 0), (0, 0)], 'constant')
	npyDiffY = npyDiffY[1:, :, :] - npyDiffY[:-1, :, :]
	npyApprox = numpy.cross(npyDiffY, npyDiffX, 2)
	npyApprox /= numpy.linalg.norm(npyApprox, 2, 2, True).repeat(3, 2)

	npyImages.append(cv2.resize(src=(numpy.concatenate([npyImage, (npyNormal + 1.0) * 0.5, (npyApprox + 1.0) * 0.5], 1) * 255.0).clip(0.0, 255.0).astype(numpy.uint8), dsize=None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA))
# end

moviepy.editor.ImageSequenceClip(sequence=[npyImage[:, :, ::-1] for npyImage in npyImages], fps=5).write_gif('normal.gif')

Thanks again for bringing this up! Closing this issue for now; please let me know in case something is still unclear or there are any issues with my script. Thanks!