gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

What boundary values did you use for volume rendering?

bennyguo opened this issue · comments

Hi,

I'm trying to re-implement this work. Could you please share the boundary values you used for volume rendering, i.e. the "near" and "far" values in most NeRF codebases?

Thanks!

Sure. Near was 0.2, far was 1.0. Make sure you normalize the scene (i.e. the output of the face tracker). I normalized it such that the average (over the sequence) z value from the camera is 0.5. This way the geometry stays within [-1, 1], which is necessary for the positional encoding to work properly.
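
For reference, here is a minimal sketch (an illustration, not the repo's code) of how such near/far bounds are typically used to place depth samples along each ray in NeRF-style volume rendering; with the scene normalized as described, the sampled 3D points stay roughly within [-1, 1]:

import torch

def sample_points_along_rays(ray_origins, ray_directions, near=0.2, far=1.0, num_samples=64):
    # ray_origins, ray_directions: (num_rays, 3) tensors.
    # Evenly spaced depth values between the near and far bounds
    # (sampling in depth, i.e. lindisp: False).
    t_vals = torch.linspace(near, far, num_samples)              # (num_samples,)
    depths = t_vals.expand(ray_origins.shape[0], num_samples)    # (num_rays, num_samples)
    # 3D sample locations along each ray: o + t * d
    points = ray_origins[:, None, :] + depths[..., None] * ray_directions[:, None, :]
    return points, depths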

Here is the configuration I used:
Also note the NDC and lindisp settings. This config matches the NeRF implementation by Krishna Murthy.

# Parameters to setup experiment.
experiment:
  # Unique experiment identifier
  id: dave__fixed_bg_512_paper_model
  # Experiment logs will be stored at "logdir"/"id"
  logdir: /rdata/guygafni/projects/cnerf/nerf-pytorch/logs/dvp
  # Seed for random number generators (for repeatability).
  randomseed: 42  # Cause, why not?
  # Number of training iterations.
  train_iters: 1000000
  # Number of training iterations after which to validate.
  validate_every: 1000
  # Number of training iterations after which to checkpoint.
  save_every: 5000
  # Number of training iterations after which to print progress.
  print_every: 100
  device: 0

# Dataset parameters.
dataset:
  # Type of dataset (Blender vs LLFF vs DeepVoxels vs something else)
  type: blender
  # Base directory of dataset.
  basedir: /rdata/guygafni/projects/cnerf/nerf-pytorch/real_data/dave_dvp
  #basedir: real_data/andrei_1_light
  #basedir: real_data/debug
  # Optionally, provide a path to the pre-cached dataset dir. This
  # overrides the other dataset options.
  #cachedir: cache/flame_sample
  # For the Blender datasets (synthetic), optionally return images
  # at half the original resolution of 800 x 800, to save space.
  half_res: True
  # Stride (include one per "testskip" images in the dataset).
  testskip: 1
  # Do not use NDC (normalized device coordinates). Usually True for
  # synthetic (Blender) datasets.
  no_ndc: True
  # Near clip plane (clip all depth values closer than this threshold).
  near: 0.2
  # Far clip plane (clip all depth values farther than this threshold).
  far: 0.8

# Model parameters.
models:
  # Coarse model.
  coarse:
    # Name of the torch.nn.Module class that implements the model.
    type: ConditionalBlendshapePaperNeRFModel
    # Number of layers in the model.
    num_layers: 4 # ignore this, I hard coded the model
    # Number of hidden units in each layer of the MLP (multi-layer
    # perceptron).
    hidden_size: 256
    # Add a skip connection once in a while. Note: This parameter
    # won't take effect unless num_layers > skip_connect_every.
    skip_connect_every: 3
    # Whether to include the position (xyz) itself in its positional
    # encoding.
    include_input_xyz: True
    # Whether or not to perform log sampling in the positional encoding
    # of the coordinates.
    log_sampling_xyz: True
    # Number of encoding functions to use in the positional encoding
    # of the coordinates.
    num_encoding_fn_xyz: 10
    # Additionally use viewing directions as input.
    use_viewdirs: True
    # Whether to include the direction itself in its positional encoding.
    include_input_dir: False
    # Number of encoding functions to use in the positional encoding
    # of the direction.
    num_encoding_fn_dir: 4
    # Whether or not to perform log sampling in the positional encoding
    # of the direction.
    log_sampling_dir: True
  # Fine model.
  fine:
    # Name of the torch.nn.Module class that implements the model.
    type: ConditionalBlendshapePaperNeRFModel
    # Number of layers in the model.
    num_layers: 4 # ignore this, I hard coded the model
    # Number of hidden units in each layer of the MLP (multi-layer
    # perceptron).
    hidden_size: 256
    # Add a skip connection once in a while. Note: This parameter
    # won't take effect unless num_layers > skip_connect_every.
    skip_connect_every: 3
    # Number of encoding functions to use in the positional encoding
    # of the coordinates.
    num_encoding_fn_xyz: 10
    # Whether to include the position (xyz) itself in its positional
    # encoding.
    include_input_xyz: True
    # Whether or not to perform log sampling in the positional encoding
    # of the coordinates.
    log_sampling_xyz: True
    # Additionally use viewing directions as input.
    use_viewdirs: True
    # Whether to include the direction itself in its positional encoding.
    include_input_dir: False
    # Number of encoding functions to use in the positional encoding of
    # the direction.
    num_encoding_fn_dir: 4
    # Whether or not to perform log sampling in the positional encoding
    # of the direction.
    log_sampling_dir: True

# Optimizer params.
optimizer:
  # Name of the torch.optim class used for optimization.
  type: Adam
  # Learning rate.
  lr: 5.0E-4

# Learning rate schedule.
scheduler:
  # Exponentially decay learning rate (in 1000 steps)
  lr_decay: 250
  # Rate at which to apply this decay.
  lr_decay_factor: 0.1

# NeRF parameters.
nerf:
  # Use viewing directions as input, in addition to the X, Y, Z coordinates.
  use_viewdirs: True
  # Encoding function for position (X, Y, Z).
  encode_position_fn: positional_encoding
  # Encoding function for ray direction (theta, phi).
  encode_direction_fn: positional_encoding
  # Training-specific parameters.
  train:
    # Number of random rays to retain from each image.
    # These sampled rays are used for training, and the others are discarded.
    num_random_rays: 2048  # 32 * 32 * 4 # was 1024
    # Size of each chunk (rays are batched into "chunks" and passed through
    # the network)
    chunksize: 2048 #16384  #131072  # 131072  # 1024 * 32
    # Whether or not to perturb the sampled depth values.
    perturb: True
    # Number of depth samples per ray for the coarse network.
    num_coarse: 64
    # Number of depth samples per ray for the fine network.
    num_fine: 64
    # Whether to render models using a white background.
    white_background: False
    # Standard deviation of noise to be added to the radiance field when
    # performing volume rendering.
    radiance_field_noise_std: 0.1
    # Sample linearly in disparity space, as opposed to in depth space.
    lindisp: False
  # Validation-specific parameters.
  validation:
    # Size of each chunk (rays are batched into "chunks" and passed through
    # the network)
    chunksize: 65536 #4096  #131072   # 1024 * 32
    # Whether or not to perturb the sampled depth values.
    perturb: True
    # Number of depth samples per ray for the coarse network.
    num_coarse: 64
    # Number of depth samples per ray for the fine network.
    num_fine: 64
    # Whether to render models using a white background.
    white_background: False
    # Standard deviation of noise to be added to the radiance field when
    # performing volume rendering.
    radiance_field_noise_std: 0.
    # Sample linearly in disparity space, as opposed to in depth space.
    lindisp: False
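
As an aside, the positional-encoding options above (num_encoding_fn_xyz, include_input_xyz, log_sampling_xyz) correspond to the standard NeRF encoding. A minimal sketch of what they mean, assuming the usual sin/cos formulation (an illustration, not the repo's code):

import torch

def positional_encoding(x, num_encoding_fn=10, include_input=True, log_sampling=True):
    # x: (..., 3) coordinates, expected to lie roughly within [-1, 1].
    if log_sampling:
        freq_bands = 2.0 ** torch.linspace(0.0, num_encoding_fn - 1, num_encoding_fn)
    else:
        freq_bands = torch.linspace(1.0, 2.0 ** (num_encoding_fn - 1), num_encoding_fn)
    features = [x] if include_input else []
    for freq in freq_bands:
        features.append(torch.sin(x * freq))
        features.append(torch.cos(x * freq))
    return torch.cat(features, dim=-1)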

This is great! Thanks!
By the way, are the values provided in the evaluation kit (dave) already normalized, or do I have to normalize them myself?

already normalized

Thanks!

Hi Guy,

Can you tell me in detail how to normalize the poses?
Thanks!

I take all the rigid poses [R|t] of the head from the tracking along the whole sequence and find the average z value of the translation.
Attaching a snippet of the code that reads the data from the face tracker and was used to write my jsons.

import numpy as np

def read_rigid_poses(path_to_rigid_poses_txt, mean_scale=True):
    all_rigids = np.genfromtxt(path_to_rigid_poses_txt, dtype=None)
    all_rigids = all_rigids.reshape(-1, 4, 4)
    # Flip the x and z columns to convert the face tracker's coordinate
    # system to the PyRender one.
    all_rigids[:, :, 0] *= -1
    all_rigids[:, :, 2] *= -1
    # Scale the scene so that the mean camera z-translation is ~0.5.
    scale = 0.5 / np.mean(all_rigids[:, 2, -1])
    if mean_scale:
        print("scaling scene by %f" % scale)
        all_rigids[:, 0:3, -1] *= scale
    print("rigids shape ", all_rigids.shape)
    return all_rigids, scale
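
A hypothetical usage example (the file name is only a placeholder) showing how the returned scale relates to the bounds discussed above:

# "rigid_poses.txt" is a placeholder path, not the actual tracker output name.
all_rigids, scale = read_rigid_poses("rigid_poses.txt")
# After scaling, the mean camera z-translation is ~0.5, so the near/far bounds
# quoted above bracket the head geometry, and the normalized [R|t] matrices can
# be written out as the per-frame camera transforms in the json files.
print(all_rigids.shape, scale)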


Thanks, you are so kind.