NVIDIA / gvdb-voxels

Sparse volume compute and rendering on NVIDIA GPUs

UCHAR channel needs a 0.5 offset to read correctly

gp42 opened this issue · comments

It seems I don't understand how to use UCHAR channels correctly, because I am not able to read back the values I am writing.

Here is an example:

  1. I create a channel using gvdb.AddChannel(0, T_UCHAR, 1)
  2. The first kernel overwrites every existing voxel with a new UCHAR value
  3. The second kernel prints the values back out

Result: I receive incorrect values.

However, if I read values with a 0.5 offset (which, to my understanding, is only applicable to FLOAT channels), I get the correct values. See the kernel examples below; please help me understand this.

This happens only if I write values using surf3Dwrite; reading after SolidVoxelize seems to work fine.

This also affects svox reading.

extern "C" __global__ void writeVox (VDBInfo* gvdb, int3 res, uchar chan)
{
  GVDB_COPY_SMEM_UC

  uchar v = tex3D<uchar>( gvdb->volIn[chan], vox.x, vox.y, vox.z );
  if (v == 0) return;

  uchar val = 6;
  surf3Dwrite ( val, gvdb->volOut[chan], vox.x * sizeof(uchar), vox.y, vox.z );
}

extern "C" __global__ void readVox (VDBInfo* gvdb, int3 res, uchar chan)
{
  GVDB_COPY_SMEM_UC

  uchar v = tex3D<uchar>( gvdb->volIn[chan], vox.x, vox.y, vox.z);
  //uchar v = tex3D<uchar>( gvdb->volIn[chan], vox.x+0.5f, vox.y+0.5f, vox.z+0.5f);
  if (v == 0) return;

  printf("%dx%dx%d=%x\n", vox.x, vox.y, vox.z, v);
}

This somehow relates to this question: #29

When you create that channel, try setting the interpolation mode to F_POINT, which disables the linear-interpolation access method:
gvdb_.AddChannel(0, T_UCHAR, 1, F_POINT, F_CLAMP);

I've been a little annoyed by the convention in the library (write using surf3Dwrite, read using tex3D<> with a 0.5f offset), especially since there are inconsistencies (e.g. some raytracing kernels use tex3D without the offset). My current understanding is that the implicit assumption is that only float channels are meant to be used in linear-filtering texture mode, while other channels are meant to be used in point mode.

I haven't figured out a great solution yet, and have been meaning to file an issue and see what @neilbickford-nv thinks about it.

This works like a charm, thank you! 🎉
I wonder if we should reflect this in the documentation somehow, though it is a PDF.

Yep, @icoderaven's answer is correct - textures are interpolated as if each voxel/texel's value is specified at the center of the voxel/texel, so when using a texture with linear interpolation you have to add an offset of half a voxel to avoid blending with other voxels. On the other hand, point sampling uses nearest-neighbor interpolation, so you don't have to add this offset (since then it only reads from one voxel, the closest one).

I would like to take a pass over the codebase at some point and make the semantics about when to specify integer-valued voxel indices vs. floating point-valued interpolated indices more explicit. I think the idea of using surf3Dwrite and surf3Dread for lookups without interpolation is also good, but I need to check to make sure there aren't performance implications in doing so (e.g. if they utilize the hardware differently).
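For comparison, a non-interpolating read in the style of the kernels above could be sketched with surf3Dread, which takes integer texel coordinates (x in bytes) and does no filtering, so no half-voxel offset is needed. This is an untested sketch assuming the same GVDB macros and VDBInfo fields as the earlier kernels:

```cuda
extern "C" __global__ void readVoxSurf (VDBInfo* gvdb, int3 res, uchar chan)
{
  GVDB_COPY_SMEM_UC

  // surf3Dread addresses the surface by integer texel index
  // (x scaled to bytes), bypassing the texture filtering unit entirely.
  uchar v;
  surf3Dread ( &v, gvdb->volOut[chan], vox.x * sizeof(uchar), vox.y, vox.z );
  if (v == 0) return;

  printf("%dx%dx%d=%x\n", vox.x, vox.y, vox.z, v);
}
```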

It's also interesting that you seem to be getting interpolated values with T_UCHAR textures, since I think we should be creating that using CU_TRSF_READ_AS_INTEGER - I'll need to look into that a bit more.

Hi all - I've just pushed a commit (6606bde) that replaces integer and half-voxel-offset tex3D reads with surf3Dread calls, except in getTricubic in cuda_gvdb_raycast.cuh (as this function can sample values outside of the atlas, and while tex3D handles this without error, surf3Dread does not without further modification).

This seems to improve performance slightly, which is nice (probably as a result of no longer requiring the texture unit to filter samples); on a version of gSprayDeposit modified to smooth the level set every frame, this changes the GPU time on my laptop in the simulate routine from a median of 31.944 ms to a median of 30.514 ms (about 4% faster).

It seems to work on all of the GVDB samples; please let me know if it breaks anything on your own projects!

Closing, since I think this should be fixed now and June's commit does not seem to have created new issues. Thanks!