Add where_assign

hobu opened this issue 2 months ago

It is not currently possible to make filters that would otherwise cull data annotate it instead. I propose we add a where_assign option (following the pattern of where and where_merge) that takes an AssignExpression and applies that assignment to every point the filter would have kept.
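To make the proposed semantics concrete, here is a minimal Python sketch (not PDAL code). It assumes points are plain dicts and that a culling filter can be reduced to a keep/discard predicate. The helpers `cull` and `where_assign` are hypothetical. The first mirrors what a culling filter does today. The second keeps every point and writes the assignment only on the points that would have survived.

```python
from typing import Callable, Dict, List

Point = Dict[str, float]
Predicate = Callable[[Point], bool]


def cull(points: List[Point], keep: Predicate) -> List[Point]:
    """What a culling filter does today: drop points that fail `keep`."""
    return [p for p in points if keep(p)]


def where_assign(points: List[Point], keep: Predicate,
                 dim: str, value: float) -> List[Point]:
    """Proposed behaviour (hypothetical): return every point, assigning
    `value` to `dim` on the points `keep` would have retained. Points that
    would have been culled keep any existing value, or 0 if `dim` is new."""
    out = []
    for p in points:
        q = dict(p)  # copy so the caller's points are not mutated
        q[dim] = value if keep(p) else q.get(dim, 0)
        out.append(q)
    return out


def in_range(p: Point) -> bool:
    """Roughly what filters.range with limits "Z[0:10]" would test."""
    return 0.0 <= p["Z"] <= 10.0


pts = [{"Z": 1.0}, {"Z": 5.0}, {"Z": 12.0}]
print(len(cull(pts, in_range)))  # 2
print([p["InRange"] for p in where_assign(pts, in_range, "InRange", 1)])  # [1, 1, 0]
```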

For example, this question asks how to sample points and annotate each one with whether or not it was sampled. That is not at all convenient to do with PDAL at the moment, but it would be with a where_assign option on stage types that would otherwise cull:

filters.range, filters.expression
filters.outlier (kind of does this now)
filters.smrf and friends (conceptually culling, but they label instead)
filters.lof, filters.sample, filters.relaxationdartthrowing, filters.mad, filters.iqr
filters.voxel*
filters.decimation
filters.crop

Multiple Sampling Example

This pipeline fragment would create two new dimensions, 50mSamples and 25mSamples, and set each to 1 on the points selected by the corresponding filters.sample stage:

    {
      "type": "filters.sample",
      "where_assign":"50mSamples = 1",
      "radius":50
    },
    {
      "type": "filters.sample",
      "where_assign":"25mSamples = 1",
      "radius":25
    },
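
The two-stage fragment above can be imitated outside PDAL with a greedy, radius-based selection: visit the points in order and accept one only if no previously accepted point lies within `radius`. This is a Python sketch of that idea, not the filters.sample implementation, and `sample_flags` is a hypothetical name. It returns a 0/1 flag per point instead of dropping points, which is the kind of output the proposed where_assign would produce.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def sample_flags(points: List[Point3], radius: float) -> List[int]:
    """Return 1 for each point a greedy radius-based sampler keeps, else 0.

    A point is kept if it is at least `radius` away from every point kept
    before it. This is brute force (O(n * kept)); a real implementation
    would use a spatial index."""
    kept: List[Point3] = []
    flags: List[int] = []
    for p in points:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
            flags.append(1)
        else:
            flags.append(0)
    return flags


pts = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (60.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
dims = {
    "50mSamples": sample_flags(pts, 50.0),
    "25mSamples": sample_flags(pts, 25.0),
}
print(dims["50mSamples"])  # [1, 0, 1, 0]
print(dims["25mSamples"])  # [1, 1, 1, 1]
```

Because nothing is culled, the 25 m pass sees the full cloud rather than the output of the 50 m pass, which is what makes labelling at several radii in one pipeline workable.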

Implementation Details

#4282 made it possible for filters.assign to create dimensions that do not already exist. That capability, together with the AssignmentStatement expression class, could be used to adapt the where_merge code so that it performs an assignment instead of bolting the culled points back on.
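One way the where_merge-style plumbing could be repurposed is sketched below in Python. It assumes every point carries a stable ID (analogous to a PointId) that survives the filter: run the culling filter on a copy, collect the IDs that come out, then write the assignment onto the full, unculled set. The `id` key, `assign_survivors`, and `every_nth` are hypothetical; `every_nth` only stands in for a decimation-like culling stage.

```python
from typing import Callable, Dict, List

Point = Dict[str, float]
CullingFilter = Callable[[List[Point]], List[Point]]


def every_nth(step: int) -> CullingFilter:
    """Stand-in culling filter, similar in spirit to filters.decimation:
    keeps every `step`-th point and drops the rest."""
    def run(points: List[Point]) -> List[Point]:
        return points[::step]
    return run


def assign_survivors(points: List[Point], flt: CullingFilter,
                     dim: str, value: float) -> List[Point]:
    """Hypothetical where_assign wrapper: run `flt` on a copy of the points,
    record which IDs it kept, then assign `value` to `dim` on those IDs in
    the full, unculled set. Other points get 0 if `dim` is new."""
    survivors = {p["id"] for p in flt([dict(p) for p in points])}
    result = []
    for p in points:
        q = dict(p)           # never mutate the caller's points
        q.setdefault(dim, 0)  # create the dimension if it is missing
        if p["id"] in survivors:
            q[dim] = value
        result.append(q)
    return result


pts = [{"id": i, "Z": float(i)} for i in range(6)]
flags = assign_survivors(pts, every_nth(2), "Decimated", 1)
print([p["Decimated"] for p in flags])  # [1, 0, 1, 0, 1, 0]
```

The ID comparison only makes sense for stages that purely keep or drop existing points; a stage that creates new points or rewrites coordinates has no clean notion of which input point "survived".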

Andrew Bell · Answer 1 · Fri May 31 2024 20:08:04 GMT+0800 (China Standard Time)

I'm not sure how this would work. There's no generic way to know which points were "touched" by any stage. The example for filters.sample seems obvious, but I think it's a special case, though perhaps one shared by other stages. I don't see any problem adding support to assign values to points that meet some criteria in a particular stage, but I don't understand how it would be applied generically given the current variety of stage behavior.

chambbj · Answer 2 · Fri May 31 2024 20:47:13 GMT+0800 (China Standard Time)

Also, some of the example filters just create their own dimensions, which can already be used with filters.assign to achieve the same result. For example, compute the local outlier factor with filters.lof, then follow it with filters.assign and a WHERE clause instead of filters.range. Sure, where_assign would be a little syntactic sugar for these few cases, but it isn't entirely new behavior.
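For comparison, the existing label-instead-of-cull route for a dimension-producing filter might look like the pipeline below, built as a Python list and serialized with json. The file names and the 1.5 threshold are placeholders, and class 7 (ASPRS low point/noise) is just an example choice. filters.lof writes a LocalOutlierFactor dimension, and filters.assign accepts `Dimension = value WHERE condition` expressions.

```python
import json

# Placeholder file names and threshold; only the stage layout matters here.
pipeline = [
    "input.las",
    # Computes per-point scores, including a LocalOutlierFactor dimension.
    {"type": "filters.lof"},
    # Label instead of cull: high-LOF points get Classification 7;
    # no points are removed.
    {
        "type": "filters.assign",
        "value": "Classification = 7 WHERE LocalOutlierFactor > 1.5",
    },
    "output.las",
]

print(json.dumps(pipeline, indent=2))
```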

The truly culling filters are probably the only stages where this is arguably needed, since there is currently no way to stop a culling filter from culling. So I'm not entirely opposed to providing a fix for those use cases, but trying to make this broadly available to all stages feels like a bit of an overreach.

chambbj · Answer 3 · Fri May 31 2024 20:52:34 GMT+0800 (China Standard Time)

Probably the most consistent thing we could do is to eliminate culling filter behavior altogether and always require a separate range/expression filter to truly remove points. I don't know that it's a huge hit to performance, as you'd just be deferring the culling to a later stage. Biggest problem would be education and backwards compatibility.

I am reminded that we originally had extract and classify options for the ground filters, which was confusing. We chose to have them only label/classify and to require a range filter if you really wanted only the ground returns. Clearly, these days we can also achieve the same thing with where and expression capabilities.
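The ground-filter pattern described above, label first and remove later, could be written as the following pipeline, again built in Python and serialized with json. filters.smrf marks ground returns with Classification 2, and the separate filters.range stage does the culling. The file names are placeholders, and a filters.expression stage with `Classification == 2` would serve equally well as the second step.

```python
import json

# Placeholder file names; only the stage types and options matter here.
pipeline = [
    "input.las",
    # Label only: SMRF sets Classification = 2 on points it considers ground.
    {"type": "filters.smrf"},
    # Cull explicitly, in a separate stage, keeping only class 2.
    {"type": "filters.range", "limits": "Classification[2:2]"},
    "ground.las",
]

print(json.dumps(pipeline, indent=2))
```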

Howard Butler · Answer 4 · Mon Jun 03 2024 21:23:15 GMT+0800 (China Standard Time)

#4415 implemented the simple case for filters.sample. I wonder whether any other filters really need the capability.