Add where_assign
hobu opened this issue · comments
It is not currently possible to use some filters that would otherwise cull data to annotate it instead. I propose we add a where_assign option (to follow along with where and where_merge) that takes an AssignExpression and makes an assignment on any points that would be kept by the filter's execution.
For example, this question wants to sample points and annotate them with whether or not they were sampled. There is no convenient way to do this with PDAL at the moment, but there would be with a where_assign option on stage types that would otherwise cull:
filters.range, filters.expression
filters.outlier (kind of does this now)
filters.smrf and friends kind of do culling but instead label
filters.lof, filters.sample, filters.relaxationdartthrowing, filters.mad, filters.iqr
filters.voxel*
filters.decimation
filters.crop
Multiple Sampling Example
This pipeline fragment would create two new dimensions, 50mSamples and 25mSamples, and set each to 1 for every point selected by the corresponding filter:
{
    "type": "filters.sample",
    "where_assign": "50mSamples = 1",
    "radius": 50
},
{
    "type": "filters.sample",
    "where_assign": "25mSamples = 1",
    "radius": 25
},
Implementation Details
#4282 made it possible to create dimensions in filters.assign if they did not already exist. That change, together with the AssignmentStatement expression class, could be used to adapt the where_merge code to do an assignment instead of bolting the culled points back on.
I'm not sure how this would work. There's no generic way to know which points were "touched" by any stage. The example for filters.sample seems obvious, but I think it's a special case, though perhaps one shared by other stages. I don't see any problem adding support to assign values to points that meet some criteria in a particular stage, but I don't understand how it would be applied generically given the current variety of stage behavior.
And some of the example filters really just create their own dimensions, which can already be used with filters.assign to achieve the same result. For example, you can compute the local outlier factor as usual, but replace filters.range with filters.assign and a WHERE clause. Sure, where_assign would be a little syntactic sugar for these few cases, but it's not entirely new behavior.
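As a rough sketch of the local outlier factor case described above: filters.lof writes a LocalOutlierFactor dimension, and filters.assign can label points from it rather than having filters.range drop them. The minpts value, the 1.2 threshold, and the choice of class 7 (noise) are illustrative assumptions, not recommendations.

```json
[
    {
        "type": "filters.lof",
        "minpts": 20
    },
    {
        "type": "filters.assign",
        "value": "Classification = 7 WHERE LocalOutlierFactor > 1.2"
    }
]
```

All points stay in the view; only the ones above the threshold are relabeled, so a later stage can still decide whether to remove them.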
The true culling filters are probably the only stages where this is arguably needed, as there is currently no means of not culling in a culling filter. So I'm not entirely opposed to providing a fix for these use cases, but trying to make this broadly available to all stages feels like a bit of an overreach.
Probably the most consistent thing we could do is to eliminate culling filter behavior altogether and always require a separate range/expression filter to truly remove points. I don't know that it's a huge hit to performance, as you'd just be deferring the culling to a later stage. Biggest problem would be education and backwards compatibility.
I am reminded that we originally had extract and classify options for the ground filters, which was confusing. We made the choice there to only label/classify and to require a range filter if you really wanted only the ground returns. Clearly, we can also achieve the same these days with where and expression capabilities.
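The label-then-filter pattern for ground returns might look like the fragment below. It is a minimal sketch that assumes default filters.smrf settings and uses filters.expression for the culling step; filters.range with an equivalent limit would serve the same purpose.

```json
[
    {
        "type": "filters.smrf"
    },
    {
        "type": "filters.expression",
        "expression": "Classification == 2"
    }
]
```

Here filters.smrf only classifies ground points as class 2, and the separate expression stage is what actually removes everything else.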
#4415 implemented the simple thing for filters.sample. I wonder if any other filters really need the capability.