G-Node / nix

Neuroscience information exchange format

Home Page:https://readthedocs.org/projects/nixio/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Tag data retrieval specification

achilleas-k opened this issue · comments

We need to clarify and specify unambiguously how data retrieval works for Tag and MultiTag referenced data. I'm going to ignore Untagged and Indexed features for now, because I think these are simple and clear. So this refers to referenced data and the Tagged features which behave equivalently.

In all the examples below, I'll be specifying positions, extents, and timestamps. Assume for simplicity that timestamps is a range dimension on the referenced data (or a data array with alias range dimension) with the same unit as the positions and extents.

Currently there's a bug where data is returned even when the tag region should be empty. Consider the following example:

positions = [0]
extents = [10]
timestamps = [22, 23, 31]

The first tag index should return an empty DataView but instead returns [22]. This seems to be because we first check if there's an index/tick that's greater or equal than the positions, to set it as a start index. This is fine when there's at least one value to return (the first value returned should be at an index that's >= to the start position), but should not apply when the end position (position + extent) is less than the first available index or tick.

I also want to clarify that at some point we decided that the end position should be inclusive. Is this correct? For example:

positions = [10]
extents = [10]
timestamps = [10, 12, 19, 20, 23]
  • Inclusive returns: [10, 12, 19, 20]
  • Non-inclusive returns: [10, 12, 19]

Currently we decided on inclusive but I'm wondering if we would like to revisit this. I think the inclusive slicing is a little unexpected, at least in certain cases. For instance:

positions = [0, 10, 20, 30]
extents = [10, 10, 10, 10]
timestamps = [0, 1, 2, 3, 4, ..., 39]

I think many would expect that retrieving the slices defined by each tag would have no overlap.

@achilleas-k,

I found a bug "on the other side" of MultiTag retrieval. The last timestamp of Range (or AliasRange) dimension is returned for all following positions when they should be empty. For example, 8 will be returned for all positions when using the following code:
positions = [0 10 20] extents = [10 10 10] timestamps = [8]

Non-inclusive extents make more sense to me. From my perspective, it seems intuitive to expect that an extent of 1 should return a segment of length 1. Another point supporting non-inclusive extents in python was mentioned in #802. It is equally important is to have a consistent behavior for both MATLAB and Python APIs. Currently, due to different implementation between the APIs, reading a MATLAB-created NIX file with Python API causes OutOfBounds error.

the behaviour you describe @Seafad can be safely regarded as a bug... sorry.
A workaround for the moment is to read the ticks and apply position and extent in Matlab/Numpy via logical indexing.
Resolving this issue might involve some API changes which is not nice... still thinking about it