hed-standard / hed-python

Python validation, summary, and analysis tools for HED (Hierarchical Event Descriptors).

Home Page:https://hed-python.readthedocs.io/en/latest/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Sidecar validation

happy5214 opened this issue · comments

Over the past couple of weeks, @VisLab and I have discussed the procedure for validating sidecars as part of BIDS datasets. Here is the plan we came up with, slightly modified based on some thoughts I've had since then:

  • Parse sidecars.
  • Run syntax-only validation on all event sidecars, exiting on failure.
  • Per TSV:
    • Find sidecars for that TSV, based on filename.
    • Parse TSV.
    • Run syntax-only validation on HED TSV column, skipping to next TSV on failure.
    • Parse and validate definitions.
    • Run semantic validation on sidecars and HED column, skipping to next TSV on failure of this or definition validation.
    • Substitute sidecar columns.
    • Run dataset validation on TSV.

In bullet item 2,

I think that there has to be more than syntax-only validation on all side cars. I think it should actually check that tags that are not library tags are in the base schema. This will cover 99.99% of the errors.

I am wondering whether we shouldn't try to come up with a different way of declaring library schema so that we don't have to have them embedded in the files. I don't have any ideas on this, but it would make things a lot easier..... thoughts?

Since it appears we're going to be passing in the library schemas with the base schema version and not as definition tags (per hed-standard/hed-specification#156), the separate passes are no longer necessary. Therefore, the new proposal is this:

  • Parse sidecars.
  • Run string-level validation on all event sidecars, exiting on any errors (not warnings).
  • Per TSV:
    • Find sidecars for that TSV, based on filename.
    • Parse TSV.
    • Run string-level validation on HED TSV column, skipping to next TSV on failure.
    • Substitute sidecar columns.
    • Run event- and dataset-level validation (including definitions) on TSV.

A few thoughts/questions:

For sidecars: Does string-level validation includes handling # placeholders? This problem is complicated by placeholders in definitions as explained below.

Definitions: How do definitions get incorporated into this? Would the parser recognize Def/XXX and Def-expand/YYY as defined names. Would the parser return a dictionary of definitions as well as a list of Def/XXX and Def-expand/YYY's that appear.

In order to conveniently use definitions with placeholders, the specification is going to allow the following:

A definition tag-group can contain at most one # placeholder. If it has a #, then when it is used, a value must be supplied:

Example: Def/XXX/1 Hz

My original thought was to postpone supporting this until later, but when we have worked out practical tagging examples, this appears to be very convenient and useful. Since implementing it may change the way some parsing is done, we should probably think about it now.

@IanCa @happy5214

Can you both confirm that we have finalized our strategy (both python and javascript) and that this issue can be closed?