Improvements to solve ambiguities in the specification

tferr opened this issue 2 years ago

@lydiang, a couple of notes I was unable to raise earlier today.

Spatial Calibration:
While it makes perfect sense (at least from a file-reader point of view) to always assume physical units, the reality in the ecosystem of SWC writers is an unexpected mess: some software adopts pixel coordinates, some does not. Without standardized metadata, it is impossible to know the actual dimensions of the tree without access to the original imagery. [Case in point: the DIADEM datasets can only be imported properly after reading the companion documentation. From the SWC files alone, there is no way to know that the coordinates need to be scaled by an externally declared (anisotropic) voxel size.]

A common request we get from SNT users is how to scale a neuronal arbor after it has been traced in pixel coordinates. Many years ago we could at least assume that integer coordinates encoded pixel coordinates, but not even that holds once sub-pixel segmentation routines are adopted. I think the only way to really solve this in a backward-compatible way is to have standardized metadata in the file header that specifies physical units and voxel spacing (as mentioned by @hanchuan). The specification could then assume micrometer coordinates if no such metadata is found.
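For the common request of scaling an arbor traced in pixel coordinates, the operation is a per-axis multiplication by the voxel size. Below is a minimal Python sketch, assuming the voxel size (in microns per pixel) is already known from the imaging metadata. `SwcNode` and `pixels_to_microns` are hypothetical names, the voxel size in the usage lines is illustrative only, and scaling radii by the x spacing alone is an assumption that only makes sense when in-plane sampling is isotropic.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class SwcNode:
    """One SWC sample (hypothetical container): id, type, x, y, z, radius, parent."""
    node_id: int
    type_id: int
    x: float
    y: float
    z: float
    radius: float
    parent: int


def pixels_to_microns(nodes: List[SwcNode],
                      voxel_size: Tuple[float, float, float]) -> List[SwcNode]:
    """Scale pixel coordinates by a per-axis voxel size given in microns per pixel."""
    sx, sy, sz = voxel_size
    # Assumption: radii were measured in the XY plane, so they use the x spacing only.
    return [replace(n, x=n.x * sx, y=n.y * sy, z=n.z * sz, radius=n.radius * sx)
            for n in nodes]


# Usage with an illustrative anisotropic voxel size (not from any real dataset).
traced = [SwcNode(1, 1, 10.0, 20.0, 5.0, 2.0, -1), SwcNode(2, 3, 12.0, 20.0, 5.0, 1.0, 1)]
scaled = pixels_to_microns(traced, (0.5, 0.5, 2.0))
```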

Backwards Compatibility:
(this may be more directed to Giorgio and @bengtl).

I find the mapping of TypeIDs 5 and 6 to 'custom' and 'unspecified' surprising. They have been listed in the Neuronland specification as a possibility for 'fork point' and 'end point' for many years. Several tools use that convention (it significantly speeds up the import of large files, since the entire structure can be parsed in a single pass). If this is to become the authoritative specification, this should at least be mentioned, as it will break some expectations.

Future-proofing:

Soma definition: the format expectation is that neuronal arbors are rooted at the soma, correct? There are several examples (many in invertebrate cells, and, e.g., some interneurons in the mouse brain) in which the axon originates from a primary dendrite. Currently, I don't think most readers handle this well. Should this be documented?
Is support for annotations (spines, varicosities, boutons, etc.) planned (e.g., by incorporating the eSWC, SWC+, etc. proposals)?
I agree with the version tagging proposed in #2, which again would probably require standardized metadata in the header. An accessible SWC validator would be key.
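As a rough illustration of what an accessible SWC validator might check, here is a minimal Python sketch, assuming the seven-column layout (id, type, x, y, z, radius, parent) and '#' comment lines. The function name `validate_swc` is hypothetical, and the specific checks (treating TypeID 0 and non-positive radii as problems, and requiring parents to be defined before their children) are assumptions chosen for illustration, not rules taken from a published specification.

```python
from typing import List


def validate_swc(text: str) -> List[str]:
    """Return a list of problems found in SWC text; an empty list means none were found."""
    problems: List[str] = []
    seen_ids = set()
    root_count = 0
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and header/comment lines carry no samples
        fields = line.split()
        if len(fields) != 7:
            problems.append(f"line {line_no}: expected 7 fields, found {len(fields)}")
            continue
        try:
            node_id, type_id, parent = int(fields[0]), int(fields[1]), int(fields[6])
            _x, _y, _z, radius = map(float, fields[2:6])
        except ValueError:
            problems.append(f"line {line_no}: non-numeric field")
            continue
        if node_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {node_id}")
        if type_id == 0:
            problems.append(f"line {line_no}: TypeID 0 (undefined) should be corrected")
        if radius <= 0:
            problems.append(f"line {line_no}: non-positive radius {radius}")
        if parent == -1:
            root_count += 1
        elif parent not in seen_ids:
            # Assumption: a parent must be defined on an earlier line than its child.
            problems.append(f"line {line_no}: parent {parent} not defined earlier")
        seen_ids.add(node_id)
    if root_count == 0:
        problems.append("no root sample (parent -1) found")
    return problems
```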

Clarifications:

What is the difference between TypeID 0 (undefined) and TypeID 6 (unspecified)?
Z coordinates of 2D reconstructions / radii of skeletonized centerlines (which lack de facto radii): in the wild, such files list either constant values of 0 or 1 (the latter presumably from using pixel coordinates). Should this be standardized?
Most SWCs we come across use combinations of whitespace to improve readability (i.e., tabs, or multiple spaces that align columns vertically when opened in a text editor). What is the official stance on this: are these allowed, tolerated, or forbidden?

Bengt Ljungquist · Answer 1 · Sat Apr 23 2022 06:43:28 GMT+0800 (China Standard Time)

Thank you for your detailed feedback, Tiago. Please see my and Giorgio's replies to each issue below:

Re: Spatial Calibration:
The standard specifies that units are always in microns. This is indeed important from the file-reader viewpoint, as you noted, to ensure a coherent scientific interpretation for analysis and modeling. In practice, however, many reconstructions are first expressed in voxels, since that is how image stacks are traced. This information is also useful for matching the tracings back to the image stacks when needed. We fully agree with your suggestion that the only way to really solve the impasse is to include metadata in the file header that specifies voxel spacing and physical units. An example of such a specification could be (as taken from a recent SNT-exported SWC file):
# Voxel separation (x,y,z): 0.553384765625, 0.553384765625, 1.0
It is important that the standard clarifies that these values are the per-axis factors by which the SWC coordinates (which are in microns) must be divided to obtain a transformed set of coordinates expressed in voxels.
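Assuming the header spacing is expressed in microns per voxel, converting a micron coordinate back to voxel units means dividing each axis by its spacing (multiplying goes the other way, from voxels to microns). Below is a minimal Python sketch that reads the SNT-style header line quoted above. The regular expression assumes that exact wording, and `read_voxel_separation` and `microns_to_voxels` are hypothetical helper names.

```python
import re
from typing import Iterable, Optional, Tuple

Spacing = Tuple[float, float, float]

# Matches the SNT-style header line quoted above (assumed exact wording).
VOXEL_LINE = re.compile(
    r"^#\s*Voxel separation \(x,y,z\):\s*([^,]+),\s*([^,]+),\s*([^,]+)$")


def read_voxel_separation(lines: Iterable[str]) -> Optional[Spacing]:
    """Return the (x, y, z) spacing from the first matching header line, or None."""
    for line in lines:
        match = VOXEL_LINE.match(line.strip())
        if match:
            x, y, z = (float(g) for g in match.groups())
            return (x, y, z)
    return None


def microns_to_voxels(point: Spacing, spacing: Spacing) -> Spacing:
    """Convert a micron coordinate to voxel units by dividing each axis by its spacing."""
    px, py, pz = point
    sx, sy, sz = spacing
    return (px / sx, py / sy, pz / sz)
```

If no such line is present, `read_voxel_separation` returns `None`, and a reader would fall back to treating the coordinates as plain microns.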

Re: Backwards Compatibility:
Agreed that we should mention this. Although Neuronland acknowledged the possibility of specifying bifurcations and terminations by type, only NeuronStudio (no longer supported) and a few other tools followed this convention. The majority of tracing and conversion tools (including Neuronland itself!) do not. Although, as you noted, adopting this variant has some advantages in parsing speed, the disadvantage is that it changes the semantics of TypeIDs from structural domains (dendrites, axons, glial processes, etc.), which cannot easily be computed, to topological information, which is redundant. That said, it is a good idea to mention this explicitly.
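The point that fork and end points are redundant can be made concrete: both can be derived from the parent column alone, in linear time. Below is a minimal Python sketch, assuming samples are given as (id, parent) pairs with -1 marking roots; `forks_and_tips` is a hypothetical name. A node with two or more children is a fork point, and a node with no children is an end point.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def forks_and_tips(samples: Iterable[Tuple[int, int]]) -> Tuple[Set[int], Set[int]]:
    """Given (node_id, parent_id) pairs, return (fork point ids, end point ids)."""
    samples = list(samples)
    # Count how many children each node has; a parent of -1 (root) is not a node.
    child_count = Counter(parent for _, parent in samples if parent != -1)
    forks = {node for node, _ in samples if child_count[node] >= 2}
    tips = {node for node, _ in samples if child_count[node] == 0}
    return forks, tips
```

For example, `forks_and_tips([(1, -1), (2, 1), (3, 2), (4, 2), (5, 4)])` returns `({2}, {3, 5})`.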

Re: Future-proofing:

There is no format expectation for the tree to be rooted in the soma. Indeed it is not uncommon for the axon to stem from a proximal dendrite.
We are not planning to add anything more to the first version of the standard, to keep it simple and encourage adoption. The governing board will most likely revisit and address this later.
The current version is now tagged as v1.0.0.

Re: Clarifications:

Undefined means that it could be anything, including soma. This typically denotes that this value needs to be corrected and defined. Unspecified refers specifically to neurites, which could be dendrites or axons. Early in development or in certain invertebrates the neurite may be undifferentiated. Even in mature vertebrate neurons, due to experimental limitations, it is not always possible to tell axons and dendrites apart. In these cases, use of typeID 6 is appropriate, and does not need to be corrected. We will add this clarification.
As a value of 0 may cause issues, we suggest setting a constant value in the lower range (greater than 0 and at most 1) for such files. NeuroMorpho.Org, for example, currently assigns such files a constant radius of 0.125. We will add this clarification as well.
At least one space (" ") should separate the fields. Additional whitespace characters are allowed.
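Under this rule, a reader can treat any run of spaces or tabs as a single field separator. Below is a minimal Python sketch, assuming the usual seven-column layout and '#' comment lines; `SwcSample` and `parse_swc_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SwcSample(NamedTuple):
    node_id: int
    type_id: int
    x: float
    y: float
    z: float
    radius: float
    parent: int


def parse_swc_line(line: str) -> Optional[SwcSample]:
    """Parse one SWC data line; return None for blank or comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # str.split() with no argument splits on any run of spaces and/or tabs.
    fields = stripped.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    node_id, type_id, parent = int(fields[0]), int(fields[1]), int(fields[6])
    x, y, z, radius = (float(f) for f in fields[2:6])
    return SwcSample(node_id, type_id, x, y, z, radius, parent)
```

A single-space line and a tab-aligned line with the same values therefore parse to the same sample.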

Tiago Ferreira · Answer 2 · Sat May 07 2022 06:14:56 GMT+0800 (China Standard Time)

Thanks for the detailed reply @bengtl !
On the rooted soma: my comment stemmed from this sentence: "The soma in SWC can consist of a single root point or multiple points, the first of which is the root." The first time I read it, I thought it referred to the root of the whole structure. But now I see that it means the 'root' of the collection of soma points. Maybe it is worth clarifying this explicitly?

On the radius of 0: I would actually prefer a clear definition of how 'absence of radii' is denoted. In my opinion, it is better to handle the idiosyncrasies of 0 (or -1) than to adopt the 0.125 strategy. It is just hard to justify, and 125 nm is actually quite common in real reconstructions.

Bengt Ljungquist · Answer 3 · Fri May 13 2022 05:18:58 GMT+0800 (China Standard Time)

Thank you again, @tferr, for your helpful and constructive feedback.

Re: Rooted soma
We suggest the following clarification, in the section "Soma representation":
Current wording
"The soma in SWC can consist of a single root point or multiple points, the first of which is the root"
will be changed into
"The soma in SWC can consist of a single center-point or multiple points, the first of which is the point of origin. Other types of structures may occur before the soma in the file and serve as root in the file tree structure."
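With this wording, a reader should locate the tree root and the soma's point of origin independently rather than assume they coincide. Below is a minimal Python sketch, assuming TypeID 1 denotes soma (the usual SWC convention) and parent -1 marks a root; `roots_and_soma_origin` and the example data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

SOMA = 1  # assumption: TypeID 1 denotes soma, following the common SWC convention


def roots_and_soma_origin(
        samples: Sequence[Tuple[int, int, int]]) -> Tuple[List[int], Optional[int]]:
    """Given (node_id, type_id, parent_id) in file order, return the root ids and the
    id of the first soma sample (its point of origin), or None if there is no soma."""
    roots = [node for node, _, parent in samples if parent == -1]
    soma_origin = next((node for node, type_id, _ in samples if type_id == SOMA), None)
    return roots, soma_origin
```

In the example `[(1, 3, -1), (2, 3, 1), (3, 1, 2), (4, 1, 3), (5, 2, 2)]`, a dendritic sample is the root, an axonal sample stems from sample 2, and the function returns `([1], 3)`.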

Re: Constant radius
While we agree that it would be beneficial to denote the presence of a constant or absent radius in some form, after further consideration and discussion with members of the community we have concluded that we cannot accept a value of 0 or -1, as it is known to cause problems for several established tools and applications, and potentially for others not yet foreseen. We think it is more suitable to denote it as part of the metadata. For the initial version of the standard, however, we will not give any further guidance on how it may be expressed. We will add it as an open topic to discuss for coming versions.
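Since the standard leaves open how constant or absent radii should be declared, a reader may still want to detect such files heuristically before analysis, and the same check can flag the constant-Z files of 2D reconstructions. Below is a minimal Python sketch, assuming radii and z values have already been parsed into sequences; `describe_placeholders` is hypothetical. It only reports patterns. As noted earlier in the thread, a constant 0.125 radius can also be a genuine measurement, so none of these flags proves that radii are absent.

```python
from typing import Dict, Sequence


def describe_placeholders(radii: Sequence[float], zs: Sequence[float]) -> Dict[str, bool]:
    """Report patterns that often indicate placeholder values rather than measurements."""
    return {
        # A single repeated radius may mean "no radius measured", but could also be real.
        "constant_radius": len(set(radii)) == 1,
        # 0 or -1 radii are the values the standard declines to accept.
        "non_positive_radius": any(r <= 0 for r in radii),
        # Identical z values suggest a 2D reconstruction.
        "planar": len(set(zs)) == 1,
    }
```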