labstreaminglayer / App-LabRecorder

An application for streaming one or more LSL streams to disk in XDF file format.



Store datetime of recording

cbrnr opened this issue

Following up the discussion in sccn/labstreaminglayer#28, I think it would be a good idea to store the time and date of the recording somewhere in the file (i.e. the date and time when the file was created/when the inlet was created). The best place is probably the file header chunk, but this is open for discussion.

Here is the block where the fileheader is written:

// [FileHeader] chunk
	_write_chunk(
		chunk_tag_t::fileheader, "<?xml version=\"1.0\"?><info><version>1.0</version></info>");

It should be pretty easy to augment the <info> tag with <created_at_uts> or similar.
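As a rough sketch of what that augmentation could look like, the helper below builds the header XML with an extra `<created_at_uts>` element holding whole seconds since the Unix epoch. The function name `make_file_header` is hypothetical, and it assumes `std::time_t` counts seconds since 1970-01-01 UTC, which holds on POSIX and Windows but is not guaranteed by the C++ standard.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: [FileHeader] XML with an extra <created_at_uts>
// element (whole seconds since the Unix epoch).
// Assumes std::time_t counts seconds since 1970-01-01 00:00:00 UTC.
std::string make_file_header(std::time_t created_at_uts) {
    return "<?xml version=\"1.0\"?><info><version>1.0</version>"
           "<created_at_uts>" +
           std::to_string(static_cast<long long>(created_at_uts)) +
           "</created_at_uts></info>";
}
```

The resulting string could then take the place of the literal currently passed to `_write_chunk`, e.g. with `make_file_header(std::time(nullptr))`.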

Here is the XDF specs for the fileheader-chunk:
https://github.com/sccn/xdf/wiki/Specifications#fileheader-chunk

Now the question is how to get unix timestamps in a cross-platform way?

Here is how CPython does it: https://hg.python.org/cpython/file/v3.5.2/Python/pytime.c#l451

And here is the commit that added something similar to libuv. Notice that they had different implementation files for Windows and Unix. This complicates the build system somewhat but is better than #ifdefs, which can be harder to maintain and diagnose.
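For comparison, a minimal sketch of the platform split for converting a `std::time_t` to broken-down UTC time in a thread-safe way. It uses an `#ifdef` for brevity; the same two branches could live in separate per-platform files as described above. It assumes MSVC's `gmtime_s` signature on Windows (destination first), which differs from the C11 Annex K variant; the name `to_utc` is hypothetical.

```cpp
#include <ctime>

// Thread-safe time_t -> UTC broken-down time. Returns false on failure.
// #ifdef used for brevity; per-platform source files would also work.
bool to_utc(std::time_t t, std::tm& out) {
#ifdef _WIN32
    // MSVC: errno_t gmtime_s(struct tm* dest, const time_t* src); 0 = success.
    return gmtime_s(&out, &t) == 0;
#else
    // POSIX: struct tm* gmtime_r(const time_t* src, struct tm* dest).
    return gmtime_r(&t, &out) != nullptr;
#endif
}
```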

We're already using XML, so what about storing it as a string? Something like 2020-06-17 17:15:23 UTC (possibly augmented with fractions of a second)? Also, this would be user-readable.

Furthermore, should the XDF standard be extended to include this tag in the header (i.e. the next XDF version)?

Though I can't find an authoritative source, I've read in a couple of places that C++20 requires time_since_epoch (of std::chrono::system_clock) to refer to the Unix epoch, even on MacOS. Can we limit LabRecorder builds on MacOS to C++20? What do we leave behind?
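A small sketch of reading Unix seconds from `std::chrono::system_clock`. C++20 specifies that `system_clock` measures Unix time (leap seconds excluded); before C++20 the epoch is unspecified, although in practice the common standard libraries on Linux, macOS and Windows already use 1970-01-01 UTC. The function name `unix_seconds_now` is hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Whole seconds since the system_clock epoch, which C++20 guarantees to be
// the Unix epoch; earlier standards leave the epoch unspecified.
std::int64_t unix_seconds_now() {
    using namespace std::chrono;
    return duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
}
```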

what about storing it as a string?

That's probably fine because it doesn't have to be high resolution. We could go with system_clock which is easier to work with. I don't know the exact incantation to get a datetime str in UTC, but this example looks workable. Edit: instead of localtime I would use gmtime.
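A minimal sketch of the gmtime-based formatting, assuming `std::time_t` holds seconds since the Unix epoch. The name `utc_datetime_string` is hypothetical. Note that `std::gmtime` returns a pointer to shared static storage and is not guaranteed to be thread-safe.

```cpp
#include <ctime>
#include <string>

// Formats a time_t as e.g. "2020-06-17 17:15:23 UTC" using std::gmtime
// (UTC) instead of std::localtime. Not guaranteed thread-safe.
std::string utc_datetime_string(std::time_t t) {
    std::tm* utc = std::gmtime(&t);
    if (utc == nullptr) return "";  // conversion failed
    char buf[32];
    std::size_t n = std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", utc);
    return std::string(buf, n);
}
```

For the current time this would be called as `utc_datetime_string(std::time(nullptr))`.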

should the XDF standard be extended to include this tag in the header

Are you suggesting making it a requirement? The only benefit I see is that a future update to an xdf loader could simply look for the xdf version number (guaranteed to be there) and then from the version number it will know whether or not the datetime string is present.
As it's all XML anyway, I'm pretty sure any xdf loader could just as easily parse the entire header then check for the presence of the key.
So IMO the benefit is really really minimal, and not worth the grumbling we might get for bumping the version number.

So it looks like a string could be a workable solution 👍.

Regarding adding that to the XDF specs, the initial version has now been around for about 9 years without any changes. I think adding the recording time to the file header would justify bumping the version number, even if this is a really minimal and trivial change. The point is that we want to avoid a hodgepodge of tags that people might be adding out of convenience - that's also why we designed XDF as a (community) standard. And yes, you're right that old readers that only support version 1.0 would simply skip the new field (therefore this is a backward-compatible change). However, readers supporting the new version (1.1?) would have to be able to parse the field correctly.

I'm not involved, but I know there are efforts by others to get XDF into a standards body. It would be nice to keep these things in sync.

Also, Tristan has had some major improvements waiting on the sidelines that it would be great to get into the standard too (maybe even as a v2.0).

In terms of politics and path of least resistance, is it better to make incremental changes? Or is it better to make one more complete change that has a little something for everyone? I don't know, and I don't really want to think about it right now. So I'll just punt the issue down the road.

But the discussion of whether or not we need to change the version shouldn't stop the created_at datetime str from being added in LabRecorder because it fits within the current spec.

Sounds good, bumping the version for such a tiny incremental change feels weird. But I agree that this is really orthogonal to including the field in LabRecorder right now, so we should discuss the XDF spec change elsewhere (the XDF repo I guess is the best place).

I am not aware of any effort to standardize XDF - do you have more details? I'm not really interested in such stuff because this seems mainly interesting for companies and involves a lot of legal issues. What I care about is to keep pushing XDF (standardized or not), and one option would be to get it into BIDS.

Can we limit LabRecorder builds on MacOS to C++20? What do we leave behind?

I'm even hesitant to require C++17.

We're already using XML, so what about storing it as a string? Something like 2020-06-17 17:15:23 UTC (possibly augmented with fractions of a second)?

It's a standard format:

> options(digits.secs = 5)
> as.POSIXct('2020-06-17 17:15:23.123 UTC')
[1] "2020-06-17 17:15:23.122 CEST"

Adding the fractional seconds might make it harder to parse with default settings and lead someone to use it as base for timestamps so I'd rather not include the fractions.
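If fractional seconds are left out, one way to make that explicit is to truncate the `system_clock` time point to whole seconds before converting it. `std::chrono::time_point_cast` truncates toward zero (i.e. rounds down for post-1970 times), whereas the standard leaves it implementation-defined whether `system_clock::to_time_t` rounds or truncates. The helper name `whole_seconds` is hypothetical.

```cpp
#include <chrono>
#include <ctime>

// Drops sub-second precision explicitly before converting to time_t,
// instead of relying on to_time_t's implementation-defined rounding.
std::time_t whole_seconds(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    auto secs = time_point_cast<seconds>(tp);  // truncates toward zero
    return system_clock::to_time_t(secs);
}
```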

As for the version, I'd add it now as unofficial extension (it's the X in XML after all) and try to get it in a later standards revision.

I'd also try to stay as downward compatible as possible.

Adding the fractional seconds might make it harder to parse with default settings and lead someone to use it as base for timestamps so I'd rather not include the fractions.

Trust me, that's exactly what people will be doing. I'm fine with leaving out the fractional seconds for now, because it doesn't make a difference when parsing the datetime.

As for the version, I'd add it now as unofficial extension (it's the X in XML after all) and try to get it in a later standards revision.

Where exactly does this happen? I'd like to at least follow the discussion.

Trust me, that's exactly what people will be doing.

I know, I was doing it only last week with data I had reliable hardware triggers for and stopped doing it immediately after plotting the time difference.

Thanks. This document is behind a paywall so I'll assume it has nothing to do with the open XDF format (other than maybe share the same specifications as of XDF 1.0).

I'm not familiar with the contents but that's my understanding too. I think it's basically a signal to industry, "If a few of you use this then you can interoperate with each other." And by making it an industry standard, they can be sure that if they make a product that uses this, they don't have to worry about random academics changing the standard and breaking their product without notice.

So why do we, as academics working with an open format, care about an industry standard? I don't see how the industry standard benefits my research directly. In my opinion, the direct benefit is really to the community, students and hobbyists alike, who may want to hook up our tools to their industry-supplied file. And this benefits us academics indirectly by getting more users supplying apps, making suggestions, identifying failure points, making tutorials, and making other tools we can use.

We can retain these benefits if we keep everything backwards compatible. It's not too hard to keep tools backwards compatible with older file versions. I think it's a bit harder to keep the new files compatible with older tools or industry tools working with the standard.

But there are changes in the pipeline that will make a version bump absolutely necessary, and we'll need a campaign to tell people to update their importers if they want to use new files. Not me, but others are taking steps to be able to offer more support for LSL and XDF. Maybe once that's in place it'll be a good time to make larger changes; until then we can limit changes to LabRecorder to those that fit within the current XDF spec.

Maybe I missed something, but I'm not sure that I understand why there is a compatibility problem here. If the date of file creation is recorded in the file header, as cboulay suggests, then the reader needs to handle it properly. But, if the reader can't do this, it isn't a 'good' reader, so who cares?

As of now XDF files almost always have

<?xml version="1.0"?><info><version>1.0</version></info>

as the file header. This is true, for example, of any XDF written by any version of LabRecorder that has ever been hosted in this repository. Since it is XML, one could simply add a snippet to this file header like:

<date>
  <day>2020-06-24</day>
  <time>10:49:23</time>
  <timezone>UTC+10</timezone>
</date>

or whatever. Then, so long as the parser does the right thing by reading the length of the file header chunk (encoded in XDF by definition) and either parsing or skipping the remaining length minus 2 bytes (the length includes the 2-byte chunk tag), which is what load_xdf.m does, for example, there is no compatibility issue. Again, if a reader doesn't do this, it is poorly crafted and shouldn't be used.
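To make the length handling concrete, here is a sketch of reading one chunk from an in-memory buffer, assuming the XDF 1.0 chunk layout: a 1-byte NumLengthBytes (1, 4 or 8), a little-endian Length that counts the tag plus content, a 2-byte little-endian Tag (1 for FileHeader), then Length minus 2 content bytes. The names `Chunk`, `read_le` and `read_chunk` are hypothetical; this is not load_xdf.m's code, only an illustration of the same skip-by-length idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// One XDF chunk: [NumLengthBytes][Length (tag + content)][Tag: uint16][Content]
struct Chunk {
    std::uint16_t tag;    // e.g. 1 = FileHeader
    std::string content;  // raw bytes; XML for the FileHeader chunk
};

// Reads n little-endian bytes at buf[pos] and advances pos.
static std::uint64_t read_le(const std::vector<std::uint8_t>& buf,
                             std::size_t& pos, std::size_t n) {
    if (pos + n > buf.size()) throw std::runtime_error("truncated chunk");
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
    pos += n;
    return v;
}

// Parses the chunk at buf[pos] and leaves pos at the start of the next one.
// Unknown tags (or unknown XML elements) can be skipped via Length - 2.
Chunk read_chunk(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::size_t num_len_bytes = static_cast<std::size_t>(read_le(buf, pos, 1));
    if (num_len_bytes != 1 && num_len_bytes != 4 && num_len_bytes != 8)
        throw std::runtime_error("invalid NumLengthBytes");
    std::uint64_t length = read_le(buf, pos, num_len_bytes);
    if (length < 2) throw std::runtime_error("length too small");
    Chunk c;
    c.tag = static_cast<std::uint16_t>(read_le(buf, pos, 2));
    std::uint64_t content_len = length - 2;  // Length includes the 2-byte tag
    if (content_len > buf.size() - pos) throw std::runtime_error("truncated chunk");
    c.content.assign(reinterpret_cast<const char*>(buf.data() + pos),
                     static_cast<std::size_t>(content_len));
    pos += static_cast<std::size_t>(content_len);
    return c;
}
```

Because the reader only trusts Length, any extra XML element added to the header content is carried along in `content` without affecting where the next chunk starts.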

Or, if you do not trust the reader to get the specification right, why not just put the creation date (optionally) in the stream info for each stream? That way it won't confuse any readers that were fool enough to hard-code their way through the file header portion. If a reader isn't built to handle anything in the stream-meta data portion, it won't read any XDF file correctly, so, again, who cares?

I have not seen the Attuned specification, but I think cboulay is exactly right that it was meant as a signal to industry that they can trust in this format. I would be rather surprised, actually, if there were meaningful discrepancies between it and the current XDF spec, but that is just an assumption---I have not read it.

It certainly benefits researchers directly to have industry using LSL and XDF because it will result in more, better, available (for a price) tools that researchers can use out of the box. Also, getting industry invested (in the sense that the specifications/standards are getting adopted) can result in more funding opportunities for academic research projects---at least this is very much the case in the European Union.

Finally, I don't see any need for sub-second precision as this time information is not precise enough to co-register with LSL timestamps anyway. Also, the latency between this date and the time the inlet data begins to be written to file is not known, so this information is not appropriate for synchronization (as we discussed at length in the previous issue...).

Maybe I missed something

I don't think so. We all agree this change should make its way into LabRecorder and is compatible with current XDF spec.

The conversation between me and Clemens was more about whether or not to bump the XDF (minor) version number and make this field a requirement.

BIDS has the notion of 'required' and 'recommended' attributes explicitly set out in the specification. With LSL/XDF this is a little less clear. 'Required' fields in XDF (for streams) are implicitly dictated by the LSL protocol (name, type, sampling rate, etc.), while the best practices (info.desc.channels.channel[n].label/unit/etc.) are 'recommended' only in the sense of how they should be structured. Does Attuned do something similar?

Or, if you do not trust the reader to get the specification right, why not just put the creation date (optionally) in the stream info for each stream?

A file creation time is independent of each stream's start time (could be running for hours before LabRecorder turns on). They offer unique information and we should provide both. How to do that (app must provide it? or liblsl injects it?) is a different conversation.

But the issues are linked though, because we should attempt to ensure that the datetime string format is the same in both the stream info and the XDF header. Are there any constraints on any platform that would prevent liblsl from getting a proper datetime?

I am more and more confident that I don't understand what we are talking about. If the idea is to have a ground-truth time for when a file is created and/or when an inlet starts writing to file that can somehow be co-registered against timestamps, this is a red herring.

I understand why you want a date in an XDF file. It is useful information. The OS timestamps file creation times, but if you copy it to another machine, this gets bashed out, so you want to keep that date in the file contents. OK, that makes sense.

But, to chime in on your question, I think Clemens' suggestion is quite good. In fact, I would push for the example pattern I used before:

<date>
  <day>yyyy-mm-dd</day>
  <time>hh:mm:ss</time>
  <timezone>UTC+hh</timezone>
</date>

This should be no problem in any OS, although there will have to be some #ifdef WIN32 blocks in there, for sure.

The current LSL protocol doesn't record such a datetime at stream outlet creation time, so the scenario where LabRecorder isn't started until long after stream outlet creation can't be supported without some effort and (definitely) a version bump in liblsl and the LSL protocol.

Maybe this is what I don't understand. Are we talking about recording (i.e. XDF specification)? or changing the information shared between outlet and inlet in the UDP handshake (i.e. LSL protocol)?


Just datetime strings for information purposes only. Definitely not for synchronization.

First, yes let's get the datetime string into the XML header in the XDF file. As you mentioned, there are obvious benefits to this. I can't do it right now. I can comment in GitHub while my computer runs analyses, but not much more. If I get past these analyses and no one else has done it then I will tackle it.

Second, let's think about whether or not we want to be generous and provide datetime info for any stream, and how we would do that. I'm not thinking of it as being mandatory and certainly it doesn't enable any functionality like synchronization. It'll just be a piece of trivia: "How long has the stream been up?" I was thinking we could have liblsl automatically augment the user-provided stream info. Or maybe we just do it in the template app and a few of our apps and leave it at that.

I think the more important question to address first is how we make sure users won't use it for synchronization. Is cutting off the number at the seconds integer enough? Should the key be "approx-datetime"? Anyway, this isn't very pressing. It's only worth thinking about now because we'd like whatever approach we take here to output the info in the same format as that for the file in the XDF header. Not so they can be used with each other, just so we can be consistent and so developers importing XDF files and parsing file headers and stream headers can use the same code to parse the datetime.

By the way, I don't know if I'm unique in this, but when working with datetimes I think of "date" as yyyy-mm-dd and "day" as e.g., Mon or Tue. So I'm not in complete agreement with the proposed pattern, but I'll gladly change my mind if there is some widely accepted standard.

I'd tackle just the first issue (including the recording time in the file header, i.e. the time when the inlet was created and started saving whatever streams it was subscribed to) for now because that's a low-hanging fruit. We don't even have to bump the XDF version number right now, but later we should make sure to include this in the next iteration of the format.

Regarding the format, I'd prefer one simple string field instead of a tag with three nested fields:

<date>2020-06-17 17:15:23.123 UTC</date>

We can discuss the tag name of course (maybe we should call it created_at just like the fields in the stream headers, because it tells you when the file was created).

This string can be directly parsed by various functions, e.g. in Python you would do:

from datetime import datetime
datetime.strptime("2020-06-17 17:15:23.123 UTC", "%Y-%m-%d %H:%M:%S.%f %Z")
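For a C++ reader, a comparable sketch uses `std::get_time` from `<iomanip>`. It assumes the exact layout "YYYY-MM-DD hh:mm:ss[.fff] UTC", skips any fractional digits, and only fills the broken-down `std::tm` fields (no conversion back to `time_t`, since `timegm` is not standard C++). The name `parse_utc_datetime` is hypothetical.

```cpp
#include <cctype>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Parses "YYYY-MM-DD hh:mm:ss[.fff] UTC" into std::tm (UTC fields).
// Fractional seconds are skipped; returns false on any mismatch.
bool parse_utc_datetime(const std::string& s, std::tm& out) {
    std::istringstream in(s);
    std::tm tm{};
    in >> std::get_time(&tm, "%Y-%m-%d %H:%M:%S");
    if (in.fail()) return false;
    if (in.peek() == '.') {  // optional fraction such as ".123"
        in.get();
        while (std::isdigit(in.peek())) in.get();
    }
    std::string zone;
    in >> zone;
    if (zone != "UTC") return false;
    out = tm;
    return true;
}
```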

Are there any constraints on any platform that would prevent liblsl from getting a proper datetime?

A device might not have an RTC or a battery for it (e.g. the Raspberry Pi and various other embedded platforms).
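One possible mitigation, offered only as a hypothetical heuristic, is to omit the datetime field when the clock is obviously unset, e.g. still near the Unix epoch after booting without an RTC. The 2020-01-01 cutoff and the name `clock_looks_set` are arbitrary assumptions, and this would not catch a clock restored to a stale but plausible time (for example from a saved shutdown time).

```cpp
#include <ctime>

// Hypothetical heuristic: treat times before 2020-01-01T00:00:00Z as
// "clock not set" (e.g. an RTC-less device before network time sync).
// Cannot detect a clock that is wrong but plausible.
bool clock_looks_set(std::time_t now) {
    const std::time_t kEarliestPlausible = 1577836800;  // 2020-01-01T00:00:00Z
    return now >= kEarliestPlausible;
}
```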

Regarding the format, I'd prefer one simple string field instead of a tag with three nested fields:

Fully agreed, and while we're at it we might even use a standard format.

IMHO, the best approach would be a single, optional ISO 8601 formatted field in the file header now, which a later XDF standard version would then officially define as optional.
For some clinical applications, you can't have personally identifiable information in there so including it shouldn't be mandatory.
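A sketch of such an optional ISO 8601 field, in the UTC form "2020-06-17T17:15:23Z". The element name `<datetime>` is a placeholder (the tag name was still under discussion), `make_header_iso8601` is hypothetical, and the `include_datetime` flag stands in for whatever setting would let clinical users omit the field.

```cpp
#include <ctime>
#include <string>

// File header with an *optional* ISO 8601 UTC datetime element.
// <datetime> is a placeholder name; include_datetime = false omits it.
std::string make_header_iso8601(bool include_datetime, std::time_t t) {
    std::string xml = "<?xml version=\"1.0\"?><info><version>1.0</version>";
    if (include_datetime) {
        char buf[32];
        std::tm* utc = std::gmtime(&t);  // not thread-safe; see gmtime_r/gmtime_s
        if (utc != nullptr &&
            std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc) > 0) {
            xml += "<datetime>";
            xml += buf;
            xml += "</datetime>";
        }
    }
    xml += "</info>";
    return xml;
}
```

With the field disabled, the output is identical to the header LabRecorder writes today.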

That's fine with me. Thanks for taking this on cbrnr!

Attempted fix in #36

@cbrnr Please look at the attached file with your normal tools to make sure it is conformant.
test_xdf_datetime.zip
XDFBrowser

@cboulay yes, I can read the file just fine - although it doesn't contain any stream with a regular sampling rate (even the EEG stream is irregularly sampled, not sure if this is intentional).

I'm a little too tired to check whether that EEG stream should have been irregular rate.
In the meantime, @cbrnr Please try the build attached in #38 and let me know if you spot any problems.

I'm a little too tired to look at if that EEG stream should have been irregular rate.

It's OK, it was just an observation which has nothing to do with #36.

File looks fine.