planetarypy / pvl

Python implementation of PVL (Parameter Value Language)

Returned Date/Time objects must be "aware"

rbeyer opened this issue · comments

The current (1.0.0-alpha2) implementation can return datetime objects that are either "naive" or "aware" (to use the terminology from the Python datetime library).

However, the PVL Specification clearly indicates in section 2.3.2.1.3 (Date/Time Value) that:

The Date/Time Value is a strict subset of the CCSDS ASCII Time Code recommendation (Reference [3]), in which all time is represented in Universal Coordinated Time, (i.e. Greenwich Mean Time).

This means that any time the pvl library returns a datetime object, it should be an "aware" datetime object.

This concern was brought up as an example of some old code that was failing because the returned datetime objects were "naive" whereas they used to be "aware." So I went to the spec, noticed that wording above, and assumed that I had mis-implemented.

However, the old behavior didn't do this. The old behavior was that if a time ended in a “Z” (e.g. 1990-158T15:24:12Z) or had an hour offset (e.g. 2001-001T01:10:39+7) then the resulting datetime or time would be “aware”, otherwise it would be “naive.” In my new implementation, I didn't faithfully reproduce that behavior. Times that ended in an hour offset were made “aware,” but those that ended in a Z were returned “naive.”
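To make the old behavior concrete, here is a minimal sketch using only the standard library. The helper name `parse_example` is hypothetical and is not part of pvl's API. It only handles the day-of-year form used in the two examples above, and it assumes the offset is given in whole hours.

```python
import datetime
import re


def parse_example(text):
    """Hypothetical helper: parse 'YYYY-DDDThh:mm:ss' with an optional 'Z' or '+h' suffix."""
    match = re.fullmatch(r"(\d{4}-\d{3}T\d{2}:\d{2}:\d{2})(Z|[+-]\d{1,2})?", text)
    if match is None:
        raise ValueError(f"unrecognized time string: {text!r}")
    stamp, suffix = match.groups()
    naive = datetime.datetime.strptime(stamp, "%Y-%jT%H:%M:%S")
    if suffix is None:
        # Old 0.x behavior as described above: no qualifier means "naive".
        return naive
    hours = 0 if suffix == "Z" else int(suffix)
    tz = datetime.timezone(datetime.timedelta(hours=hours))
    return naive.replace(tzinfo=tz)


with_z = parse_example("1990-158T15:24:12Z")
with_offset = parse_example("2001-001T01:10:39+7")
bare = parse_example("1990-158T15:24:12")
print(with_z.utcoffset(), with_offset.utcoffset(), bare.tzinfo)
```

Under the always-"aware" proposal, the only change to this sketch would be to attach `datetime.timezone.utc` in the `suffix is None` branch instead of returning the naive value.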

So now we have a dilemma: should we rigorously enforce that all datetime.datetime and datetime.time objects that pvl returns be "aware" as the PVL spec indicates that they should be (which may result in different kinds of breakage for older code), or should we just faithfully replicate the old behavior to return "aware" time objects when they are written with qualifiers that indicate that (a "Z" or a plus value), and "naive" objects otherwise?

The current (1.0.0-alpha2) implementation can return datetime objects that are either "naive" or "aware" (to use the terminology from the Python datetime library).

However, the PVL Specification clearly indicates in section 2.3.2.1.3 (Date/Time Value) that:

The Date/Time Value is a strict subset of the CCSDS ASCII Time Code recommendation (Reference [3]), in which all time is represented in Universal Coordinated Time, (i.e. Greenwich Mean Time).

This means that any time the pvl library returns a datetime object, it should be an "aware" datetime object.

I'm not sure I follow your interpretation here?
Why does a datetime object need to be aware, if the PVL spec demands that all times are UTC anyway? At least that's how I read that quote?

The PVL spec says that all times found in PVL-text are UTC times (ODL modifies this to allow for different timezones via the +hh notation). If the PVL spec says that all times are UTC times, then the appropriate Python object should also be a "UTC time" and that means a timezone "aware" datetime, which conveys that assumption of UTCness.

The old approach was that if a timezone was not explicitly specified (via "Z" or "+12") then return a "naive" object. Maybe the assumption here was that since a timezone was not specified, then return a "naive" object that conveys that unknown. However, the old library did not rigorously adhere to the spec, and I think this is one of those cases. It is not that a datetime without a specifier had an unknown timezone, it is that if it is unspecified, then it is UTC.

If you, as a programmer got a "naive" datetime object, there is not enough information in that object to "unambiguously locate itself relative to other date/time objects." So what happens when you want to do some time math? What if you have some other datetime object that is "aware" and you have a "naive" object that came from pvl, and you try and subtract them or compare them?

>>> import datetime
>>> naive = datetime.time(1, 2)
>>> print(naive)
01:02:00
>>> aware = datetime.time(1, 2, tzinfo=datetime.timezone.utc)
>>> print(aware)
01:02:00+00:00
>>> aware == naive
False

>>> aware > naive
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare offset-naive and offset-aware times
>>> naive - aware
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

So math operations seem to TypeError, as do comparisons, but equality doesn't, it just returns bool, which could be misleading. Especially if you thought that your naive time from pvl should be equal to the UTC aware time that you got from somewhere else (or even from another part of the label that wrote a time with a "Z").
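One caveat on the session above: `datetime.time` objects do not support subtraction at all, so that last traceback would appear even with two "aware" times. The naive/aware clash in arithmetic shows up with `datetime.datetime` objects, as this small sketch illustrates (the dates are arbitrary):

```python
import datetime

naive = datetime.datetime(2020, 1, 1, 1, 2)
aware = datetime.datetime(2020, 1, 1, 1, 2, tzinfo=datetime.timezone.utc)

# Equality never raises; a naive and an aware datetime simply compare unequal.
equal = aware == naive

# Ordering and subtraction both raise TypeError for a naive/aware mix.
errors = []
for operation in (lambda: aware > naive, lambda: aware - naive):
    try:
        operation()
    except TypeError as err:
        errors.append(str(err))

print(equal)
print(errors)
```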

I'm honestly not sure what the right thing to do here is. If I were implementing this from scratch, I think the PVL spec is saying to always return timezone "aware" objects.

However, what might be the impact? Developers would have had to be testing (I hope) the datetime objects that came out of pvl because sometimes they were "aware" and sometimes they were "naive" and if that mattered to an application, they'd have been converting those "naive" objects to "aware" ones (if we change this behavior, those old if statements just wouldn't get used anymore, but wouldn't break things, maybe?). What about people that didn't care or notice if a datetime was "aware" or "naive"? In that case, they probably weren't doing anything special, so maybe this change wouldn't break things? "Aware" datetime objects just have a tzinfo attribute that isn't None, but if they previously had time math that worked with "naive" objects (because that's all their labels ever gave them, maybe), then this change could break their code.
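The kind of defensive check described here might look like the following sketch. The helper name `ensure_utc` is hypothetical (it is not something pvl provides), and it bakes in the assumption that a "naive" value should be read as UTC.

```python
import datetime


def ensure_utc(value):
    """Hypothetical helper: treat a naive datetime or time as UTC, leave aware values alone."""
    # utcoffset() returns None for naive datetime.datetime and datetime.time objects.
    if value.utcoffset() is None:
        return value.replace(tzinfo=datetime.timezone.utc)
    return value


plus_seven = datetime.timezone(datetime.timedelta(hours=7))
from_naive = ensure_utc(datetime.datetime(2001, 1, 1, 1, 10, 39))
from_aware = ensure_utc(datetime.datetime(2001, 1, 1, 1, 10, 39, tzinfo=plus_seven))
from_time = ensure_utc(datetime.time(1, 3))
print(from_naive.utcoffset(), from_aware.utcoffset(), from_time.utcoffset())
```

If pvl always returned "aware" objects, the `if` branch in code like this would simply never run.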

I'm honestly not sure what the right thing to do here is. If I were implementing this from scratch, I think the PVL spec is saying to always return timezone "aware" objects.

I still don't see how you read that off of the PVL spec. I'm reading "IS REPRESENTED IN", which I take as: "all times you'll ever find in PVL objects are UTC times". I don't see anything about self-aware datetime objects?

So, if my reading is correct, we still would have to decide though what we want to do.
I tend to just declare my reading as state for the package (sorry! ;) ), so, to not break previous code, clearly declare in the README, that all time-based objects are naive, and incoming time calculations should transform to naive beforehand.

I honestly don't think that there are so many people out there that do time calculations with aware datetimes anyway. If this activity grows in the future, we could always release an API breaking major version upgrade if we feel that it becomes state-of-the-art to do that kind of calculations. I don't feel it's there yet, but I'm happy to be corrected or persuaded otherwise. (note, that I actually like all kinds of self-intelligent objects myself, I just don't think many others do).

I still don't see how you read that off of the PVL spec.

Oh, no, I fear I may not be communicating well.

I'm reading "IS REPRESENTED IN", which I take as: "all times you'll ever find in PVL objects are UTC times".

Yes, agreed ... for the strict PVL spec. However, the ODL spec (and hence PDS3) widen this to allow you to specify timezones. That's maybe what I didn't make clear.

I don't see anything about self-aware datetime objects?

Clearly the PVL spec doesn't say that, it doesn't have an opinion on how to express these values in any particular programming language. However, if we look at Python and we want to encode a time that is in UTC, then the way to do that is via a timezone "aware" datetime object that is "aware" of the UTC timezone. A timezone "naive" datetime object does not know what timezone it is in. You might assume that a timezone "naive" datetime object means that it is UTC, but some other programmer might assume that a timezone "naive" datetime object means that it is in 'local' time, whatever that means to the programmer or program.
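As an illustration of that ambiguity, the standard library itself presumes that a "naive" datetime is in system local time when it is asked to convert one (this is the documented behavior of `astimezone()` since Python 3.6). A short sketch with an arbitrary date:

```python
import datetime

naive = datetime.datetime(2020, 6, 1, 12, 0)

# astimezone() on a naive datetime presumes it is in the system's local time.
as_local = naive.astimezone()

# Attaching UTC explicitly makes a different claim, unless this machine runs on UTC.
as_utc = naive.replace(tzinfo=datetime.timezone.utc)

# The two readings differ by exactly the local UTC offset on this machine.
difference = as_utc - as_local
print(as_local.utcoffset(), difference)
```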

Furthermore, while the PVL spec only allowed UTC times, the ODL spec (and hence PDS3) allowed for the specification of other timezones via a "+hh" decorator at the end of the time string. When pvl comes across these, it correctly creates timezone "aware" datetime objects.

So, if my reading is correct, we still would have to decide though what we want to do.
I tend to just declare my reading as state for the package (sorry! ;) ), so, to not break previous code, clearly declare in the README, that all time-based objects are naive, and incoming time calculations should transform to naive beforehand.

That's just it, to provide an exact match to the way pvl behaved before, sometimes it would provide timezone "aware" objects, and sometimes it would provide timezone "naive" objects, it kind of depended on how the time string was written in the PVL text.

So if you had this PVL text:

start_time = 01:02+00
stop_time = 01:03

and parsed it through the old pvl 0.x versions (or the new 1.0.0-alpha), and tried to subtract stop_time from start_time to get a duration, you'd get a TypeError, because one of them is parsed as a timezone "aware" object and the other is not.

In truth, the PVL spec tells us that both should be interpreted as UTC times (even though only one has a "Z" or a "+00" to specify that), and my Python-theory argument is that therefore, they should both be returned as timezone "aware" objects.
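A sketch of what that would mean for the two values above, written directly with the standard library rather than through pvl. Because `datetime.time` has no subtraction, the sketch anchors both times on an arbitrary date to get a duration.

```python
import datetime

utc = datetime.timezone.utc
start_time = datetime.time(1, 2, tzinfo=utc)  # from "01:02+00"
stop_time = datetime.time(1, 3, tzinfo=utc)   # from "01:03", read as UTC per the PVL spec

# combine() keeps each time's tzinfo, so both results are aware datetimes.
day = datetime.date(2000, 1, 1)  # arbitrary anchor date
duration = datetime.datetime.combine(day, stop_time) - datetime.datetime.combine(day, start_time)
print(duration)  # 0:01:00
```

If `stop_time` were left "naive", the subtraction on the last line would raise a TypeError instead.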

I honestly don't think that there are so many people out there that do time calculations with aware datetimes anyway.

I have no idea. What if one instrument's PVL text reported times without the decorator, and pvl returned them as timezone "naive" objects, and another instrument's PVL text reported times with a decorator, and pvl returned them as timezone "aware" objects. You'd have a devil of a time trying to figure out which observations overlap in time if you couldn't compare them, so you'd have to convert one set to the other's type.

If this activity grows in the future, we could always release an API breaking major version upgrade if we feel that it becomes state-of-the-art to do that kind of calculations. I don't feel it's there yet, but I'm happy to be corrected or persuaded otherwise. (note, that I actually like all kinds of self-intelligent objects myself, I just don't think many others do).

Sure. The easy path is to just patch the alpha versions to behave like the 0.x version, and sometimes return "naive" and sometimes return "aware" objects, maybe that's the path of least resistance. Alternatively, since we're here at this 1.0.0 cusp, maybe the thing to do is guarantee that pvl will always return timezone "aware" objects (because all times in PVL have a specific timezone, UTC or otherwise)?

I don't think that would actually break much, because people that needed to do time math, were probably checking for "naive" objects, and converting them to "aware" ones so they could do the right math. I know that in one case, some USGS code was relying on the timezone "aware" object that pvl returned (and when I screwed it up, that broke their tests).

Oh wow, thanks for your explanation. That changes things. I need to digest this. Specifically the apparent logical clash of PVL, ODL and PDS and why the hell would PDS, a PLANETARY data system even bother with puny Earth time zones. This kinda breaks me inside a little bit. ;)

I would be in favor of always returning aware datetimes. For the vast majority of users this will be a very minor change and the benefit of always ensuring math with time PVL values works as expected is decent.

I just don't know how many users are out there, and how much of a change this would be. Is there a scenario where it could be that an incoming aware datetime would NOT break their pipelines where they assume (maybe simply by ignorance) that a naive UTC object comes in?

I just don't know how many users are out there, and how much of a change this would be. Is there a scenario where it could be that an incoming aware datetime would NOT break their pipelines where they assume (maybe simply by ignorance) that a naive UTC object comes in?

When you say "naive UTC object" I think that's not the right understanding, a timezone "naive" object, from the point of view of the Python language has nothing to do with UTC. You might assume that a timezone "naive" datetime is in UTC, but the language does not make that assumption, nor that guarantee. This is like the difference between zero and None. Folks who don't understand the difference might assume that zero and None are "the same," but they aren't. A timezone "naive" object is like None (its tzinfo parameter is actually None), while a timezone "aware" object is 'numeric' (its tzinfo has a numeric value).

Anyway, to answer your question: if someone was using a datetime object returned from pvl that happened to be "naive" (remember pvl can return both, depending on the PVL text that was parsed), and they were doing time math (which I am using to describe operations and comparisons), and happened to create a datetime object some other way, then they would have had to create a "naive" object. If we switch it so that all datetimes are "aware" then their time math that is built like this would start throwing TypeErrors.

However, if they got one datetime from pvl, and then another datetime from pvl, and were doing time math, odds are good that both objects came from PVL text that was written in the same way, and so these objects are already either both "naive" or both "aware" and if we change things to always return "aware" objects, then their code won't even notice the change.

I think I'm with @jessemapel and @AndrewAnnex here: let's always return timezone "aware" objects. The PVL spec actually implies this, and while it may seem superfluous, it is the fully-qualified way to represent real, absolute times in Python. Yes, it might break some code, but I think the impact will be low, and if folks are doing spacey stuff with time values, they should probably be using "aware" time objects for completeness anyway.

When you say "naive UTC object" I think that's not the right understanding, a timezone "naive" object, from the point of view of the Python language has nothing to do with UTC.

You assume bad understanding when bad explanation suffices. I meant to say a naive datetime object that carries the UTC time. Because PVL ONLY wants to carry times that ARE in UTC, it is correct to assume that all times de-facto are UTC, at least that's how I read it.

PVL was not a pds endorsed system to begin with anyways, but by making this change we can be more assured that going forward code developed against it is that much better.

Well, technically speaking, PVL just isn't what PDS implemented (they implemented ODL), so there's no technical requirement for the pvl library to improve there. There would be, if the library would address ODL.

but the language does not make that assumption, nor that guarantee.

I'm confused, above quote does exactly say that, doesn't it?

A timezone "naive" object is like None (its tzinfo parameter is actually None), while a timezone "aware" object is 'numeric' (its tzinfo has a numeric value).

Exactly, and taking it together with PVL spec PVL 2.3.2.1.3 from above, the assumption that any given time object coming from a PVL container is in UTC, should be correct, shouldn't it?

let's always return timezone "aware" objects. The PVL spec actually implies this,

I'm not against this at all, but I still disagree that you can read that from the spec.
In my view the spec only talks about the "CONTENT" of the datetime object, not if it should carry the information that it actually indeed IS UTC.

You assume bad understanding when bad explanation suffices.

That's fair.

but the language does not make that assumption, nor that guarantee.

I'm confused, above quote does exactly say that, doesn't it?

Sorry, me eliding things again: I should have said that "the Python language does not make that assumption (that timezone "naive" objects represent UTC times), nor that guarantee."

Exactly, and taking it together with PVL spec PVL 2.3.2.1.3 from above, the assumption that any given time object coming from a PVL container is in UTC, should be correct, shouldn't it?

That assumes that human beings that are writing software that uses these objects are also making that same assumption, they may not be. Or may be using libraries or other code that doesn't.

I'm not against this at all

Oh good, because unless anyone is, then I'll work on a PR to always return datetime.time and datetime.datetime objects as timezone "aware" objects in the returned dict-like from the loaders, because that seems like the consensus here.

I will ask though: when you return a date time “aware” object that is not in UTC, isn’t that formally breaking PVL spec? Or is it not because it always CAN be UTC? While I’m always happy to work with an aware object I’m worried about the implicit promise a library called “pvl” does with respect to the pvl SPEC. Was the return of a datetime object already formally breaking the SPEC then?

Let me say that I really enjoy talking about this stuff, thinking about and understanding how the choices we make as developers impact users is a great way to result in solid code.

I will ask though: when you return a date time “aware” object that is not in UTC, isn’t that formally breaking PVL spec? Or is it not because it always CAN be UTC? While I’m always happy to work with an aware object I’m worried about the implicit promise a library called “pvl” does with respect to the pvl SPEC. Was the return of a datetime object already formally breaking the SPEC then?

That's a complicated question. Since the 0.x architecture did not rigorously distinguish between the various dialects of PVL, there is room to debate. However, our 1.0.0-alpha.x architecture does, and this is how I see it working:

If you were using the PVLParser & Decoder (which is the strict implementation of the PVL spec), then we would guarantee that it only returns timezone "aware" datetime and time objects with a tzinfo parameter that indicates UTC. The PVLParser will error if you try and give it PVL text with a "+hh" timezone offset decorator.

If you were using the ODLParser & Decoder, then we would again guarantee that it only returns timezone "aware" objects, but they could have any arbitrary offset from UTC, because ODL (and PDS3) allow that.

If you don't specify a parser and decoder (the typical mode which uses the OmniParser & Decoder which does its best to decode even non-spec PVL text), then it will again always return timezone "aware" objects that may have any offset from UTC.
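The difference between the strict and the permissive guarantees can be sketched roughly as follows. The function `decode_time` and its `allow_offset` flag are hypothetical names made up for this illustration, not pvl's actual parser or decoder classes, and the regex only covers the simple "hh:mm[:ss]" forms with whole-hour offsets used in this thread.

```python
import datetime
import re

# Hypothetical pattern: hh:mm, optional :ss, optional "Z" or whole-hour offset.
TIME_RE = re.compile(r"(\d{2}):(\d{2})(?::(\d{2}))?(Z|[+-]\d{1,2})?")


def decode_time(text, allow_offset):
    """Hypothetical decoder that always returns an aware datetime.time."""
    match = TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a time value: {text!r}")
    hour, minute, second, suffix = match.groups()
    offset_hours = 0  # no suffix, or "Z", both mean UTC
    if suffix not in (None, "Z"):
        if not allow_offset:
            raise ValueError(f"offset {suffix!r} is not allowed in strict PVL")
        offset_hours = int(suffix)
    tz = datetime.timezone(datetime.timedelta(hours=offset_hours))
    return datetime.time(int(hour), int(minute), int(second or 0), tzinfo=tz)


print(decode_time("01:03", allow_offset=False))    # strict PVL: always UTC
print(decode_time("01:02+07", allow_offset=True))  # ODL-like: any offset
```

In this sketch, calling `decode_time("01:02+07", allow_offset=False)` raises a ValueError, mirroring the strict behavior described above.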

Let me say that I really enjoy talking about this stuff, thinking about and understanding how the choices we make as developers impact users is a great way to result in solid code.

I'm glad. I already felt like an obnoxious say-it-all with all my critical questions.. note I'm not saying I'm not one of those, I just need to feel like one all the time. ;)

BTW, I will never ever complain again if my computer gets times wrong... ;) https://youtu.be/-5wpm-gesOY

I have scratched the surface of that rabbit hole of timezones, and I'm honestly doubting why anybody ever wants to deal with that mess... :/
I see that you have removed the pytz requirement, but I really wonder if/how you can sanely recreate the same functionality with datetime, as I don't see anyone even trying to do the same?

Thanks for reminding me of all this different ingenious Decoder/Parser designs, they are awesome though.

I am concluding for now (for myself), that doing timezones (including DST) correctly is nearly impossible, unless you really have too much time on your hands, so I'm, silently, questioning why pvl ever would want to deal with it. So let me pose this final question I have and then I will shut up forever about timezones: Is returning even a datetime object from the strict PVLParser/Decoder within SPEC, considering that PVL and CCSDS go through hell and back in their SPEC to define what kind of string constructs are actually allowed? In other words, wouldn't the strict PVLParser have to return at best an ISO UTC string and that's it, and leave it to the user what to do with it?

I'm starting to feel sorry I ever asked anything about this... :/

I see that you have removed the pytz requirement, but I really wonder if/how you can sanely recreate the same functionality with datetime, as I don't see anyone even trying to do the same?

I don't know what to tell you, I've already built the 1.0.0-alpha.x architecture that handles timezone "aware" objects (just not consistently, as the original post indicates, but a PR will fix that soon) without needing a 3rd party library. Probably a topic for us to chat about sometime, if you're really interested, but the short is:

The Python Standard Library datetime module contains complete functionality to represent timezone "aware" time objects. Therefore, for the UTC-only PVL spec, that's all that's needed. For the ODL/PDS3 spec that allows offsets from UTC, there is a very narrow manner in which those offsets can be expressed in PVL text (which are ably handled by a few regexes), so again, no need for anything fancy. The various specs require nothing more. It is possible that pytz was used as a dependency before the Python 3 Standard Library had its current functionality.

However, I figured some genius might try and write ISO 8601 time strings into some PVL text since that is a standard (just not one that any PVL spec adheres to), so we also have an optional 3rd party dependency on the dateutil library. If you don't have that on your system, then pvl will just make an ISO 8601 time a str, but if you do have it, it'll return it as a properly initialized timezone "aware" object.
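The optional-dependency pattern described here could look something like this sketch. The function name `decode_iso8601` is hypothetical and this is not pvl's actual code. It assumes a dateutil recent enough to provide `dateutil.parser.isoparse`, and it assumes that an ISO 8601 string without an offset should be read as UTC.

```python
import datetime

try:
    from dateutil import parser as dateutil_parser  # optional third-party dependency
except ImportError:
    dateutil_parser = None


def decode_iso8601(text):
    """Hypothetical fallback: aware datetime if dateutil is installed, else the original str."""
    if dateutil_parser is None:
        return text
    value = dateutil_parser.isoparse(text)
    if value.utcoffset() is None:
        # Assumption for this sketch: no offset in the string means UTC.
        value = value.replace(tzinfo=datetime.timezone.utc)
    return value


result = decode_iso8601("2020-01-01T01:02:03+07:00")
print(type(result).__name__, result)
```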

Thanks for reminding me of all this different ingenious Decoder/Parser designs, they are awesome though.

Yeah, I'm a little worried that they're brittle and that I should have just gone with some 3rd party lexer/parser instead of trying to roll my own, but I learned a ton about Python.

I am concluding for now (for myself), that doing timezones (including DST) correctly is nearly impossible, unless you really have too much time on your hands, so I'm, silently, questioning why pvl ever would want to deal with it.

So the advantage is that pvl doesn't deal with the complexity, really. We just want to guarantee that the objects we are returning are the proper kinds of objects (and in our case, since all of the times really are timezone "aware"--even if they just have UTC offsets--that's what we should return).

So let me pose this final question I have and then I will shut up forever about timezones: Is returning even a datetime object from the strict PVLParser/Decoder within SPEC, considering that PVL and CCSDS go through hell and back in their SPEC to define what kind of string constructs are actually allowed? In other words, wouldn't the strict PVLParser have to return at best an ISO UTC string and that's it, and leave it to the user what to do with it?

You could say the same thing about PVL Sets and Sequences or PVL numeric quantities or PVL Values that have PVL Units. From that point of view, we should just return strings for every parameter, and not try to convert things to "the best" Python object that a PVL Value could be decoded into (Python set, list, int, or float objects, and our new quantity object capabilities), but that doesn't seem to be the guiding principle here (and it would be a sucky library).

Since the PVL spec and the ODL/PDS3 spec detail PVL Date/Time values, then it is incumbent upon the library to return a Python object that best characterizes that value. In this case, that's a Python datetime object, which is fully qualified with a tzinfo parameter.

You could say the same thing about PVL Sets and Sequences or PVL numeric quantities or PVL Values that have PVL Units. From that point of view, we should just return strings for every parameter, and not try to convert things to "the best" Python object that a PVL Value could be decoded into (Python set, list, int, or float objects, and our new quantity object capabilities), but that doesn't seem to be the guiding principle here (and it would be a sucky library).

Agreed. Because the parsing actor is a Python library, it's adequate to assume that the further processing of it happens in Python as well. I was wrongly assuming that there might be a need for a "purist" parsing library, so that its results could be handed on to other libraries, but if that would be the case, most likely "these other things" already have their own PVL parser/decoder anyway. spiceypy kinda makes the same kind of decisions in returning more Python-useable objects instead of naked pure objects like CSPICE or FORTRAN arrays.

Ok, this discussion made me understand better what kind of things we are developing against and for. Thanks much, Ross!