chronotope / chrono

Date and time library for Rust

Bug in parse_from_rfc2822 with -0000 timezone

nicklan opened this issue

The following program:

extern crate chrono;

use chrono::datetime::DateTime;

fn main() {
    let minus = "Fri, 1 Feb 2013 13:51:20 -0000";
    let plus  = "Fri, 1 Feb 2013 13:51:20 +0000";
    println!("{} -> {:?}", minus, DateTime::parse_from_rfc2822(minus));
    println!("{} -> {:?}", plus, DateTime::parse_from_rfc2822(plus));
}

outputs:

Fri, 1 Feb 2013 13:51:20 -0000 -> Err(ParseError(NotEnough))
Fri, 1 Feb 2013 13:51:20 +0000 -> Ok(2013-02-01T13:51:20+00:00)

This seems wrong. There should be no difference between -0000 and +0000, right?

My understanding is that -0000 indicates the absence of useful time zone information. I'm less sure about what it actually means, especially in practice: is it a local time, a UTC time with no local time offset (possible with a different interpretation of RFC 2822), or a completely ambiguous timestamp? Chrono currently takes the third option, i.e. such timestamps are not safe to read in any time zone, but I would like to change the default if other options are more widespread. (For example, Python's email package seems to use the local time option, but only as a last resort.)

Right. The spec is a little hard to read on this. I did notice that Python's email.utils.parsedate_tz treats +0000 the same as -0000, and GNU date seems to as well.

I guess it's fine either way, but some email clients certainly put -0000 in the date field, so if you're parsing those you need to do extra work to transform them to +0000 (or something else).
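
For anyone hitting this in the meantime, the "extra work" mentioned above could look roughly like the sketch below. It is only an illustration: `normalize_minus_zero` is a made-up helper, not part of chrono, and it assumes the zone is the last token of the string (no trailing RFC 2822 comment such as "(UTC)"). It also throws away whatever -0000 was meant to convey.

```rust
/// Hypothetical helper (not part of chrono): rewrites a trailing "-0000"
/// zone to "+0000" so that a parser rejecting "-0000" accepts the input.
/// This discards the "no local offset information" meaning of "-0000".
fn normalize_minus_zero(input: &str) -> String {
    let trimmed = input.trim_end();
    match trimmed.strip_suffix("-0000") {
        // Only rewrite when the zone is a separate, space-delimited token.
        Some(head) if head.ends_with(' ') => format!("{}+0000", head),
        // Anything else is returned unchanged.
        _ => input.to_string(),
    }
}

fn main() {
    let minus = "Fri, 1 Feb 2013 13:51:20 -0000";
    let plus = "Fri, 1 Feb 2013 13:51:20 +0000";
    println!("{}", normalize_minus_zero(minus));
    println!("{}", normalize_minus_zero(plus));
}
```

The normalized string can then be handed to DateTime::parse_from_rfc2822 as in the original report.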

Any update on this?

About a year has passed with no progress on this. Is there any way to help?

As I read RFC 5322 (not that the text changed from RFC 2822), -0000 still indicates that the timestamp is to be semantically interpreted as UTC. Local time is discussed as something that clients should express, but the offset described is the offset of the time-of-day, not the offset of local time from UTC. At the end of the paragraph, the RFC says that -0000 "also" indicates UTC, adds that the generating system's time zone may not be (i.e., is not necessarily) UTC, and then states more succinctly that the date-time contains no information about the local time zone.

In other words, +0000 means that you should return a DateTime<FixedOffset> with FixedOffset::east(0), while -0000 means that you really want to return a DateTime instead. In the current implementation, there's no way to disambiguate these two interpretations, but if you allowed DateTime<Option>, it could be done.
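
To make the distinction above concrete, here is a minimal standalone sketch that keeps the two cases apart at the zone-parsing level. `ZoneInfo` and `parse_numeric_zone` are hypothetical names invented for illustration. They are not chrono types, and this is not how chrono represents offsets.

```rust
/// Hypothetical type (not chrono's API): keeps "+0000" and "-0000" apart.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ZoneInfo {
    /// A real offset east of UTC, in seconds ("+0000" gives Known(0)).
    Known(i32),
    /// "-0000": the time is UTC, but the sender's local offset is unknown.
    UtcUnknownLocal,
}

/// Parses a numeric RFC 2822 zone such as "+0530" or "-0000".
/// Returns None for anything that is not a sign followed by four digits.
fn parse_numeric_zone(zone: &str) -> Option<ZoneInfo> {
    let bytes = zone.as_bytes();
    if bytes.len() != 5 || !bytes[1..].iter().all(u8::is_ascii_digit) {
        return None;
    }
    let sign = match bytes[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    let digit = |i: usize| i32::from(bytes[i] - b'0');
    let hours = digit(1) * 10 + digit(2);
    let minutes = digit(3) * 10 + digit(4);
    if minutes > 59 {
        return None;
    }
    // "-0000" is the one spelling that carries the "unknown local zone" meaning.
    if sign == -1 && hours == 0 && minutes == 0 {
        return Some(ZoneInfo::UtcUnknownLocal);
    }
    Some(ZoneInfo::Known(sign * (hours * 3600 + minutes * 60)))
}

fn main() {
    for zone in ["+0000", "-0000", "+0530", "-0800"] {
        println!("{} -> {:?}", zone, parse_numeric_zone(zone));
    }
}
```

A caller that does not care about the difference can map both `Known(0)` and `UtcUnknownLocal` to UTC, which is the behaviour being requested in this issue.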

For all practical purposes, I would like -0000 to parse the same as +0000. If there are no arguments against it, I will provide a PR.

@lifthrasiir Please let us know what we can do here. It's not only Python that does this: time 0.1 also generates RFC 822 times with -0000: https://docs.diesel.rs/time/struct.Tm.html#method.rfc822z.

I agree that the following sentences from RFC 5322 all say that -0000 should be interpreted the same as +0000, but may mean that the client system wasn't necessarily also set to GMT, which, like, who cares?

The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and that the date-time contains no information about the local time zone.

and this in particular seems to suggest that -0000 is meant to be used as a
placeholder for "Use UTC but mostly because somebody messed up":

The 1 character military time zones were defined in a non-standard way in [RFC0822] and are therefore unpredictable in their meaning. The original definitions of the military zones "A" through "I" are equivalent to "+0100" through "+0900", respectively; "K", "L", and "M" are equivalent to "+1000", "+1100", and "+1200", respectively; "N" through "Y" are equivalent to "-0100" through "-1200", respectively; and "Z" is equivalent to "+0000". However, because of the error in [RFC0822], they SHOULD all be considered equivalent to "-0000" unless there is out-of-band information confirming their meaning.
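
For reference, the original military zone definitions quoted above can be written as a small lookup. This is only a sketch of the table as the RFC describes it; `original_military_offset_hours` is a hypothetical name, and per the last sentence of the quote a parser without out-of-band information would treat all of these letters as "-0000" rather than use these values.

```rust
/// Hypothetical helper: the *original* military zone definitions as quoted
/// from the RFC, in whole hours east of UTC. "J" is not assigned.
/// The RFC says these SHOULD be treated as "-0000" unless confirmed
/// out of band, so this table is illustrative only.
fn original_military_offset_hours(zone: char) -> Option<i32> {
    let z = zone.to_ascii_uppercase();
    match z {
        // "A" through "I" are +0100 through +0900.
        'A'..='I' => Some(z as i32 - 'A' as i32 + 1),
        // "K", "L", "M" are +1000, +1100, +1200.
        'K'..='M' => Some(z as i32 - 'K' as i32 + 10),
        // "N" through "Y" are -0100 through -1200.
        'N'..='Y' => Some(-(z as i32 - 'N' as i32 + 1)),
        // "Z" is +0000.
        'Z' => Some(0),
        _ => None,
    }
}

fn main() {
    for zone in ['A', 'I', 'J', 'K', 'M', 'N', 'Y', 'Z'] {
        println!("{} -> {:?}", zone, original_military_offset_hours(zone));
    }
}
```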

And:

Other multi-character (usually between 3 and 5) alphabetic time zones have been used in Internet messages. Any such time zone whose meaning is not known SHOULD be considered equivalent to "-0000" unless there is out-of-band information confirming their meaning.

And from the appendix:

The following are the changes made from [RFC0822] and [RFC1123] to [RFC2822] that remain in this document:
1...snip....
6. Specifically allow and give meaning to "-0000" time zone.

Which is all to say I am happy to take a PR that interprets -0000 as UTC.