regebro / tzlocal

A Python module that tries to figure out what your local timezone is

Use ZoneInfo instead of pytz on Python 3.9+

ofek opened this issue · comments

I assume that this will be a breaking change, because pytz zones are used very differently from standard tzinfos. End users will need to stop using .localize.
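
For illustration, a minimal sketch of what that change looks like on the user's side, assuming Python 3.9+ with the standard zoneinfo module and system time zone data available. The pytz call appears only as a comment, and the variable names are made up for this example:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

naive = datetime(2020, 6, 1, 12, 0)

# pytz style (shown only as a comment; pytz is not imported here):
#     aware = pytz.timezone("America/New_York").localize(naive)

# zoneinfo style: the zone is attached directly, with no localize() step.
tz = ZoneInfo("America/New_York")
aware = naive.replace(tzinfo=tz)
same = datetime(2020, 6, 1, 12, 0, tzinfo=tz)

print(aware.isoformat())
```

Both `replace(tzinfo=...)` and passing `tzinfo=` to the constructor give the same aware datetime here; the offset is looked up lazily from the zone's rules rather than fixed at attachment time.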

@regebro If you're interested, I'd like to see some similar functionality to this added to the zoneinfo module in Python 3.10. I was going to write it myself, but I'd be happy to have an expert collaborator 😄.

@regebro If you're interested, I'd like to see some similar functionality to this added to the zoneinfo module in Python 3.10. I was going to write it myself, but I'd be happy to have an expert collaborator 😄.

I'm not sure we want this in the standard library since this library has quite a few kludges to get the time zone information out of the (sometimes uncooperative) operating systems.

I am however eager to see a pytz-free version of this library. Maybe somebody should fork it if @regebro does not want to mess with zoneinfo in this one.

@regebro I'm going to need this soon so I'm going to fork this project if you don't respond. I'd like to avoid this if possible.

I have a working zoneinfo version here: https://github.com/agronholm/tzlocal/tree/zoneinfo

How would you like to proceed?

Make a Pull request to this repo and I'll look at it.

It may be worth just making a fork anyway, because the two options are to either support pytz and zoneinfo (with zoneinfo opt-in), or make a pretty significant breaking change and only support zoneinfo. Supporting both and keeping pytz as a hard dependency is undesirable in my mind.

Another option is to use pytz-deprecation-shim, which makes the "breaking change" much less breaking — the only significant breaking change there is in the handling of datetime arithmetic.

The easiest thing with the least breakage across the board would probably be to fork tzlocal under a different name (though obviously I'd be sad since that would mean we wouldn't get that sudden influx of new zoneinfo users represented by all the forced migrations 😅).

@pganssle I know this is not the proper forum for this, but I have some serious doubts about the correctness of the zoneinfo datetime arithmetic. With the following script:

from datetime import datetime, timedelta

import pytz

try:
    from zoneinfo import ZoneInfo
except ImportError:
    from backports.zoneinfo import ZoneInfo

tz = ZoneInfo('US/Eastern')
dt1 = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=0)
dt2 = dt1 + timedelta(minutes=30)
print(dt2.isoformat())

tz = pytz.timezone('US/Eastern')
dt1 = tz.localize(datetime(2013, 11, 3, 1, 30), is_dst=True)
dt2 = tz.normalize(dt1 + timedelta(minutes=30))
print(dt2.isoformat())

The output is:

2013-11-03T02:00:00-05:00
2013-11-03T01:00:00-05:00

Pytz gets it right, zoneinfo gets it wrong. Wasn't this supposed to work out of the box with zoneinfo?
One more thing worries me:

tz = ZoneInfo('US/Eastern')
dt1 = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=0)
dt2 = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=1)
assert dt1 != dt2

This script fails. Why? These are two completely different points in time and they should NOT compare as equals!

If you can suggest a better forum for this (bugs.python.org), I'll be happy to continue over there. I'm also available on IRC.

Pytz gets it right, zoneinfo gets it wrong. Wasn't this supposed to work out of the box with zoneinfo?
One more thing worries me:

Incorrect, pytz gets this wrong and zoneinfo gets it right, see my blog post on timezone-aware datetime semantics.

tz = ZoneInfo('US/Eastern')
dt1 = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=0)
dt2 = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=1)
assert dt1 != dt2

This script fails. Why? These are two completely different points in time and they should NOT compare as equals!

The precursor to my post on datetime arithmetic goes into some detail on this point, but basically the issue is that same zone comparisons are different from different zone comparisons. Same zone comparisons (and arithmetic) assume that you are talking about local time, and two datetimes are considered equal if the local portion of the date matches (regardless of what the UTC offset says). Inter-zone comparisons need a "common ground" and, except for the weird edge case where inter-zone comparisons always return false during folds, they convert to UTC and compare.

If you want comparisons and arithmetic to work like you are working in UTC, you should convert to UTC first.
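
A minimal sketch of that difference, assuming Python 3.9+ with the standard zoneinfo module and system time zone data available. The variable names are illustrative only:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")
first = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=0)   # 01:30 EDT
second = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=1)  # 01:30 EST

# Same tzinfo object: only the wall-clock fields are compared, fold is ignored.
same_zone_equal = first == second

# Converting to UTC first compares the actual instants.
first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)
utc_equal = first_utc == second_utc
gap = second_utc - first_utc

print(same_zone_equal, utc_equal, gap)
```

The same-zone comparison reports equality, while the UTC comparison shows the two values are an hour apart.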

(Also don't use US/Eastern, it's a deprecated alias for America/New_York).

Also, like I've been saying in many places (in this thread and others on this conversion), this is a major breaking change you are undertaking here. It fundamentally changes the time zone model of time zones returned by tzlocal and the semantics thereof. You can see the extensive migration guide I wrote for the pytz_deprecation_shim library and, as I think I mentioned in one of those threads, even that is not able to perfectly mimic pytz's behavior with regard to arithmetic (see this section), which is one of the reasons that it has been harder for me to get django to adopt pytz-deprecation-shim for its process of deprecating the aspects of pytz that are exposed to its end users.

You should definitely not expect zoneinfo to "work out of the box" in the sense that it would be a drop-in replacement for pytz. It "works out of the box" in the sense that it does what datetime's time zone model expects, so you hopefully won't be fighting with a mismatch between the datetime and time zone providers anymore, but it's important to understand that pytz and zoneinfo, while doing similar things, have very different semantics.

I went and read PEP 495 (and your blog posts as well) and this explains some of the oddities I'm seeing, but it left me quite unsatisfied.

My initial expectation was that if I add timedelta(minutes=30), I would get a datetime that is 30 minutes in the future from the original, but this is not what is happening when crossing the DST boundary:

tz = ZoneInfo('America/New_York')
dt1 = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=0)
dt2 = dt1 + timedelta(minutes=30)
print(dt2 - dt1)
print((dt2.timestamp() - dt1.timestamp()) / 60)

tz = pytz.timezone('America/New_York')
dt1 = tz.localize(datetime(2013, 11, 3, 1, 30), is_dst=True)
dt2 = tz.normalize(dt1 + timedelta(minutes=30))
print(dt2 - dt1)
print((dt2.timestamp() - dt1.timestamp()) / 60)

Output:

0:30:00
90.0
0:30:00
30.0

Here, dt2 - dt1 gives me a timedelta of 30 minutes in both cases but the absolute difference on the second line indicates a 90 minute difference which is not what I asked for.

This is the explanation I picked up from the PEP:

The value of fold will also be ignored whenever a timedelta is added to or subtracted from a datetime instance which may be either aware or naive.

It doesn't explain why. Why not set fold=1 in my example case? Would this introduce a backwards compatibility issue? I don't see how, since existing tzinfo implementations ignore the fold anyway.

As for the equality comparison in a previous comment of mine:

The aware datetime comparison operators will work the same as they do now, with results indirectly affected by the value of fold whenever the utcoffset() value of one of the operands depends on it, with one exception. Whenever one or both of the operands in inter-zone comparison is such that its utcoffset() depends on the value of its fold attribute, the result is False.

So I'm guessing that this operation is intra-zone, given that I use the same tzinfo object. Since the utcoffset() value of both operands in the above comparison are affected by the value of fold, the logical conclusion is that the result should be False, but it's not. What am I missing here?

You should definitely not expect zoneinfo to "work out of the box" in the sense that it would be a drop-in replacement for pytz

I never did, and I apologize if I made it seem that way. It was just explicitly mentioned in the backports.zoneinfo README that a normalization step is no longer required – a claim which seems a bit misleading in the light of my findings.

I've fixed my library to do time addition by converting the datetimes to timestamps, adding the desired amount of time and then converting back to datetime, but this seems a bit clunky. I imagine other people doing datetime arithmetic will be hit by this inconsistency sooner or later.

I went and read PEP 495 (and your blog posts as well) and this explains some of the oddities I'm seeing, but it left me quite unsatisfied.

My initial expectation was that if I add timedelta(minutes=30), I would get a datetime that is 30 minutes in the future from the original, but this is not what is happening when crossing the DST boundary:

The relevant information is not in PEP 495, because PEP 495 didn't change anything about how datetime arithmetic works. I am not sure that I can explain it any better than my blog post did. You are simply wrong about what timedelta addition does in Python. It is a common confusion, and not one we can remedy at this point. If I could, I would get rid of timedelta addition altogether, as there is no unambiguous definition for what to do when you add a fixed amount of time to a date. Ideally there would be two different methods that very obviously do "wall time addition" and "absolute time addition", but I suspect people would still confuse these because most of the time they do exactly the same thing.

The relevant fact is that adding a timedelta to a datetime is not giving you the answer to "what time will it be in this zone after X amount of time has elapsed" it is giving you the answer to the question, "If I look at the calendar and the clock and I add X amount of days and Y amount of hours, what time will it be?"

If you want one or the other, I suggest writing a helper function that does what you want. absolute_add(dt, td) is very easy to implement and will be completely unambiguous.
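
A minimal sketch of such helpers, assuming aware datetimes and Python 3.9+ with zoneinfo. The name absolute_add comes from the comment above; wall_add and the exact implementation are assumptions made for illustration, not an API from any library:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def wall_add(dt, td):
    """Civil-time addition: this is what `dt + td` already does."""
    return dt + td


def absolute_add(dt, td):
    """Elapsed-time addition: do the arithmetic in UTC, then convert back."""
    return (dt.astimezone(timezone.utc) + td).astimezone(dt.tzinfo)


tz = ZoneInfo("America/New_York")
start = datetime(2013, 11, 3, 1, 30, tzinfo=tz, fold=0)  # 01:30 EDT

wall = wall_add(start, timedelta(minutes=30))
absolute = absolute_add(start, timedelta(minutes=30))

print(wall.isoformat())      # wall clock moved forward by 30 minutes
print(absolute.isoformat())  # 30 minutes of real time have elapsed
```

With the inputs from this thread, wall_add lands on 02:00 EST (90 minutes of elapsed time), while absolute_add lands on the second 01:00, with fold=1 set by astimezone().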

I never did, and I apologize if I made it seem that way. It was just explicitly mentioned in the backports.zoneinfo README that a normalization step is no longer required – a claim which seems a bit misleading in the light of my findings.

It's not misleading because normalization has nothing to do with ensuring that addition uses absolute time semantics. I assume that was a deliberate choice, but it's not what normalization is for. Normalization is because pytz proactively attaches a fixed offset to your datetime, and so the offset is wrong on the non-normalized datetime:

>>> import pytz
>>> from datetime import datetime, timedelta
>>> NYC = pytz.timezone("America/New_York")
>>> dt = NYC.localize(datetime(2020, 1, 1))
>>> dt + timedelta(days=180)
datetime.datetime(2020, 6, 29, 0, 0, tzinfo=<DstTzInfo 'America/New_York' EST-1 day, 19:00:00 STD>)

Note that the result is in EST, when it should be in EDT. When you normalize the datetime, it is basically just calling dt.astimezone(pytz.timezone("America/New_York")) on the original datetime (with additional logic for handling what happens if you're in an ambiguous or imaginary time), which happens to shift both the offset and the datetime. With zoneinfo, you get the normal datetime semantics as intended, because you get an addition in civil time and calls to utcoffset(), dst() and tzname() already return the appropriate thing.
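
For comparison, a small sketch of the same addition with zoneinfo, assuming Python 3.9+ and available system time zone data. The variable names are illustrative:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

nyc = ZoneInfo("America/New_York")
winter = datetime(2020, 1, 1, tzinfo=nyc)
summer = winter + timedelta(days=180)  # civil-time addition, no normalize step

# The offset is computed from the zone rules each time it is requested,
# so the result reports daylight time without any extra call.
print(winter.tzname(), winter.utcoffset())
print(summer.tzname(), summer.utcoffset())
```

The wall-clock result is the same midnight on 2020-06-29 as in the pytz session above, but the attached zone reports EDT rather than a stale EST offset.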

So the reason this is unaffected by fold is because it would make no sense for it to be affected by fold, since fold is only relevant for determining what offset rules apply, and offset rules are not used in datetime arithmetic.

So I'm guessing that this operation is intra-zone, given that I use the same tzinfo object. Since the utcoffset() value of both operands in the above comparison are affected by the value of fold, the logical conclusion is that the result should be False, but it's not. What am I missing here?

As I mention above, I think you maybe failed to grok the central theme of both blog posts, that intra-zone comparisons only ever look at the naïve portion of the datetime. You are comparing 2013-11-03T01:30-05:00 to 2013-11-03T01:30-04:00; if you only look at the naïve portion of them, you are comparing 2013-11-03T01:30 to 2013-11-03T01:30, and the answer is True. You could make an argument that fold is part of the naïve portion of the datetime and so should be included in the comparison, but that would cause the unfortunate result that datetime(2020, 1, 1, fold=0, tzinfo=timezone.utc) != datetime(2020, 1, 1, fold=1, tzinfo=timezone.utc). Perhaps that would have been desirable in some situations, but it would prevent you from being able to use fold=1 to set your preferred policy for what to do in the event of an ambiguous or imaginary time even if you don't know that you have one (for example, (datetime(2020, 1, 1, tzinfo=tz) + timedelta(seconds=random.randint(0, 31536000))).replace(fold=1) is a fairly low cost way to say, "add some time, and default to the second instance of it in the event that it's ambiguous or imaginary").

The problem is complicated, there are a lot of situations where you want different semantics and there's not necessarily one good way to satisfy everyone's intuitions about how things should work in all cases.

"This script fails. Why? These are two completely different points in time and they should NOT compare as equals!"

Because it has the same time and day and the same timezone object.

IMO, in the adding example, both zoneinfo and pytz are doing the wrong thing. Pytz says it's 02:00, with DST, a time that does not exist. You have to explicitly normalize it to fix that. zoneinfo has the same incorrect attitude that the arithmetic should have the result people who don't know anything about DST would expect. I.e., you add 30 minutes to 1:30, you get 2:00, even though that actually now means you added 1 hour 30 minutes.

I agree that datetime should have used UTC internally, that would have been much better and made everything much easier, but my ADD (Anger Driven Development) rage only lasted for a week, so my effort to make something better died. :-)

Because it has the same time and day and the same timezone object.

I get the comparison logic, but why wasn't it fixed when the fold attribute was added? Comparing two different points in time as equals with tz aware datetimes makes no sense to me since they have all the necessary information to do a meaningful comparison. But the debate is academic now, and what's done is done.

Pytz says it's 02:00, with DST, a time that does not exist. You have to explicitly normalize it to fix that

As pytz predates the fold attribute, it couldn't do much about this. It might've done the right thing if it had had the means when it was written. Zoneinfo, on the other hand, was designed with that attribute in mind, and it still manages to get it wrong, also requiring normalization to get the correct result.

They probably didn't fix it because they don't think it's broken, as per the "do what is expected if DST doesn't exist" philosophy.

IMO, in the adding example, both zoneinfo and pytz are doing the wrong thing. Pytz says it's 02:00, with DST, a time that does not exist. You have to explicitly normalize it to fix that. zoneinfo has the same incorrect attitude that the arithmetic should have the result people who don't know anything about DST would expect. I.e., you add 30 minutes to 1:30, you get 2:00, even though that actually now means you added 1 hour 30 minutes.

This has nothing to do with zoneinfo or pytz. Neither of them has any way to change this. The core issue is that datetime addition is just a different (though also valid) operation than what @agronholm wants it to be. pytz's behavior without normalization is wrong because pytz time zones are not equipped for use with arithmetic. zoneinfo does the right thing for the operation that was performed.

I understand that many people think that addition should be a different (equally valid) operation, and that seems like a reasonable expectation.

One thing I've been tooling around with is the idea of adding a duration type — basically a timedelta for "absolute time" arithmetic operations. The biggest issue there is that ideally we'd have datetime - datetime return a duration when the difference is inter-zone and a timedelta when the difference is intra-zone (so that dt2 + (dt1 - dt2) == dt1 is always True), which would be something of a breaking change.

As is the usual plight, we have very little insight as to how people are using this stuff in the wild and no way to communicate with our users. There's a good chance that only a small number of people are getting this right in a way that would be affected by a change of this nature anyway, in which case I'd be much more comfortable changing it. I had some thoughts about doing this in dateutil about 4 years ago that never really materialized.

It has become obvious to me lately that this was a mistake. The question is how to fix it.
Please discuss:

#117

@ofek @pganssle