vacanza / python-holidays

Generate and work with holidays in Python

Home Page:https://pypi.org/project/holidays

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Date dictionaries are initially empty

dsoprea opened this issue · comments

The documentation very states that "holidays.<country>()" returns a dictionary in multiple places:

us_holidays = holidays.US() # this is a dict

This is misleading. It's a dictionary-like object.

It's also initially empty, which is unexpected when you just refer to it as a dictionary:

>>> x = holidays.US()

>>> dict(x)
{}

>>> x['2024-07-04']
'Independence Day'

>>> dict(x)
{datetime.date(2024, 1, 1): "New Year's Day", datetime.date(2024, 5, 27): 'Memorial Day', datetime.date(2024, 6, 19): 'Juneteenth National Independence Day', datetime.date(2024, 7, 4): 'Independence Day', datetime.date(2024, 9, 2): 'Labor Day', datetime.date(2024, 11, 11): 'Veterans Day', datetime.date(2024, 11, 28): 'Thanksgiving', datetime.date(2024, 12, 25): 'Christmas Day', datetime.date(2024, 1, 15): 'Martin Luther King Jr. Day', datetime.date(2024, 2, 19): "Washington's Birthday", datetime.date(2024, 10, 14): 'Columbus Day'}

Here, we had to do a membership check first (either via indexing or get()). Population appears to be lazy-loaded and isn't triggered by just printing the dictionary (__dict__).

If you don't want to change/fix this, then at least don't aggressively imply that the data will be there and ready as soon as you instantiate the class. It's confusing and this proper handling is not documented.

Hi @dsoprea
Thanks for bringing this up!

This is misleading. It's a dictionary-like object.

I agree w/ you on this. It's 100% dict-like object.

If you don't want to change/fix this, then at least don't aggressively imply that the data will be there and ready as soon as you instantiate the class. It's confusing and this proper handling is not documented.

However this one is something I have a different angle on. Could you specify what made you think that the dictionary would be populated w/ holidays? Originally holidays package was designed w/ performance in mind. Having said that I don't think that behavior is going to be changed/fixed.

I'm mostly interested on documentation improvement related to this part:

at least don't aggressively imply that the data will be there and ready as soon as you instantiate the class.

Thanks for your help!

A dictionary is thought of as a static type (a static one not an immutable one, to be clear). By nature of returning a dictionary, you assume that there are no other steps to be taken for it to have data. Especially when the dataset is already specific and thought of as being fairly static. If you tell me that you're going to essentially give me a "dictionary of holidays" but what you give me will always be an empty dictionary of holidays due to undocumented behavior then I'd consider that a broken design.

I don't think we're thinking of performance the right way. You've already overloaded the concept of dictionaries and are doing some lazy-loading. If, however, the developer is calling keys() or values() (or copying/materializing via dict()), they are clearly signaling the library to lazy-load, and the library currently does nothing. Currently, they need to hardcode a lookup of a specific holiday, even though they may actually want all of them, to get things moving because the library doesn't actually support it right after initialization. That's also broken, to me. There is an innate understanding that if I ask for all of the holidays then I am willing to wait for it.

I thought we both agreed it's not a dict but a dict-like object. It seem that your whole case is built upon this misleading comment that needs to be fixed.

Also I beg to disagree on your "undocumented behavior" comment.

I believe your ideas would be helpful to consider while implementing holidays v1. Maybe even v0 could use some sort of eager population flag (w/o changing the default behavior).

As you can see this project could use external contributor's help. Please let us know if you're willing to help with that.

It's also initially empty, which is unexpected when you just refer to it as a dictionary:

>>> x = holidays.US()

>>> dict(x)
{}

You can just initialize holidays object with required years interval when creating it:
x = holidays.US(years=2024)
or
x = holidays.US(years=range(2000, 2025))
and it will contain required values.

@KJhellico Thanks. That's really useful.