dahlia / wikidata

Wikidata client library for Python

Home Page:https://pypi.org/project/Wikidata/

DatavalueError: unsupported calendarmodel for time datavalue

eracle opened this issue · comments

from wikidata.client import Client

client = Client()
# load=True fetches the entity data right away instead of lazily.
entity = client.get("Q220", load=True)
# Walk over every property and print its list of values.
for _, values in entity.iterlists():
    print(values)
---------------------------------------------------------------------------
DatavalueError                            Traceback (most recent call last)
<ipython-input-39-8af3489602dd> in <module>()
     10         yield str(value.label)
     11 
---> 12 list(generate_related_entities_labels("Q220"))

6 frames
/usr/local/lib/python3.7/dist-packages/wikidata/datavalue.py in time(self, client, datavalue)
    166         if cal != 'http://www.wikidata.org/entity/Q1985727':
    167             raise DatavalueError('{!r} is unsupported calendarmodel for time '
--> 168                                  'datavalue'.format(cal), datavalue)
    169         try:
    170             time = value['time']

DatavalueError: 'http://www.wikidata.org/entity/Q1985786' is unsupported calendarmodel for time datavalue: {'type': 'time', 'value': {'time': '-0753-04-21T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 11, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985786'}}

We currently support only the Gregorian calendar, because the datetime module in the Python standard library assumes the current Gregorian calendar. When a date before 1582 is represented as a datetime.date value, it's converted to the proleptic Gregorian calendar first.
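As a small illustration of what the proleptic Gregorian assumption means in practice, the following sketch probes datetime.date directly. The helper name `is_representable` is made up for this example; only `datetime.date` and `datetime.MINYEAR` come from the standard library.

```python
import datetime


def is_representable(year: int, month: int, day: int) -> bool:
    """Return True if datetime.date accepts the given numbers."""
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True


# 1500 was a leap year in the Julian calendar, but not in the proleptic
# Gregorian calendar that datetime.date uses, so February 29 is rejected.
print(is_representable(1500, 2, 29))   # False
print(is_representable(1500, 2, 28))   # True

# datetime.date starts at year 1 (datetime.MINYEAR), so a year such as -753
# cannot be stored in a datetime.date at all, whatever the calendar.
print(datetime.MINYEAR)                # 1
print(is_representable(-753, 4, 21))   # False
```

Note that the second limitation is independent of the calendar question: the date in the report above falls before year 1, so it would not fit into datetime.date even after a conversion to the proleptic Gregorian calendar.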

Therefore, in order to support other calendars such as the Julian calendar, there are two possible approaches:

  1. To stick with the datetime.date type, which means that all dates will be represented in the proleptic Gregorian calendar.
  2. To represent dates as a distinct type for each calendar (and keep datetime.date for the current Gregorian calendar).
    • A quick survey found no good Python libraries that provide their own date types for calendars other than the Gregorian one (except for lunardate for the Chinese calendar); the others only provide functions to convert a non-Gregorian date into the proleptic Gregorian calendar.

Each approach has its own pros and cons:

  1. If we stick with datetime.date:
    • we can happily enjoy its feature-richness, for example, finding the date of the next day, and so on,
    • but we lose the actual date numbers of the date in the original calendar.
  2. If we represent dates as a distinct type for each calendar:
    • we don't lose the actual date numbers,
    • but operations on date objects will not be easy. Because implementing a proper date type for these calendars is beyond the scope of this project, dates in alternative/historical calendars would be represented as a string or a tuple of strings, e.g., ('http://www.wikidata.org/entity/Q1985786', '-0753-04-21').
    • However, we can still manually convert them into the proleptic Gregorian calendar using other Python libraries.
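To make the second approach more concrete, here is a minimal sketch of how a time datavalue could be decoded under it. It is not the library's current behavior: `decode_time` is a hypothetical helper, and the regular expression only assumes the time format seen in the error message above.

```python
import datetime
import re
from typing import Tuple, Union

GREGORIAN = 'http://www.wikidata.org/entity/Q1985727'


def decode_time(datavalue: dict) -> Union[datetime.date, Tuple[str, str]]:
    """Hypothetical decoder: datetime.date when possible, else a tuple."""
    value = datavalue['value']
    cal = value['calendarmodel']
    match = re.match(r'^([+-]\d+)-(\d\d)-(\d\d)T', value['time'])
    if match is None:
        raise ValueError('unexpected time format: {!r}'.format(value['time']))
    year, month, day = (int(group) for group in match.groups())
    fits_datetime = (
        cal == GREGORIAN
        and datetime.MINYEAR <= year <= datetime.MAXYEAR
        and month >= 1 and day >= 1   # 0 would mean "unknown" here
    )
    if fits_datetime:
        return datetime.date(year, month, day)
    # Anything else keeps its original numbers: (calendar model, date part).
    return cal, value['time'][:match.end() - 1]


julian_value = {
    'type': 'time',
    'value': {
        'time': '-0753-04-21T00:00:00Z', 'timezone': 0, 'before': 0,
        'after': 0, 'precision': 11,
        'calendarmodel': 'http://www.wikidata.org/entity/Q1985786',
    },
}
print(decode_time(julian_value))
# ('http://www.wikidata.org/entity/Q1985786', '-0753-04-21')
```

With a return type like this, callers would have to check whether they received a datetime.date or a tuple, which is the "operations will not be easy" cost mentioned in the list above.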

To me, the second approach would be slightly better, but I would like to hear opinions from others, especially the users of this library.

Even if you don't plan to implement a new date/time format yourself, you could hint at what you want by preparing your code for it. For that purpose, you may have to consider what the API might look like. This is my own wishlist, independent from any particular application:

I think it makes little sense to make up proleptic year/month/date tuples that will never be found in sources from that time anyway (reminds me of the joke about a Roman coin that was determined to be a forgery because it was stamped "43 BC"). The modern concepts of "month" and "week" carry no meaning prior to human civilization, although a calendar month roughly approximates a lunar orbital period, and due to the gradual slow-down of the Earth's spin we may even end up having to subtract leap days from years rather than adding them as we approach pre-glaciation epochs.

I was browsing Wikidata items pertaining to geological time periods when I saw an alert about an issue with one of the timestamped properties: It was faulted for being declared a Gregorian date in spite of it referring to a year prior to 1582. But there was no calendar date at all, not even a specific year, only an approximate point in time several hundred million years before present, give or take a million years! I found no way to declare the timestamp "prehistoric", so I left it unchanged, as I think calling it "Julian" would be just as wrong as the "Gregorian" setting.

For known dates within recorded history, which spans a couple of thousand years, some variant of Julian Day numbers could be used as a common frame of reference, converting those dates into whatever calendar is either specifically requested (even the proleptic ones) or an appropriate default for a particular time and place, such as the Julian calendar in Russia prior to 1918, the French revolutionary calendar, or the "Swedish calendar" (which was neither Gregorian nor Julian) in Sweden and Finland between 1700 and 1712.
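As a rough sketch of the "common frame of reference" idea, the functions below convert between Julian Day Numbers and year/month/day numbers in the Julian and (proleptic) Gregorian calendars. They use the widely published integer formulas, not anything from this library; the function names are made up here, years are assumed to use astronomical numbering (year 0 is 1 BC), and the reverse conversion is only meant for non-negative day numbers.

```python
from typing import Tuple


def _to_jdn(year: int, month: int, day: int, gregorian: bool) -> int:
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a             # shifted year that starts in March
    m = month + 12 * a - 3          # March = 0 ... February = 11
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if gregorian:
        return jdn - y // 100 + y // 400 - 32045
    return jdn - 32083


def _from_jdn(jdn: int, gregorian: bool) -> Tuple[int, int, int]:
    f = jdn + 1401
    if gregorian:
        # Correction for the century leap-year rule.
        f += (((4 * jdn + 274277) // 146097) * 3) // 4 - 38
    e = 4 * f + 3
    h = 5 * ((e % 1461) // 4) + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


def julian_to_jdn(year: int, month: int, day: int) -> int:
    return _to_jdn(year, month, day, gregorian=False)


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    return _to_jdn(year, month, day, gregorian=True)


def jdn_to_julian(jdn: int) -> Tuple[int, int, int]:
    return _from_jdn(jdn, gregorian=False)


def jdn_to_gregorian(jdn: int) -> Tuple[int, int, int]:
    return _from_jdn(jdn, gregorian=True)


# A date recorded in the Julian calendar in Russia before 1918 ...
jdn = julian_to_jdn(1917, 10, 25)
print(jdn)                    # 2421540
# ... rendered in the Gregorian calendar on request.
print(jdn_to_gregorian(jdn))  # (1917, 11, 7)
```

Calendars such as the French revolutionary or the Swedish one would need their own pair of conversion functions, but they could share the same integer day count.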

In order to retain information about the original calendar numbers, you could add a "calendar used" field to the Julian Day representation without also reproducing the original format (assuming the mapping between that calendar and JD numbers is undisputed), much like a time zone label can be added to a timestamp to indicate where an event takes place.
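One possible shape for such a value, sketched with a dataclass: the class name `DayStamp` and its fields are purely illustrative, and the calendar is identified by the Wikidata entity URIs that appear in the error message above. The calendar label is excluded from comparisons, in the same way that two timestamps with different time zone labels can denote the same instant.

```python
from dataclasses import dataclass, field

JULIAN = 'http://www.wikidata.org/entity/Q1985786'
GREGORIAN = 'http://www.wikidata.org/entity/Q1985727'


@dataclass(frozen=True, order=True)
class DayStamp:
    """Illustrative only: a Julian Day Number plus a 'calendar used' label."""
    jdn: int
    # The label records how the date was originally written; it does not
    # take part in equality or ordering.
    calendar: str = field(compare=False)


# The same day, once as recorded in a Julian source and once as Gregorian.
recorded = DayStamp(2421540, JULIAN)
converted = DayStamp(2421540, GREGORIAN)

print(recorded == converted)                    # True
print(recorded.calendar == converted.calendar)  # False
print(DayStamp(2421539, GREGORIAN) < recorded)  # True
```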

The year/month/date tuple representation (or some entirely different fields such as those found in Mayan, Babylonian or Hindu calendars) should be retained anyway for intermediary use when converting between the internal Julian Day format and corresponding written (or spoken) forms, to aid in translation between different languages or syntactic conventions where no change of calendar may be necessary (like, whether it is 2021-05-04, 5/4/2021 or 4/5 -21).

Also for partially unknown or uncertain dates during this same historical period, such as an unspecified Sunday in October of 1582 (found in a source where you can't even be sure what calendar the scribe was referring to), it's probably a good idea to retain some form of generic year/month/week/day-of-week/date tuple representation, where you can limit the uncertainty to a particular field, or a digit (or set of digits) within that field. However, to actually use this representation will require further thought, and I would consider the implementation of approximate dates (except for the trivial cases, such as a year without a month or date) largely experimental.
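To show what limiting the uncertainty to one field could look like, here is an experimental sketch for the "unspecified Sunday in October of 1582" case. Everything in it is hypothetical (`PartialDate`, `julian_day_candidates`), and it simply assumes the scribe meant the Julian calendar; a real implementation would have to carry the calendar uncertainty as well.

```python
from dataclasses import dataclass
from typing import List, Optional


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian calendar date to Julian Day Number (astronomical years)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_month_length(year: int, month: int) -> int:
    if month == 2:
        return 29 if year % 4 == 0 else 28   # Julian leap-year rule
    return 30 if month in (4, 6, 9, 11) else 31


@dataclass(frozen=True)
class PartialDate:
    """None marks a field that the source does not specify."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    weekday: Optional[int] = None   # 0 = Monday ... 6 = Sunday


def julian_day_candidates(partial: PartialDate) -> List[int]:
    """List the days of the month that are consistent with the known fields."""
    if partial.year is None or partial.month is None:
        raise ValueError('this sketch needs a known year and month')
    last = julian_month_length(partial.year, partial.month)
    return [
        d for d in range(1, last + 1)
        if partial.day in (None, d)
        and (partial.weekday is None
             # A Julian Day Number modulo 7 is 0 on Mondays.
             or julian_to_jdn(partial.year, partial.month, d) % 7
             == partial.weekday)
    ]


some_sunday = PartialDate(year=1582, month=10, weekday=6)
print(julian_day_candidates(some_sunday))   # [7, 14, 21, 28]
```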

I would also be wary of trying to identify dates prior to the Julian Day epoch of Jan 1, 4713 BC [Jul.] (which would result in negative JD numbers). For astronomical calculations, such as when a particular solar eclipse happened in prehistory, or any other event that can be estimated with a precision better than a year, I would suggest using years with decimal fractions, as leap days hadn't been invented yet and we can't tell when "January 1st" began (but it wouldn't surprise me if either Hanna-Barbera or Johnny Hart has written a Flintstones or B.C. comic episode on that theme).

Conversely, the same applies to dates more than some 3,000 years into the future, as that's approximately when the Gregorian calendar year will be one day out of sync with the tropical year. This problem is however of limited interest to most application programmers (with astronomical event predictors being the obvious exception).

As I hadn't logged in to GitHub for a couple of months, it wasn't until yesterday that I learned about the existence of the Arctic Code Vault and that I had even received a badge for contributing a few bytes of JSON data to it. Neat idea, I'd say. As Pope Gregory's decree, written over 400 years ago, explicitly mentioned the year 2000 (MM, using Roman numerals), I don't want our source code to look any less insightful to future information archaeologists. While they may be unlikely to ever find it, let alone use it on some computing device of theirs, anticipating them doing just that may contribute towards making our code more useful to ourselves and our present-day peers.

In conclusion, I would recommend that you be able to handle both a universal date representation (such as JD) not tied to any traditional calendar, and a per-calendar defined numerical tuple format for conversion and approximation purposes. Avoid using strings, as they are probably language- or script-dependent, the format would be arbitrarily chosen and might result in parsing ambiguities; each calendar implementation should bring its own __format__ methods anyway.
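A per-calendar numerical tuple with its own __format__ method might look roughly like the sketch below. The class name `JulianDate` and the two format specs ('iso' and 'dmy') are invented for illustration and are not part of any existing library.

```python
from typing import NamedTuple


class JulianDate(NamedTuple):
    """Illustrative numeric tuple for a date in the Julian calendar."""
    year: int
    month: int
    day: int

    def __format__(self, spec: str) -> str:
        if spec in ('', 'iso'):
            sign = '-' if self.year < 0 else ''
            return '{}{:04d}-{:02d}-{:02d}'.format(
                sign, abs(self.year), self.month, self.day)
        if spec == 'dmy':
            return '{}/{} {}'.format(self.day, self.month, self.year)
        raise ValueError('unknown format spec: {!r}'.format(spec))


date = JulianDate(-753, 4, 21)
print(format(date))           # -0753-04-21
print('{:dmy}'.format(date))  # 21/4 -753
print(date.year, date[1])     # -753 4
```

The stored value stays purely numeric; a string only appears at the moment of formatting, so the written form can vary by language or convention without affecting the data.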

FYI, I'm not (yet) a user of your library, but I might just as well check it out now that I have found it (I recently began writing my own Python layer above SPARQL, but realized it will be too much work before it's even usable).