opencivicdata / scrapers-ca

Canadian legislative scrapers

Toronto: Resolve issue with two-letter committee codes being re-used

patcon opened this issue

Ugh. Come on, tmmis...

http://app.toronto.ca/tmmis/decisionBodyProfile.do?function=doPrepare&decisionBodyId=346
http://app.toronto.ca/tmmis/decisionBodyProfile.do?function=doPrepare&decisionBodyId=782

Same term. Same two-letter code: HA. How fitting... :)

Not even sure how they created a system that allowed this without bending over backward internally.

Anyhow, since we resolve on the two-letter code to match up a committee's pseudonyms across terms (the names sometimes change), we'll have to sort out how to deal with edge cases like this. For now, it's a defunct pair of committees, so it shouldn't affect current data too much.

Well, the codes don't overlap in time. They shut down one committee, then reused the code for a new committee - but within the same term. Not sure if that helps, but you can disambiguate between the two based on the year.

Hm. We could disambiguate the agenda items based on year, but the agenda items aren't considered in the committee scraper, which is where we de-dup the committee pages and collect alternative names for committees. Anyhow, it just means the approach of using committee codes as the unique de-dup'ing key will have to be special-cased.
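To make the special-casing concrete, here's a rough sketch of what a de-dup key could look like. This is not the scraper's actual code: `REUSED_CODE_BODY_IDS`, `dedup_key`, and `group_pages` are hypothetical names, and the committee names in the usage note are placeholders. The only values taken from this thread are the code HA and the decisionBodyId values 346 and 782.

```python
from collections import defaultdict

# Hypothetical special cases: decisionBodyId values whose two-letter code
# was reused by a different committee within the same term (from this thread).
REUSED_CODE_BODY_IDS = {346, 782}


def dedup_key(code, decision_body_id):
    """Return the key used to group committee pages into one committee."""
    if decision_body_id in REUSED_CODE_BODY_IDS:
        # Include the page's own ID so the two bodies stay separate.
        return (code, decision_body_id)
    # Normal case: the two-letter code alone identifies the committee.
    return (code, None)


def group_pages(pages):
    """Group (code, decision_body_id, name) tuples by de-dup key.

    Returns a dict mapping each key to the set of names seen for it.
    """
    groups = defaultdict(set)
    for code, decision_body_id, name in pages:
        groups[dedup_key(code, decision_body_id)].add(name)
    return dict(groups)
```

With this, `group_pages([("HA", 346, "Committee A"), ("HA", 782, "Committee B")])` keeps the two HA bodies apart, while pages for any other code still collapse into one entry and collect their alternative names.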

It's just kind of a pain, as I thought it was a reasonable assumption that they wouldn't be reused :/

Might make sense to move the two-letter code into extras now, instead of identifier, since it's not really a properly unique identifier anymore...

Another data point:

General Government Committee: GG
Government Manager Committee: GM

According to the description, this committee was renamed and its code reassigned during the same session, but it's technically the same organizing body. So the code officially can't be trusted or used to structure scrapes around, and we should rework the committee scraper -- perhaps by explicitly storing pseudonyms in a dict.
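For illustration, an explicit pseudonym dict could be as simple as the sketch below. `PSEUDONYMS`, `canonical_name`, and `other_names` are hypothetical names, not anything in the current scraper. The two committee names are copied as written in this thread, and which one is treated as canonical is an arbitrary choice here.

```python
# Hypothetical mapping: alternative name -> canonical name.
# Names are copied from this thread; the choice of canonical is arbitrary.
PSEUDONYMS = {
    "Government Manager Committee": "General Government Committee",
}


def canonical_name(name):
    """Resolve a scraped committee name to its canonical form.

    Names without an entry in PSEUDONYMS are returned unchanged
    (apart from stripping surrounding whitespace).
    """
    name = name.strip()
    return PSEUDONYMS.get(name, name)


def other_names(canonical):
    """List the known alternative names for a canonical committee name."""
    return sorted(alt for alt, target in PSEUDONYMS.items() if target == canonical)
```

The upside of a hand-maintained dict is that renames are recorded deliberately rather than inferred from a code that may be reused or reassigned. The downside is that it has to be updated by hand whenever a new rename turns up.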