edsu / pymarc

process MARC records from Python

Home Page: http://python.org/pypi/pymarc

MARC-8 mapping (Eszett, Euro Sign, and some revisions)

gugek opened this issue

I came across some records in the wild that contain the Eszett, and noticed that the existing marc8_mapping.py doesn't have a mapping for that character (Unicode: U+00DF).

It looks like the LC Code Tables for MARC-8 mappings were updated in 2004 (see https://memory.loc.gov/diglib/codetables/45.html), which might explain why the Eszett and the Euro Sign were overlooked.

I can provide an updated file in a pull request.

But there are a couple of other changes listed that aren't reflected in the mapping:

See:

Revised June 2004 to add the Eszett (M+C7) and the Euro Sign (M+C8) to the
MARC-8 set.

Revised September 2004 to change the mapping from MARC-8 to Unicode for
the Ligature (M+EB and M+EC) from U+FE20 and U+FE21 to U+0361.

Revised September 2004 to change the mapping from MARC-8 to Unicode for
the Double Tilde (M+FA and M+FB) from U+FE22 and U+FE23 to U+0360.

Revised March 2005 to change the mapping from MARC-8 to Unicode for the
Alif (M+2E) from U+02BE to U+02BC.
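To make the scope of the change concrete, here is a minimal sketch of the four revisions as plain Python dictionaries. The names `LEGACY`, `REVISED`, and `diff_mappings` are hypothetical and are not taken from marc8_mapping.py, whose actual layout may differ. The byte values and code points are copied literally from the revision notes quoted above.

```python
# Hypothetical illustration only; these names are not pymarc's API.
# Keys are MARC-8 byte values (the "M+" values in the LC notes),
# values are Unicode code points.

# Mappings as they stood before the 2004/2005 revisions.
LEGACY = {
    0xEB: 0xFE20,  # Ligature, first half
    0xEC: 0xFE21,  # Ligature, second half
    0xFA: 0xFE22,  # Double Tilde, first half
    0xFB: 0xFE23,  # Double Tilde, second half
    0x2E: 0x02BE,  # Alif
}

# Mappings after applying the revisions listed in the LC notes.
REVISED = {
    0xC7: 0x00DF,  # Eszett, added June 2004
    0xC8: 0x20AC,  # Euro Sign, added June 2004
    0xEB: 0x0361,  # Ligature, revised September 2004
    0xEC: 0x0361,
    0xFA: 0x0360,  # Double Tilde, revised September 2004
    0xFB: 0x0360,
    0x2E: 0x02BC,  # Alif, revised March 2005
}


def diff_mappings(old, new):
    """Return (added, changed) entries when moving from old to new."""
    added = {k: v for k, v in new.items() if k not in old}
    changed = {
        k: (old[k], v)
        for k, v in new.items()
        if k in old and old[k] != v
    }
    return added, changed


added, changed = diff_mappings(LEGACY, REVISED)

for byte, cp in sorted(added.items()):
    print(f"added   M+{byte:02X} -> U+{cp:04X}")
for byte, (before, after) in sorted(changed.items()):
    print(f"changed M+{byte:02X}: U+{before:04X} -> U+{after:04X}")
```

The two added entries cannot affect records that decoded correctly before, while the five changed entries alter the output for existing records. That difference is what the question below is about.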

So the question is how to handle the revised mappings. Just do the right thing now? Keep the old behavior? It's easy enough with the new characters, but the changed mappings might be problematic for some.

Hi @gugek, if you can send a pull request for these changes I will merge them. The best we can do is try to do the right thing now I think.

Fixed in f0faf74