MARC-8 mapping (Eszett, Euro Sign, and some revisions)

Question

gugek opened this issue 9 years ago

I came across some records in the wild that contain the eszett and noticed that the existing marc8_mapping.py has no mapping for that character (Unicode: U+00DF).

It looks like the LC Code Tables for MARC-8 mappings were updated in 2004 (see https://memory.loc.gov/diglib/codetables/45.html), which might explain why that character (and the Euro sign) were overlooked.

I can provide an updated file in a pull request.

But there are a couple of other changes listed there that aren't reflected in the mapping:

See:

Revised June 2004 to add the Eszett (M+C7) and the Euro Sign (M+C8) to the
MARC-8 set.

Revised September 2004 to change the mapping from MARC-8 to Unicode for
the Ligature (M+EB and M+EC) from U+FE20 and U+FE21 to U+0361.

Revised September 2004 to change the mapping from MARC-8 to Unicode for
the Double Tilde (M+FA and M+FB) from U+FE22 and U+FE23 to U+0360.

Revised March 2005 to change the mapping from MARC-8 to Unicode for the
Alif (M+2E) from U+02BE to U+02BC.

So the question is how to handle the revised mappings. Just do the right thing now, or keep the old behavior? It's easy enough for the new characters, since nothing mapped to them before, but changing existing mappings might be a problem for anyone who relies on the old output.
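
To make the two options concrete, here is a rough sketch of what I mean. The table names, the `marc8_to_unicode` helper, and the `revised` flag are all made up for illustration and are not how marc8_mapping.py is actually laid out; the byte values are the M+ positions quoted above. It only covers the two added characters and the Alif change.

```python
# Hypothetical tables for illustration only; not the real marc8_mapping.py layout.
# Keys are the MARC-8 positions quoted from the LC notes, values are Unicode code points.

# Behaviour before the 2004/2005 revisions (only the entry that later changed).
LEGACY_MAPPING = {
    0x2E: 0x02BE,  # Alif, old mapping
}

# Entries added or changed by the revisions.
REVISED_MAPPING = {
    0x2E: 0x02BC,  # Alif, revised March 2005
    0xC7: 0x00DF,  # Eszett, added June 2004
    0xC8: 0x20AC,  # Euro Sign, added June 2004
}


def marc8_to_unicode(byte, revised=True):
    """Return the Unicode character for a MARC-8 byte, or None if unmapped.

    With revised=False only the legacy table is consulted, so the new
    characters stay unmapped and the Alif keeps its old code point.
    """
    table = dict(LEGACY_MAPPING)
    if revised:
        table.update(REVISED_MAPPING)
    code_point = table.get(byte)
    return chr(code_point) if code_point is not None else None
```

"Do the right thing now" would amount to always using the revised table; a flag like this would only matter if someone needs the old output.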

Ed Summers · Answer 1 · Fri Dec 18 2015 05:01:20 GMT+0800 (China Standard Time)

Hi @gugek, if you can send a pull request for these changes I will merge them. The best we can do is try to do the right thing now, I think.

Ed Summers · Answer 2 · Fri Dec 18 2015 23:36:04 GMT+0800 (China Standard Time)

Fixed in f0faf74