Improper encoding / decoding of some special 7-bit values in cp437, macintosh

rossj opened this issue 4 years ago · comments

Hi there.

I've noticed that cp437 does not properly encode / decode the special symbols assigned to bytes 0x01-0x1F and 0x7F. When decoding, these bytes are passed through unchanged as control characters instead of being mapped to their symbols. Similarly, when encoding, the special characters in this range are replaced with question marks.

I've noticed a similar issue with the macintosh encoding, which has special symbols defined at 0x11-0x14.

As an example, the two tests below are currently failing:

import { decode, encode } from 'iconv-lite';

describe('encodings', () => {
    it('should encode special cp437 symbols that map to bytes 0x01-0x1F', () => {
        const input = '\u263A'; // A smiley face
        const result = encode(input, 'cp437');
        expect(result[0]).toEqual(1);
    });

    it('should decode cp437 bytes in range 0x01-0x1F', () => {
        const input = Buffer.from([1]);
        const result = decode(input, 'cp437');
        expect(result).toEqual('\u263A');
    });
});

Alexander Shtuchkin · Answer 1 · Sun Nov 22 2020 12:16:03 GMT+0800 (China Standard Time)

hmm yeah I think you're right. Thank you for filing this issue and the tests, really helpful!
My current encoding generation code uses iconv project as the source, so it seems that it's wrong there too. Strange to see this in a relatively widely known encoding.
I'll fix this soon.

Bryan Ashby · Answer 2 · Sat Jan 30 2021 09:56:08 GMT+0800 (China Standard Time)

Came here to log exactly this. Any ETA? This would help a lot with enigma-bbs as well as a text mode RPG I'm working on!

yosion-p · Answer 3 · Mon Aug 23 2021 10:17:13 GMT+0800 (China Standard Time)

I double-checked, and the issue does exist. Looking at the source code, I found that the cp437 table is generated from a remote resource, and my guess is that the remote resource is missing part of the data. How about we add special handling for these special characters?
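As a rough illustration of what such special handling could look like until a fix lands, here is a minimal sketch of a post-/pre-processing step around the codec. The function names `controlsToGlyphs` and `glyphsToControls` are hypothetical and not part of iconv-lite; the table follows the commonly documented CP437 graphics for bytes 0x01-0x1F and 0x7F. It assumes the codec passes control characters through unchanged, as described above for decoding.

```javascript
// CP437 graphics for bytes 0x01-0x1F, in byte order (index 0 is byte 0x01).
// All code points are in the BMP, so plain string indexing is safe.
const LOW_GLYPHS =
  '\u263A\u263B\u2665\u2666\u2663\u2660\u2022\u25D8' + // 0x01-0x08
  '\u25CB\u25D9\u2642\u2640\u266A\u266B\u263C\u25BA' + // 0x09-0x10
  '\u25C4\u2195\u203C\u00B6\u00A7\u25AC\u21A8\u2191' + // 0x11-0x18
  '\u2193\u2192\u2190\u221F\u2194\u25B2\u25BC';        // 0x19-0x1F
const DEL_GLYPH = '\u2302'; // byte 0x7F (house symbol)

// Build lookup tables in both directions.
const toGlyph = new Map();
const toControl = new Map();
for (let i = 0; i < LOW_GLYPHS.length; i++) {
  const control = String.fromCharCode(i + 1);
  toGlyph.set(control, LOW_GLYPHS[i]);
  toControl.set(LOW_GLYPHS[i], control);
}
toGlyph.set('\x7F', DEL_GLYPH);
toControl.set(DEL_GLYPH, '\x7F');

// Hypothetical helper: apply to a decoded string to turn the
// passed-through control characters into their CP437 symbols.
function controlsToGlyphs(text) {
  return text.replace(/[\x01-\x1F\x7F]/g, (ch) => toGlyph.get(ch));
}

// Hypothetical helper: apply before encoding so the symbols become
// the control characters that occupy the same byte values.
function glyphsToControls(text) {
  return Array.from(text, (ch) => (toControl.has(ch) ? toControl.get(ch) : ch)).join('');
}
```

With iconv-lite this would be used roughly as `controlsToGlyphs(decode(buf, 'cp437'))` and `encode(glyphsToControls(text), 'cp437')`. Note that the mapping is inherently ambiguous: a real line feed (0x0A) or tab (0x09) in the input is also converted to a symbol, so this only suits data where those bytes are meant as graphics, such as screen dumps or ANSI art.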
