ashtuchkin / iconv-lite

Convert character encodings in pure javascript.

Improper encoding / decoding of some special 7-bit values in cp437, macintosh

rossj opened this issue

Hi there.

I've noticed that cp437 does not properly encode / decode the special symbols that are assigned to bytes 0x01-0x1F and 0x7F. When decoding, these bytes are incorrectly passed through as-is and come out as control characters. Similarly, when encoding, the special characters in this range are replaced with question marks.

I've noticed a similar issue with the macintosh encoding, which has special symbols defined at 0x11-0x14.

As an example, the two tests below are currently failing:

import { decode, encode } from 'iconv-lite';

describe('encodings', () => {
    it('should encode special cp437 symbols that map to bytes 0x01-0x1F', () => {
        const input = '\u263A'; // A smiley face
        const result = encode(input, 'cp437');
        expect(result[0]).toEqual(1);
    });

    it('should decode cp437 bytes in range 0x01-0x1F', () => {
        const input = Buffer.from([1]);
        const result = decode(input, 'cp437');
        expect(result).toEqual('\u263A');
    });
});

hmm yeah I think you're right. Thank you for filing this issue and the tests, really helpful!
My current encoding generation code uses the iconv project as the source, so it seems that it's wrong there too. Strange to see this in a relatively widely known encoding.
I'll fix this soon.

Came here to log exactly this. Any ETA? This would help a lot with enigma-bbs as well as a text mode RPG I'm working on!

I double-checked, and the issue does exist. I looked at the source code and found that the cp437 table is generated from a remote resource, but I guess that resource is missing part of the data. How about we add special handling for these characters?
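
For anyone who needs a stopgap, the special handling suggested above could look roughly like the sketch below. It is not part of iconv-lite: the helper names `controlsToGlyphs` and `glyphsToControls` are made up for illustration, and the approach assumes (as described in this issue) that the library currently passes bytes 0x01-0x1F and 0x7F through unchanged as control characters. Note that it also turns real tabs and newlines (0x09, 0x0A, 0x0D) into glyphs, so it probably only makes sense for screen-dump style data such as ANSI art.

```javascript
// Hypothetical workaround helpers; these are NOT part of iconv-lite.

// Graphical symbols that cp437 assigns to bytes 0x01-0x1F, in byte order.
const CP437_LOW_GLYPHS =
    '\u263A\u263B\u2665\u2666\u2663\u2660\u2022\u25D8\u25CB\u25D9\u2642\u2640\u266A\u266B\u263C' + // 0x01-0x0F
    '\u25BA\u25C4\u2195\u203C\u00B6\u00A7\u25AC\u21A8\u2191\u2193\u2192\u2190\u221F\u2194\u25B2\u25BC'; // 0x10-0x1F

// Symbol that cp437 assigns to byte 0x7F (house).
const CP437_DEL_GLYPH = '\u2302';

// Build lookup tables in both directions.
const controlToGlyph = new Map();
const glyphToControl = new Map();
for (let i = 0; i < CP437_LOW_GLYPHS.length; i++) {
    const control = String.fromCharCode(i + 1); // string index 0 corresponds to byte 0x01
    controlToGlyph.set(control, CP437_LOW_GLYPHS[i]);
    glyphToControl.set(CP437_LOW_GLYPHS[i], control);
}
controlToGlyph.set('\u007F', CP437_DEL_GLYPH);
glyphToControl.set(CP437_DEL_GLYPH, '\u007F');

// Apply AFTER decoding: turn passed-through control characters into cp437 glyphs.
function controlsToGlyphs(text) {
    return text.replace(/[\u0001-\u001F\u007F]/g, (ch) => controlToGlyph.get(ch));
}

// Apply BEFORE encoding: turn cp437 glyphs back into the control characters
// that share their byte values, leaving every other character untouched.
function glyphsToControls(text) {
    return Array.from(text, (ch) => (glyphToControl.has(ch) ? glyphToControl.get(ch) : ch)).join('');
}
```

With these in place, `controlsToGlyphs(decode(buffer, 'cp437'))` and `encode(glyphsToControls(text), 'cp437')` should produce the results the tests at the top of this issue expect, provided the pass-through behaviour described here still holds.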
