catamphetamine / read-excel-file

Read *.xlsx files in a browser or Node.js. Parse to JSON with a strict schema.

Home Page: https://catamphetamine.gitlab.io/read-excel-file/

(Chinese) A few illegal characters � show up when the file is large enough

duanyukai opened this issue

I found a strange bug: after parsing an Excel file that contains a large number of Chinese characters, the output contains a small number of invalid UTF-8 characters (shown as �).
I can only reproduce this bug when the file is large enough. The test case file is below; I just copied the same line many times.

测试.xlsx

Hmm, no idea.
I guess this issue should stay open so that other Chinese-speaking users could see it.

I'll try to find some other, smaller test cases. It seems like a fault with a buffer, or something else?

> It seems like a fault with a buffer, or something else?

Absolutely no idea.
Sometimes I think that we should find an alternative simple Excel reading library and place the link in the readme: this library is intended for really simple cases, and people say it won't always work for large files.

I also encountered this with Finnish words: Näytä was converted into N��ytä. The second ä is correct, but the first one becomes two Unicode replacement characters (U+FFFD).

This was triggered by modifying other cell values (the same value was read correctly previously). Adding any text in front of Näytä results in correct conversion, so this seems to require some very specific conditions to manifest.

As for 'large enough', our file is 28 kB (185 rows by 5 columns), which I consider to be pretty small.
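
One possible explanation, not confirmed anywhere in this thread, is that the file's bytes are decoded to text chunk by chunk, and a multi-byte UTF-8 character occasionally straddles a chunk boundary. That would fit the symptoms: ä is two bytes in UTF-8, so splitting it yields two replacement characters, and adding text in front shifts the character away from the boundary. The sketch below is a standalone Node.js illustration of that effect and of a streaming decoder that avoids it. The chunk split is hypothetical and nothing here is taken from read-excel-file's source.

```javascript
// Standalone illustration; the chunk boundary is an assumption chosen to
// reproduce the symptom, not something observed in read-excel-file.

// "Näytä" with ä written as an escape so the bytes are unambiguous.
// UTF-8 bytes: 4e c3 a4 79 74 c3 a4
const bytes = Buffer.from('N\u00e4yt\u00e4', 'utf8');

// Split between the two bytes (c3 | a4) of the first "ä".
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)];

// Naive approach: decode each chunk on its own, then concatenate the strings.
// Each half of the split character is invalid by itself and becomes U+FFFD.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// Streaming approach: TextDecoder with { stream: true } holds back an
// incomplete sequence and completes it with the next chunk.
const decoder = new TextDecoder('utf-8');
const streamed =
  chunks.map((chunk) => decoder.decode(chunk, { stream: true })).join('') +
  decoder.decode(); // final call flushes any buffered bytes

console.log(naive);    // N��ytä
console.log(streamed); // Näytä
```

If this is what happens, the fix would be either to concatenate all byte chunks before decoding or to decode through a single streaming decoder, as in the second half of the sketch.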

@plaa Please attach the file illustrating the bug so that someone can look at it.