CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

Unrecognized encoded Chinese text file #142

melinyi opened this issue

I have uploaded the corresponding file.

What is the expected encoding?

A Chinese encoding, possibly GB18030.

On my side, GB2312 text is detected as EUC-JP with confidence 0.99 when the text is short (about 10 characters), but it is detected correctly when the text is long (more than 200 characters).
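One plausible contributor to this behavior (an assumption, not something confirmed in this thread) is that GB2312 (EUC-CN) and EUC-JP both encode their double-byte characters with bytes in the range 0xA1-0xFE. A short GB2312 string is therefore usually also a valid EUC-JP byte sequence, so byte validity alone cannot tell them apart. The Python sketch below does not use UTF-unknown; it only encodes a hypothetical four-character sample with the standard `gb2312` codec and decodes the same bytes with the `euc_jp` codec.

```python
# Illustrative sketch only: this is NOT UTF-unknown code. It shows that a
# short GB2312 byte sequence is also structurally valid EUC-JP.

sample = "中文编码"  # hypothetical sample text ("Chinese encoding"), 4 characters

# Encode with the GB2312 (EUC-CN) codec from the Python standard library.
gb_bytes = sample.encode("gb2312")

# Check that every byte falls in 0xA1-0xFE, the double-byte range that
# GB2312 shares with EUC-JP (JIS X 0208).
all_in_shared_range = all(0xA1 <= b <= 0xFE for b in gb_bytes)

# Decoding the same bytes as EUC-JP does not raise an error; it simply
# produces different (wrong) characters.
as_eucjp = gb_bytes.decode("euc_jp")

print(gb_bytes.hex(" "), all_in_shared_range)
print(as_eucjp)
```

With longer input, a statistical detector has more character-frequency evidence to weigh, which would be consistent with the correct result reported above for text longer than 200 characters.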

Any chance we're gonna get an update on that one, given the low activity of late?

My library has an open issue depending on it 😅