CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

Add Detectors and Probers for target languages

rstm-sf opened this issue

Hello!

It may be worth adding the ability to detect the encoding when the target language is already known.

Hi,

Sorry for the late response.

What do you mean by this?

Hello!

I created PR #63 to make this easier to understand.

To detect the encoding, prober objects are created, and they are defined for multiple languages. With only a small sample of characters, conflicts can arise between encodings, because the same byte value can be a valid character code in several languages.

But what if we need to detect an encoding and already know that the text can belong to only one language? Then we can restrict ourselves to the probers for that language, which reduces the likelihood of an incorrect detection.
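To make the idea concrete, here is a minimal sketch, written in Python for brevity rather than in the library's C#. `Prober`, `fraction_in_range`, `detect`, and the byte ranges are all hypothetical and are not the UTF-unknown API or its real language models; the sketch only shows how filtering the prober list by language removes a cross-language tie.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Prober:
    """Hypothetical prober: scores how well a byte sample fits one encoding."""
    encoding: str
    language: str
    score: Callable[[bytes], float]  # confidence in [0.0, 1.0]


def fraction_in_range(lo: int, hi: int) -> Callable[[bytes], float]:
    """Toy scorer: share of high bytes (>= 0x80) that fall inside [lo, hi]."""
    def score(sample: bytes) -> float:
        high = [b for b in sample if b >= 0x80]
        if not high:
            return 0.0
        return sum(lo <= b <= hi for b in high) / len(high)
    return score


# Toy registry. The byte ranges are illustrative assumptions, not real models.
PROBERS = [
    Prober("windows-1253", "Greek", fraction_in_range(0xC1, 0xFE)),
    Prober("windows-1251", "Russian", fraction_in_range(0xC0, 0xFF)),
]


def detect(sample: bytes,
           probers: Sequence[Prober],
           language: Optional[str] = None) -> Optional[str]:
    """Return the best-scoring encoding, optionally using one language only."""
    candidates = [p for p in probers
                  if language is None or p.language == language]
    best = max(candidates, key=lambda p: p.score(sample), default=None)
    if best is None or best.score(sample) == 0.0:
        return None
    return best.encoding


# "привет" encoded as windows-1251: every byte also fits the toy Greek range.
SAMPLE = b"\xef\xf0\xe8\xe2\xe5\xf2"
```

With this toy registry, `detect(SAMPLE, PROBERS)` ties between both probers and returns `windows-1253` simply because the Greek prober is listed first, while `detect(SAMPLE, PROBERS, language="Russian")` returns `windows-1251`.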

P.S. Sorry for my English.

Sounds good, but I'm not sure how easy it is to change that in this code base.

It seems to me that we first need to try to split the single-byte probers out by language, as models.

Hello, @304NotModified!

Can we make breaking changes and mark everything in src/Core as internal? This would make it easier to change the code.

Do you mean whether making breaking changes in src/core is OK? I think it is. We should also make them internal.

I think it would be nice if we could just change the source in src/core without thinking about breaking changes. That is, change the modifier from public to internal.

I have an idea of separating the probers, as models, by language (however, it will take a lot of time, since there are about 100 of them). It would then also be nice to change the namespace.