CharsetDetector / UTF-unknown

Character set detector built in C# - .NET 5+, .NET Core 2+, .NET Standard 1+ & .NET 4+

Object reference not set to an instance of an object.

MIMAXUZ opened this issue:

I have several files and can read each of them in its own encoding, but one file fails. I read each file by first detecting which code page its contents use.

I have the following code:

var enocder = CharsetDetector.DetectFromFile(path);
//int encodeResult = enocder.Detected != null ? enocder.Detected.Encoding.CodePage : 28591;
int encodeResult = enocder.Detected.Encoding.CodePage;

Error:

System.NullReferenceException: 'Object reference not set to an instance of an object.'

UtfUnknown.DetectionResult.Detected.get returned null.

No other file had this problem. When I open the file via notepad, Encoding shows ANSI.
The file is not empty and contains mostly Cyrillic text. I tried reading it as 1251 and as UTF-8, but the characters come out as ????.
How can this be solved? Thank you!

Hello!

It is possible that the library could not detect what encoding the file has.

> When I open the file via notepad, Encoding shows ANSI.

Do you mean Notepad++? This library uses a slightly different algorithm, see #80.

NullReferenceException is also thrown if the file is empty (file size is 0):

// Detect from File (NET standard 1.3+ or .NET 4+)
DetectionResult result = CharsetDetector.DetectFromFile("path/to/file.txt"); // or pass FileInfo

Maybe it also fails for other methods which accept strings/streams.

Please share a full stack trace, thanks!

I verified detection on empty file/stream/bytes and it works as expected:

[Test]
public void CharsetDetector_EmptyStreamDetection_DetectedShouldBeNull()
{
    const string emptyFile = "empty.txt";

    File.Create(emptyFile).Dispose();

    Assert.IsNull(CharsetDetector.DetectFromFile(emptyFile).Detected);
    Assert.IsNull(CharsetDetector.DetectFromStream(File.Open(emptyFile, FileMode.Open)).Detected);
    Assert.IsNull(CharsetDetector.DetectFromBytes(Array.Empty<byte>()).Detected);
}

@MIMAXUZ It means that encoding detection failed for your file. In this case charsetDetectorResult.Detected is null, so reading charsetDetectorResult.Detected.Encoding throws.

Thanks for confirming, @i2van.

Indeed, Detected can be null if the detection failed.

The code at the start:

int encodeResult = enocder.Detected.Encoding.CodePage;

could indeed throw an exception.

Recommended usage:

int? encodeResult = enocder.Detected?.Encoding.CodePage;
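
If a concrete code page is needed rather than a nullable one, the null check can be combined with a fallback. The following is only a sketch: the class EncodingHelper and the method DetectCodePageOrDefault are hypothetical names, not part of the library, and the default of 28591 is taken from the commented-out line in the original post. It also guards against Encoding itself being null, on the assumption that this can happen when the platform does not support the detected charset.

```csharp
using UtfUnknown;

public static class EncodingHelper
{
    // Hypothetical helper (not part of UTF-unknown): returns the detected
    // code page, or fallbackCodePage when detection fails.
    public static int DetectCodePageOrDefault(string path, int fallbackCodePage = 28591)
    {
        DetectionResult result = CharsetDetector.DetectFromFile(path);

        // Detected is null when no encoding could be determined.
        // Encoding is checked as well, assuming it may be null when the
        // detected charset is not available on the current platform.
        return result.Detected?.Encoding?.CodePage ?? fallbackCodePage;
    }
}
```

Usage would then be int encodeResult = EncodingHelper.DetectCodePageOrDefault(path); and the caller can decide whether the fallback is appropriate for files like the Cyrillic one described above, where 1251 may be a better default than 28591.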