tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)

Home Page: https://tesseract-ocr.github.io/

Potential Null Pointer Dereference in Function `RecodeBeamSearch::ContinueContext`

hribz opened this issue

Current Behavior

In the function RecodeBeamSearch::ContinueContext, if the previous != nullptr condition on line 906 is false, then the previous = previous->prev statement at the end of each iteration of the for loop dereferences a null pointer.

for (int p = length - 1; p >= 0; --p, previous = previous->prev) {
  while (previous != nullptr &&
         (previous->duplicate || previous->code == null_char_)) {
    previous = previous->prev;
  }
  if (previous != nullptr) {
    prefix.Set(p, previous->code);
    full_code.Set(p, previous->code);
  }
}
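
The failure path can be traced with a small stand-alone sketch. It is not Tesseract code: `Node`, `kNullChar` and `IncrementWouldDereferenceNull` are hypothetical names introduced only for illustration, and `kNullChar` stands in for `null_char_`. The function follows the same control flow as the loop above, but reports the unsafe step instead of executing it.

```cpp
// Hypothetical stand-in for a beam node; only the fields the loop reads.
struct Node {
  int code;
  bool duplicate;
  const Node* prev;
};

constexpr int kNullChar = 0;  // assumed placeholder for null_char_

// Mirrors the control flow of the reported loop. Returns true if the loop
// increment `previous = previous->prev` would run while previous is nullptr.
bool IncrementWouldDereferenceNull(const Node* previous, int length) {
  for (int p = length - 1; p >= 0; --p) {
    // Skip duplicates and null characters, as the original loop does.
    while (previous != nullptr &&
           (previous->duplicate || previous->code == kNullChar)) {
      previous = previous->prev;
    }
    // The guarded Set calls would go here; they are safe either way.
    if (previous == nullptr) {
      return true;  // the original increment expression would crash here
    }
    previous = previous->prev;  // the step the original does unconditionally
  }
  return false;
}
```

Under these assumptions the problem only appears when the chain runs out before `length` positions are filled. With `length` equal to 0 the loop body never runs, so nothing is dereferenced.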

Suggested Fix

If previous could be nullptr, an error handling branch should be added, as shown below:

if (previous != nullptr) {
    ...
} else {
    // Add error handling code here
}

If previous cannot be nullptr, the checks on previous can be removed, as shown below:

while (previous->duplicate || previous->code == null_char_) {
  previous = previous->prev;
}
prefix.Set(p, previous->code);
full_code.Set(p, previous->code);

I never had a NULL pointer dereference in this function and never saw a bug report which reported one. Therefore I think the checks should be removed.

Do you want to send a pull request?

> I never had a NULL pointer dereference in this function and never saw a bug report which reported one. Therefore I think the checks should be removed.
>
> Do you want to send a pull request?

Yeah, I have sent a pull request.

From the name previous it can be nullptr.

> From the name previous it can be nullptr.

But if the for loop is entered and previous is nullptr, it will inevitably cause a null pointer dereference. Currently, it seems there is no path where previous would be nullptr when entering the for loop. Or do you think an error handling branch should be added?

> From the name previous it can be nullptr.

Yes, but obviously the loop always terminates before the nullptr is reached. Otherwise we'd have lots of Tesseract crashes.

Change
for (int p = length - 1; p >= 0; --p, previous = previous->prev) {
to
for (int p = length - 1; p >= 0 && previous; --p, previous = previous->prev) {

and remove the checks from inside the loop.
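
A guarded version of the loop might look like the sketch below. This is not the code from the pull request: `Node`, `kNullChar`, `kUnset` and `CollectCodes` are hypothetical names, and a `std::vector<int>` stands in for `prefix` and `full_code`. The sketch keeps one check inside the body, because the skip loop can itself reach the end of the chain, and it moves the advance next to the assignment so that it only runs on a valid node.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a beam node; only the fields the loop reads.
struct Node {
  int code;
  bool duplicate;
  const Node* prev;
};

constexpr int kNullChar = 0;  // assumed placeholder for null_char_
constexpr int kUnset = -1;    // marks positions the loop never filled

// Fills `length` slots back to front from the chain ending at `previous`,
// stopping as soon as the chain is exhausted.
std::vector<int> CollectCodes(const Node* previous, int length) {
  std::vector<int> codes(static_cast<std::size_t>(length > 0 ? length : 0),
                         kUnset);
  for (int p = length - 1; p >= 0 && previous != nullptr; --p) {
    // Skip duplicates and null characters.
    while (previous != nullptr &&
           (previous->duplicate || previous->code == kNullChar)) {
      previous = previous->prev;
    }
    if (previous != nullptr) {
      codes[static_cast<std::size_t>(p)] = previous->code;
      previous = previous->prev;  // advance only from a valid node
    }
  }
  return codes;
}
```

If the chain is shorter than `length`, the leading slots simply stay at `kUnset` and no null pointer is dereferenced.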

@egorpugin, would you prefer the nullptr check in the for statement although that case never occurred up to now?

I just did a test with make check and found that the body of the for loop is never executed because length is always 0.

> @egorpugin, would you prefer the nullptr check in the for statement although that case never occurred up to now?

Yes.

Beyond that, I question whether this is an issue at all.
A 'potential' dereference: is something that is only potential really an issue?

Just do a quick refactor of the condition in the for loop; that's enough.
You even discovered that the loop is not executed at all, so don't touch it or change its semantics without knowing what it does or what it is for.

So, checking it for nullptr in the for statement LGTM.

I updated the PR. Please review.