agentcooper / react-pdf-highlighter

Set of React components for PDF annotation

Home Page: https://agentcooper.github.io/react-pdf-highlighter/

Gaps in Multiline Highlights

holub008 opened this issue

Bug

The middle rectangles of multiline highlights are not being passed from the onSelectionFinished prop in Firefox. The end result in the demo app looks like:

Screen Shot 2022-03-22 at 4 00 31 PM

Screen Shot 2022-03-22 at 4 01 05 PM

Reproduce

In Firefox, go to the demo app and make a 3+ line selection among the plain text, and click "Add Highlight".

Several others and I have verified this bug in our target application and the example app, using several different PDFs. I believe this bug first appeared in the 5.4.0 release.

The root cause doesn't reside there, but the middle highlight rects are being filtered out in getClientRects, where the isClientRectInsidePageRect check fails. In the example below, the middle highlights have a right extent of 1144.51, while the page container has a right extent of 1143.25. I added some fudge pixels to generate the attached visual:

const isClientRectInsidePageRect = (clientRect: DOMRect, pageRect: DOMRect, fudge: number=2) => {
  if (clientRect.top < (pageRect.top - fudge)) {
    return false;
  }
  if (clientRect.bottom > (pageRect.bottom + fudge)) {
    return false;
  }
  if (clientRect.right > (pageRect.right + fudge)) {
    return false;
  }
  if (clientRect.left < (pageRect.left - fudge)) {
    return false;
  }

  return true;
};
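
As a rough check of the numbers quoted above, here is a self-contained sketch that reruns the comparison with plain objects standing in for DOMRect. Only the two right extents (1144.51 and 1143.25) come from this report; the `Rect` type, the function name, and every other coordinate are made up for illustration.

```typescript
// Minimal stand-in for the DOMRect fields the check reads (an assumption, so the
// sketch also runs outside a browser).
type Rect = { top: number; bottom: number; left: number; right: number };

// Same comparisons as the fudge version above, written as one expression.
const isInsideWithFudge = (clientRect: Rect, pageRect: Rect, fudge = 0): boolean =>
  clientRect.top >= pageRect.top - fudge &&
  clientRect.bottom <= pageRect.bottom + fudge &&
  clientRect.left >= pageRect.left - fudge &&
  clientRect.right <= pageRect.right + fudge;

// Only the two `right` values are from the report; the rest is illustrative.
const pageRect: Rect = { top: 0, bottom: 1500, left: 100, right: 1143.25 };
const middleLineRect: Rect = { top: 400, bottom: 420, left: 100, right: 1144.51 };

// The rect overshoots the page by about 1.26px on the right.
const strictResult = isInsideWithFudge(middleLineRect, pageRect); // rejected
const fudgedResult = isInsideWithFudge(middleLineRect, pageRect, 2); // accepted
console.log({ strictResult, fudgedResult });
```

With zero tolerance the middle rect is dropped, which matches the gap in the screenshots; with 2px of tolerance it passes.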

The highlighted client rects sure look wrong, and they're coming directly from the browser's range.getClientRects method, applied to the text layer that PDF.js renders. So I'm led to believe this is a platform-dependent behavior (bug) of PDF.js. Ironic that it's coming from a Mozilla project :D

Screen Shot 2022-03-23 at 12 40 32 AM

Although I don't understand its implications, this behavior seems to be driven by textLayerMode: 2 for the PDFViewer, which was introduced in #170. When I switch the mode between 1 & 2, the selection experience is better in 2 because it appears to add some margin to the left & right of each line, creating a larger selection region; I wonder if that added margin is leaking through to our highlight areas (in FF only?).

I don't have the understanding or time to dig into PDF.js internals, so I would advocate for a fix/hack at our level. IMO, it's preferable to have extra chunky looking highlight regions over gaps. The former is visually unpleasing while the latter appears outright incorrect. Is it worth adding a few fudge pixels to our isClientRectInsidePageRect check to allow these to pass through?

Hello @holub008, I've opened PR #181 to upgrade to the latest pdfjs-dist package, and the issue you describe seems to be resolved. Feel free to check out my branch and test it :)

See screenshots:

Captura de Pantalla 2022-03-23 a la(s) 17 11 52
Captura de Pantalla 2022-03-23 a la(s) 17 15 09

@OscarBC Hey Oscar, I still see the issue in Firefox (98.0.2) on your branch:
image

I am selecting starting from "However," (right under the third default highlight) till "false positives." (end of page).

Unfortunately I can't replicate it.

Grabacion.de.pantalla.2022-03-24.a.la.s.11.40.08.mov

Try deleting "./node_modules" and "package-lock.json" and re-installing the packages in both projects.

I'm working on new fixes. One of them changes the parent container of the highlights. I ran into an issue when zooming in on the document: the highlights' positions shifted once the scale went above 1.0 (similar to the issue you are having in Firefox).

Please see if these changes fix your issues. If you would like to test them out, the branch is: https://github.com/OscarBC/react-pdf-highlighter/tree/fix/zoom-highlghlights-tips-fixes


This is a video of the issue:

scale-error.mov

This is a video after the fix:

scale-fix.mov

@OscarBC Thanks for digging in! It's hard to tell from your screen grabs if you did this in all your testing, but you have to click "Add Highlight" to trigger the gaps. I did a fresh checkout/install on your PR's branch and did not have luck with the "However" -> end of page highlight.

ohh I hadn't clicked on it, hehe. Let me see what I can find.

@holub008 I think setting textLayerMode: 1 in the PDFViewer constructor options in PdfHighlighter.tsx makes this issue go away.

I tried debugging it, and it all has to do with the ranges and their transformation from the viewport, but it's hard to debug. All the calculations that @agentcooper implemented are based on the viewport that comes from PDF.js.

Yep, I noticed the same. I'm not sure that's the way to go here, however, as the text selection felt degraded with mode=1 (less margin on the side to start your selection).

Basically, PDF.js has a very old bug: with textLayerMode: 1, the text selection selects the entire page when it hits white space (link to bug). They released an enhanced selection mode, enabled with textLayerMode: 2, but this mode has a problem when it comes to getting the text selection bounds, since you'll get bounds that are not within the text's area (link). That's why the highlights are broken in version 5.4.0: the client rects come back from PDF.js with the wrong boundaries.
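
To make the trade-off between the two modes concrete, here is a small sketch of how the choice might be expressed where the viewer options are built. The mode values 1 and 2 are the ones discussed in this thread; the `ViewerOptionsSketch` type and the `viewerOptionsFor` helper are hypothetical stand-ins, not part of pdfjs-dist or this library.

```typescript
// Values as described in this thread: 1 = standard text layer, 2 = enhanced selection.
type TextLayerMode = 1 | 2;

// Hypothetical subset of the options passed to the PDFViewer constructor.
interface ViewerOptionsSketch {
  textLayerMode: TextLayerMode;
}

// Hypothetical helper: mode 1 keeps the selection rects inside the text area,
// mode 2 keeps the more comfortable selection margins.
const viewerOptionsFor = (preferAccurateRects: boolean): ViewerOptionsSketch => ({
  textLayerMode: preferAccurateRects ? 1 : 2,
});

const workaroundOptions = viewerOptionsFor(true);
const enhancedOptions = viewerOptionsFor(false);
console.log(workaroundOptions, enhancedOptions);
```

In a real setup these fields would be merged with the rest of the PDFViewer options (container, event bus, and so on), which are omitted here.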

While the bug with highlighted empty margins is still present, we could patch it, for every document that has at least 0.01px of space between the selected text and its container, with a single line of code in the filtering function.

image

Would you be interested in adding that line?

"IMO, it's preferable to have extra chunky looking highlight regions over gaps. The former is visually unpleasing while the latter appears outright incorrect."

@holub008 maybe we don't have to make a value judgement here. All I would see as satisficing is to expose to clients the ability to provide their own rects filtering function.

Then everyone can provide their own implementation fitting their own use cases and supported browsers.
Would you be interested in a PR with that proposition?
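
A minimal sketch of what that proposal could look like, assuming the filtering step takes the predicate as an optional argument. All names here (`RectLike`, `RectFilter`, `strictFilter`, `withTolerance`, `filterClientRects`) are hypothetical and not part of the library's current API; the coordinates are made up.

```typescript
// Stand-in for the DOMRect fields a filter needs (assumption: no browser required).
type RectLike = { top: number; bottom: number; left: number; right: number };

// Signature a client-supplied filter could have.
type RectFilter = (clientRect: RectLike, page: RectLike) => boolean;

// Default: the rect must lie fully inside the page rect.
const strictFilter: RectFilter = (c, p) =>
  c.top >= p.top && c.bottom <= p.bottom && c.left >= p.left && c.right <= p.right;

// Example of a client-provided alternative: allow `px` pixels of overshoot per side.
const withTolerance = (px: number): RectFilter => (c, p) =>
  strictFilter(c, {
    top: p.top - px,
    bottom: p.bottom + px,
    left: p.left - px,
    right: p.right + px,
  });

// Hypothetical filtering step with a pluggable predicate.
function filterClientRects(
  rects: RectLike[],
  page: RectLike,
  accept: RectFilter = strictFilter
): RectLike[] {
  return rects.filter((rect) => accept(rect, page));
}

const page: RectLike = { top: 0, bottom: 1500, left: 100, right: 1143.25 };
const selection: RectLike[] = [
  { top: 380, bottom: 400, left: 600, right: 1100 },    // first line
  { top: 400, bottom: 420, left: 100, right: 1144.51 }, // middle line, overshoots
  { top: 420, bottom: 440, left: 100, right: 500 },     // last line
];

const strictKept = filterClientRects(selection, page);
const tolerantKept = filterClientRects(selection, page, withTolerance(2));
console.log(strictKept.length, tolerantKept.length);
```

Keeping the strict predicate as the default would leave existing behavior unchanged, while a client targeting Firefox could pass something like `withTolerance(2)` to keep the middle rects.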