mozilla / readability

A standalone version of the readability lib

Discrepancy between Firefox Reader View and the Readability library

eliyabar opened this issue

Hi all,
I have an unresolved problem. I'm trying to figure out why, in some cases, Firefox Reader View gives me the result I want (the full article), while parsing the same page in Node.js with JSDOM returns only part of it.

This is the article I'm using:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6540991/

Firefox parses the whole article, from top to bottom.
Using Node.js with this library, I get only the 'References' section of the article.
Are there any other configuration options I should consider?

This is my code:

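    // Imports assume CommonJS and the jsdom / @mozilla/readability packages.
    const { JSDOM } = require("jsdom");
    const { Readability } = require("@mozilla/readability");

    // `doc` holds the fetched page HTML and `url` the article URL.
    const jsd = new JSDOM(doc, {
      url,
      referrer: url,
      contentType: "text/html",
      includeNodeLocations: true,
      pretendToBeVisual: true,
    });

    const readable = new Readability(jsd.window.document, { debug: true }).parse();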

Using Firefox Reader View:
[image]

Using the Readability library:
[image]

Thanks!

Just a note, since I had a similar issue:
If you refresh your example page in Firefox with Reader View already active, the extracted content is the same as in your second screenshot.

In my case, on a different site, certain elements were moved by JavaScript after the initial page load. When Reader View is activated on an already loaded page, Firefox uses the HTML as it exists at that moment, including those changes.
When the page is loaded with Reader View already active, Firefox uses the HTML as delivered by the initial request, and the scripts are never executed.
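
One way to check this from Node.js is to parse the same HTML twice, once as delivered and once after letting jsdom run the page's scripts, and compare the results. The following is only a rough sketch, not a confirmed fix for the article above: `buildJsdomOptions`, `extractArticle` and `settleMs` are hypothetical names, the timeout is an arbitrary guess at how long the scripts need, and `runScripts: "dangerously"` executes the site's code inside Node.js, so it should only be used on pages you trust.

```javascript
// Build JSDOM constructor options. Script execution is opt-in because
// jsdom does not run page scripts unless `runScripts` is set.
function buildJsdomOptions(url, { runScripts = false } = {}) {
  const options = { url, referrer: url, contentType: "text/html" };
  if (runScripts) {
    options.runScripts = "dangerously"; // execute inline and external scripts
    options.resources = "usable";       // allow jsdom to fetch external scripts
  }
  return options;
}

// Parse `html` with Readability, optionally after the page scripts have run.
// `settleMs` is an assumed upper bound on how long to wait for the load event.
async function extractArticle(html, url, { runScripts = false, settleMs = 3000 } = {}) {
  // Required lazily so the helper above can be used without these packages.
  const { JSDOM } = require("jsdom");
  const { Readability } = require("@mozilla/readability");

  const dom = new JSDOM(html, buildJsdomOptions(url, { runScripts }));
  if (runScripts) {
    // Wait for the window "load" event, or give up after `settleMs`.
    await new Promise((resolve) => {
      const timer = setTimeout(resolve, settleMs);
      dom.window.addEventListener("load", () => {
        clearTimeout(timer);
        resolve();
      });
    });
  }
  const article = new Readability(dom.window.document).parse();
  dom.window.close(); // stop timers and pending work started by page scripts
  return article;     // null when Readability finds no article
}
```

Calling `extractArticle(html, url)` and `extractArticle(html, url, { runScripts: true })` on the same input and comparing the length of `textContent` in the two results should show whether script-driven DOM changes explain the difference.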

@Greygooo Yeah, you are right. After a refresh, it does look the same. It might be a result of JS changes made after the page is loaded. I'll investigate in this direction!
Thanks!