mozilla / readability

A standalone version of the readability lib

Reader mode cuts off the main content

visla-alex opened this issue

See https://www.aetherius.org/ufos-and-extraterrestrial-life/?param1=GA&gclid=EAIaIQobChMI9cKrg7aTggMVxACtBh1p-gDvEAAYASAAEgIX6_D_BwE

Reader mode cuts off the first half of the page. I have also tried running Readability.js on a Node.js server and get the same result as reader mode.

Reproduced on Firefox 119.0 (64-bit) on macOS Ventura 13.5.1

I'm experiencing the same issue.

I am experiencing a similar issue.

In my case, the problem is that both the Readability library and Firefox's reader mode return the longest plausible block of content. If a page has a header whose text is longer than the actual article text, both Readability and reader mode return the header as the main content.
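To make the failure mode concrete, here is a rough sketch of a "longest block wins" selection. It is only an illustration of the behavior described above, not Readability's actual scoring code, which weighs many more signals. The function name `pickLongestCandidate` and the `{ name, text }` candidate objects are hypothetical stand-ins for DOM nodes.

```javascript
// Simplified illustration only: this is NOT Readability's actual scoring code.
// Each candidate is a hypothetical { name, text } object standing in for a DOM node.
function pickLongestCandidate(candidates) {
  if (candidates.length === 0) {
    return null;
  }
  // Keep the candidate whose trimmed text is longest.
  return candidates.reduce((best, current) =>
    current.text.trim().length > best.text.trim().length ? current : best
  );
}

// A page whose header carries more text than the article body.
const page = [
  { name: "header", text: "A long introductory header. ".repeat(20) },
  { name: "article", text: "The short actual article body. ".repeat(5) },
];

const chosen = pickLongestCandidate(page);
console.log(chosen.name); // prints "header"
```

Under a rule like this, any page whose header outweighs the body by text length would lose its real article, which matches what I see on the problematic page.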

This page demonstrates a Blogger page that is parsed correctly.

This page demonstrates a problematic Blogger page. Here the article header is longer than the actual content.

Unsurprisingly, the process Readability uses to identify the article content is fairly involved, so it is not trivial to determine where the logic fails for my example page.

I will keep looking through the _grabArticle() method to see if anything jumps out.