karussell / snacktory

Readability clone in Java

ignore hidden items?

sjpotter opened this issue

When I run snacktory on Amazon pages (e.g. http://www.amazon.com/Vandaveer-Software-Brick-Buster-Pro/dp/B006T4IJTK), it extracts hidden data, which is not relevant from a readability perspective. The output confused me at first, because a simple in-browser find did not turn up the same text. When I looked at the page source, I realized the text sits in an element hidden via CSS (display: none).

I realize that in some cases this might be beyond the scope of readability (if it's not really approaching the page from a full DOM perspective), but in some cases, such as this one, it should be more obvious that these nodes should be excluded.

Although snacktory has the DOM in memory, it does NOT evaluate the CSS files or the rules of inlined CSS.

This could be fixed easily only if the CSS is attached directly to the DOM. All other cases are a big challenge.
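To make the "directly attached" case concrete, here is a minimal sketch of a check that looks only at the value of an element's `style` attribute. The class `InlineStyleVisibility` and its method `isHidden` are hypothetical names for illustration and are not part of snacktory. The check deliberately ignores stylesheets, `<style>` blocks, and cascade order, so it is a heuristic rather than a CSS evaluation.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of snacktory): decides from the inline
 * style attribute alone whether an element is hidden.
 */
final class InlineStyleVisibility {

    // Matches a "display: none" or "visibility: hidden" declaration that
    // starts the style value or follows a ';'. A trailing "!important"
    // is tolerated. Assumes the input has already been lower-cased.
    private static final Pattern HIDDEN = Pattern.compile(
            "(^|;)\\s*(display\\s*:\\s*none|visibility\\s*:\\s*hidden)"
            + "\\s*(!\\s*important\\s*)?(;|$)");

    private InlineStyleVisibility() {
    }

    /** Returns true if the inline style value hides the element. */
    static boolean isHidden(String style) {
        if (style == null) {
            return false; // no style attribute, so nothing to evaluate
        }
        return HIDDEN.matcher(style.toLowerCase(Locale.ROOT)).find();
    }
}
```

In a DOM pass, the extractor could skip any node whose `style` attribute value makes `isHidden` return true, along with its children. A value such as `display:none; display:block` would still be reported as hidden, because the sketch does not model later declarations overriding earlier ones.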

Understood. The area it was picking out for the URL I gave above seems to be the text attached below, which should seemingly be obvious not to use since it is display: none (but I could always be wrong).

Children (Ages 9 and Younger)
This application is designed specifically for children and contains no objectionable content.

All Ages
The content of this application is appropriate for all ages and contains no objectionable content.

Ages 9 and Older
The content of this application may contain infrequent examples of mild violence or mild language.

May Contain:
• Infrequent use of mild profanity or crude humor
• Cartoon violence or mild realistic violence

Ages 13 and Older
The content of this application may contain references to alcohol, tobacco and mild realistic violence, mild language, or sexually suggestive themes. It may also contain nudity within medical/informational or artistic contexts.

May Contain:
• References to alcohol, tobacco and drugs
• Simulated gambling or references to casinos and gambling culture
• Partial or brief nudity in a non-sexual context
• Infrequent use of mild profanity and crude humor
• Sexually suggestive themes
• Slapstick or cartoon violence

Ages 17 and Older
The content of this application may contain frequent examples of strong violence and strong language as well as gambling, alcohol, tobacco, drugs, sexually-suggestive themes as well as partial or brief nudity in a non-sexual context.

May Contain:
• Explicit references to or images of drugs, alcohol, tobacco
• Simulated gambling or references to casinos and gambling culture
• Partial or brief nudity in a non-sexual context
• Frequent crude humor
• Sexually-suggestive themes
• Graphic or realistic violence

On 04/19/2012 02:18 PM, Peter wrote:

Although snacktory has the DOM in memory, it does not evaluate the CSS files or the rules of inlined CSS.

This could be fixed easily only if the CSS is attached directly to the DOM. All other cases are a challenge.
