ignore hidden items?
sjpotter opened this issue
It seems that when I run snacktory on Amazon pages (e.g. http://www.amazon.com/Vandaveer-Software-Brick-Buster-Pro/dp/B006T4IJTK),
it's extracting hidden data, which is not relevant from a readability perspective. I was a bit confused when looking at the output, as a simple find in a browser didn't turn up the same text. When I looked at the source, I realized that the text was hidden with display: none.
I realize that in some cases this might be beyond the scope of readability (if it's not really approaching the page from a full DOM perspective), but in some cases (such as this one) it should be more obvious that these nodes should be excluded.
Although snacktory has the DOM in memory, it does NOT evaluate CSS files or the rules of inlined CSS.
Only if the CSS is attached directly to the DOM could it be fixed easily. All other cases are a big challenge.
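To make the "easy" case concrete, here is a minimal sketch of skipping subtrees whose own style attribute hides them. It is not snacktory's code (snacktory is a Java library); it uses Python's standard html.parser purely for illustration, and the names HIDDEN_STYLE, VisibleTextExtractor and visible_text are hypothetical. It assumes reasonably well-formed markup, and it deliberately does nothing about rules that come from stylesheets or style blocks, which is the hard case described above.

```python
import re
from html.parser import HTMLParser

# Matches a "display: none" or "visibility: hidden" declaration inside the
# value of a style attribute (assumption: only these two are treated as hidden).
HIDDEN_STYLE = re.compile(
    r"(?:^|;)\s*(?:display\s*:\s*none|visibility\s*:\s*hidden)\s*"
    r"(?:!important\s*)?(?:;|$)",
    re.IGNORECASE,
)

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text, skipping subtrees hidden via an inline style attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside a hidden subtree

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        # Once inside a hidden subtree, every nested element deepens it.
        if self.hidden_depth or HIDDEN_STYLE.search(style):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.hidden_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the text of `html` minus inline-hidden subtrees."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ('<div><p>Visible</p>'
            '<ul style="display: none"><li>Hidden rating</li></ul>'
            '<span>Also visible</span></div>')
    print(visible_text(page))
```

A node hidden through a class (for example class="hidden" with the rule living in a separate stylesheet) would still be extracted by this sketch, since deciding that requires evaluating the CSS itself.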
Understood. The area that it was picking out for the URL I gave above seems to be the one attached below, which should seemingly be obvious not to use (display: none), but I could always be wrong.
• Cartoon violence or mild realistic violence
• Simulated gambling or references to casinos and gambling culture
• Partial or brief nudity in a non-sexual context
• Infrequent use of mild profanity and crude humor
• Sexually suggestive themes
• Slapstick or cartoon violence
• Simulated gambling or references to casinos and gambling culture
• Partial or brief nudity in a non-sexual context
• Frequent crude humor
• Sexually-suggestive themes
• Graphic or realistic violence
On 04/19/2012 02:18 PM, Peter wrote:
Although snacktory has the DOM in memory it does NOT evaluate the css files or rules of inlined css.
Only if the css is directly attached to the DOM it could be easily fixed. All other cases are a challenge.