karussell / snacktory

Readability clone in Java


Relevant content in XML island is not returned

zivk opened this issue:

When the relevant article content is inside an XML island, it is not returned. See, for example, the WSJ Japan article http://jp.wsj.com/Finance-Markets/Foreign-Currency-Markets/node_400108, which contains the following fragment (shortened for clarity):

<p>
<?xml version="1.0" encoding="utf-8"?>
<section xmlns:image="http://ez.no/namespaces/ezpublish3/image/" ...>
<paragraph>(this is the relevant content) イスラエル銀行(**銀行)は景気下支えを目的に過去5カ月間に ...</paragraph>
</section>
</p>

This should be fixed, but we need a test case before this issue can be closed.
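
As a starting point for such a test case, here is a minimal sketch of one possible workaround: normalizing the markup before extraction. It is not snacktory's fix and not part of its API. The class `XmlIslandNormalizer` and its `normalize` method are hypothetical names, and the sketch assumes that dropping the XML declaration and mapping `<paragraph>` to `<p>` is enough for the fragment above. It has not been checked against the live WSJ page.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical pre-processing helper (not part of snacktory).
 * Rewrites an XML island so that its text sits in ordinary HTML <p> tags.
 */
public final class XmlIslandNormalizer {

    // Matches XML declarations and other processing instructions, e.g. <?xml version="1.0"?>.
    private static final Pattern PROCESSING_INSTRUCTION =
            Pattern.compile("<\\?.*?\\?>", Pattern.DOTALL);

    // Matches opening <paragraph> tags, with or without attributes (attributes are dropped).
    private static final Pattern PARAGRAPH_OPEN =
            Pattern.compile("<paragraph(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Matches closing </paragraph> tags.
    private static final Pattern PARAGRAPH_CLOSE =
            Pattern.compile("</paragraph\\s*>", Pattern.CASE_INSENSITIVE);

    private XmlIslandNormalizer() {
    }

    /** Returns the input with processing instructions removed and <paragraph> renamed to <p>. */
    public static String normalize(String html) {
        String out = PROCESSING_INSTRUCTION.matcher(html).replaceAll("");
        out = PARAGRAPH_OPEN.matcher(out).replaceAll("<p>");
        out = PARAGRAPH_CLOSE.matcher(out).replaceAll("</p>");
        return out;
    }
}
```

A test could feed the fragment from the report through `normalize`, pass the result to the extractor, and assert that the extracted text contains the paragraph content.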