Relevant content in XML island is not returned
zivk opened this issue
When the relevant article content is inside an XML island, it is not returned. See, for example, the WSJ Japan article http://jp.wsj.com/Finance-Markets/Foreign-Currency-Markets/node_400108, which contains the following fragment (shortened for clarity):
<p>
<?xml version="1.0" encoding="utf-8"?>
<section xmlns:image="http://ez.no/namespaces/ezpublish3/image/" ...>
<paragraph>(this is the relevant content) イスラエル銀行(**銀行)は景気下支えを目的に過去5カ月間に ...</paragraph>
</section>
</p>
This should be fixed, but we need a test case before we can close this issue.
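A regression test for this could look roughly like the sketch below. It is only an illustration: it uses Python's standard `html.parser` as a stand-in for the real extractor, and `FRAGMENT`, `TextCollector` and `extract_text` are hypothetical names, not part of this project. The fixture is a cut-down version of the fragment above, with the elided parts simply left out.

```python
from html.parser import HTMLParser

# Cut-down stand-in for the fragment in the report (elided parts omitted).
FRAGMENT = """<p>
<?xml version="1.0" encoding="utf-8"?>
<section xmlns:image="http://ez.no/namespaces/ezpublish3/image/">
<paragraph>イスラエル銀行は景気下支えを目的に過去5カ月間に</paragraph>
</section>
</p>"""


class TextCollector(HTMLParser):
    """Hypothetical stand-in extractor: collects all non-blank text nodes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Text inside unknown tags such as <paragraph> arrives here as well.
        text = data.strip()
        if text:
            self.chunks.append(text)

    def handle_pi(self, data):
        # The "<?xml ...?>" declaration is a processing instruction, not text,
        # so it is deliberately ignored.
        pass


def extract_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(extract_text(FRAGMENT))
```

The real test would call the project's own extraction entry point instead of `extract_text` and make the same two checks: the text inside the XML island is returned, and the XML declaration does not leak into the result.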