ahorn / android-rss

Lightweight Android library to parse RSS 2.0 feeds.

"Not well formed XML" error with ISO-8859-1 encoding and accented characters

zipgenius opened this issue

Hello.
I'm trying to parse a feed from my website, which is encoded in ISO-8859-1 and uses accented characters (which are very common in Italian). The parser throws an exception and the application crashes.

This is the URL of my feed: http://forum.wininizio.it/index.php/rss/blog/

Could you please help me get it to work?

Thanks for this great piece of code! :)

Could you please post the error trace that you get? The SAX parser is instantiated in the RSSParser class [1]. I would encourage you to take a look at the code and at how SAX handles content encodings, because I am currently away on an extended trip. Once I am back we can implement the fix. Alternatively, you may submit a "pull request" and I'll merge your patch.

[1] https://github.com/ahorn/android-rss/blob/master/src/main/java/org/mcsoxford/rss/RSSParser.java

WOW! Got it in 5 minutes :)

Here we go with the code. In RSSParser.java, around line #77, we have the following:

    private RSSFeed parse(SAXParser parser, InputStream feed)
        throws SAXException, IOException {
      if (parser == null) {
        throw new IllegalArgumentException("RSS parser must not be null.");
      } else if (feed == null) {
        throw new IllegalArgumentException("RSS feed must not be null.");
      }

      // SAX automatically detects the correct character encoding from the stream
      // See also http://www.w3.org/TR/REC-xml/#sec-guessing
      final InputSource source = new InputSource(feed);
      final XMLReader xmlreader = parser.getXMLReader();
      final RSSHandler handler = new RSSHandler(config);

      xmlreader.setContentHandler(handler);
      xmlreader.parse(source);

      return handler.feed();
    }

I just added the following line before xmlreader.setContentHandler(handler):

source.setEncoding("ISO-8859-1");

et voilà: I got my feed working fine :)
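For readers who want to try the workaround outside the library, here is a minimal, self-contained sketch using only the standard SAX classes. The class ForcedEncodingDemo, its TitleHandler, and the firstTitle method are hypothetical names made up for this illustration; they are not part of android-rss. The sketch assumes the feed bytes are ISO-8859-1 and carry no usable encoding declaration, so the encoding is passed to InputSource.setEncoding explicitly.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ForcedEncodingDemo {

    // Hypothetical stand-in for the library's RSSHandler:
    // it only keeps the text of the first <title> element.
    static final class TitleHandler extends DefaultHandler {
        private final StringBuilder title = new StringBuilder();
        private boolean inTitle;
        private boolean seen;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            inTitle = !seen && "title".equals(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inTitle) {
                title.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inTitle) {
                inTitle = false;
                seen = true;
            }
        }

        String title() {
            return title.toString();
        }
    }

    // Parses the feed and returns its first title. A non-null encoding is
    // forced on the InputSource instead of relying on auto-detection.
    static String firstTitle(InputStream feed, String encoding)
            throws ParserConfigurationException, SAXException, IOException {
        InputSource source = new InputSource(feed);
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TitleHandler handler = new TitleHandler();
        reader.setContentHandler(handler);
        reader.parse(source);
        return handler.title();
    }
}
```

Passing null as the encoding leaves detection to the parser, which makes it easy to compare both behaviours on the same bytes. Hard-coding one encoding, as in the one-line patch above, would presumably misread feeds that use a different encoding, so it is best treated as a diagnostic step.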

Now, let's see how to implement some form of detection of the encoding to force in...

Matteo Riso
(In reply to ahorn's comment of 2011/5/16, quoted in full above.)
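One possible shape for the encoding detection mentioned above is to peek at the XML declaration before handing the stream to SAX. The sketch below is an assumption-laden illustration, not the library's code: DeclaredEncoding and sniff are hypothetical names, the declaration is assumed to sit within the first 256 bytes, and the feed is assumed to use an ASCII-compatible encoding (UTF-16 feeds and byte order marks are not handled). A conforming SAX parser normally performs this detection itself, so this only helps where the platform parser does not.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeclaredEncoding {

    // Assumption: the XML declaration, if present, fits in this many bytes.
    private static final int PROLOG_LIMIT = 256;

    // Matches encoding="..." inside a leading <?xml ... ?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared encoding name, or null if none is declared.
    // The stream is rewound afterwards so the parser sees it from the start.
    static String sniff(BufferedInputStream in) throws IOException {
        in.mark(PROLOG_LIMIT);
        byte[] head = new byte[PROLOG_LIMIT];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        in.reset();

        // ISO-8859-1 maps every byte to one char, so decoding never fails
        // and the ASCII-only declaration is preserved as-is.
        String prolog = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        return m.find() ? m.group(1) : null;
    }
}
```

A caller could wrap the feed in a BufferedInputStream, call sniff, and pass a non-null result to InputSource.setEncoding before parsing.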

I've written a short unit test and it passes as part of the Maven build. Therefore, this bug may be specific to the version of Android you are using. Could you please clone the repository and add a functional test case (see [1])? Once we can reproduce the error with an automated test, we can discuss how to fix it.

Cheers,
Alex

[1] https://github.com/ahorn/android-rss/blob/master/src/test/java/org/mcsoxford/rss/RSSParserTest.java
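As a starting point for such a reproduction, the following sketch checks whether the platform's stock SAX parser honours an ISO-8859-1 declaration without any forced encoding. It deliberately avoids the library's own classes, whose test setup is not shown in this thread; EncodingReproduction, parseText and declaredLatin1RoundTrips are hypothetical names, and the feed content is made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EncodingReproduction {

    // Parses the bytes with the default SAX parser and returns all character data.
    static String parseText(byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        final StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }
                });
        return text.toString();
    }

    // True if an accented title survives parsing when the feed declares
    // ISO-8859-1 and no encoding is forced on the parser.
    static boolean declaredLatin1RoundTrips() {
        String title = "Perch\u00e9 no";
        String feed = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<rss version=\"2.0\"><channel><title>" + title
                + "</title></channel></rss>";
        try {
            return title.equals(parseText(feed.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }
}
```

On a desktop JVM this check is expected to succeed, which would be consistent with the unit test passing in the Maven build. Running the same check on the affected Android version would show whether the platform parser is where the behaviour differs.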

In case anyone else happens on this issue: I've gotten a similar error when the feed itself specifies ISO-8859-1 as the encoding, but the server sends the data without a charset in the Content-Type header, or with one that is set to the wrong value.
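HTTP normally announces the character set of a response body through the charset parameter of the Content-Type header. A hedged sketch of reading that parameter with a fallback might look like the following; ContentTypeCharset and charsetOf are hypothetical names, and how the header value is obtained from the HTTP client is left out because it depends on the client in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharset {

    // Matches "; charset=NAME" with optional quotes, case-insensitively.
    private static final Pattern CHARSET =
            Pattern.compile("(?i);\\s*charset\\s*=\\s*\"?([^\\s;\"]+)\"?");

    // Returns the charset named in the header value if this JVM supports it,
    // otherwise the supplied fallback (which may be null).
    static String charsetOf(String contentType, String fallback) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                String name = m.group(1);
                try {
                    if (Charset.isSupported(name)) {
                        return name;
                    }
                } catch (IllegalCharsetNameException e) {
                    // Malformed charset name: ignore it and use the fallback.
                }
            }
        }
        return fallback;
    }
}
```

For example, charsetOf("text/xml; charset=ISO-8859-1", null) yields "ISO-8859-1", while a header without a charset parameter yields the fallback. The result could then be passed to InputSource.setEncoding, or the fallback could come from the feed's own XML declaration.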

Hi Josh, if you are up to it, perhaps we can start with a unit test that reproduces the problem locally and we can go from there.

I am getting the "Not well formed XML" error too, but with a JavaScript tag. How can I ignore the <script> tag? Where should I look to fix it?
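A well-formedness error is fatal in XML parsing, so a SAX content handler cannot simply skip the offending element; the input has to be repaired before it reaches the parser. One crude possibility, sketched below under the assumption that the feed has already been decoded to a String and that the script blocks are not nested, is to strip them with a regular expression. ScriptStripper and strip are hypothetical names; fixing the feed at the source (for example by wrapping the script in a CDATA section) would be more robust.

```java
import java.util.regex.Pattern;

class ScriptStripper {

    // Case-insensitive, dot matches newlines: removes each <script ...>...</script>
    // block, including unescaped characters such as '<' and '&' inside it.
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script\\b.*?</script\\s*>");

    static String strip(String xml) {
        return SCRIPT.matcher(xml).replaceAll("");
    }
}
```

The cleaned text could then be parsed through an InputSource built from a StringReader.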