Problems with XML parsing when offline and or on Android devices
GoogleCodeExporter opened this issue
What steps will reproduce the problem?
1. Run:
NSDictionary testDict = (NSDictionary)PropertyListParser.parse(file);
where file is the attached temp.pst and contains:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>id</key>
<string>foo</string>
<key>description</key>
<string>bar</string>
</dict>
</plist>
What is the expected output? What do you see instead?
Expected: a dictionary object with two key-value pairs.
Instead, XMLPropertyListParser.parseObject (line 86) sees the type of the first
child node as '#text' and, as a result, returns a null object.
What version of the product are you using? On what operating system?
The latest source as of 8/3/12
Original issue reported on code.google.com by twigbra...@gmail.com
on 3 Aug 2011 at 11:12
Attachments:
I cannot reproduce this bug - your input parses just fine for me (@ revision
26).
Original comment by kei...@alum.mit.edu
on 4 Aug 2011 at 2:04
Thanks for checking. I continue to get the same result. I edited my original
file to create the test, so perhaps something was changed unintentionally.
Did you try the attached file, or did you copy and paste the text?
Any idea why the parser would interpret the first child as type '#text'?
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 2:55
I took a closer look at the results with the debugger. The first child node
I'm getting is a #text node whose string is '\n'. The next node is a 'dict'
element, and the last node is another #text of '\n'.
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 3:31
That is strange; whitespace should normally be ignored and not reported as
#text nodes.
See:
XMLPropertyList.java (lines 42 & 74)
docBuilderFactory.setIgnoringElementContentWhitespace(true);
Which revision do you use?
Original comment by daniel.dreibrodt
on 4 Aug 2011 at 8:54
I'm using revision 26.
According to the Javadocs, that setting only works for "element only content"
models and for parsers where validation is set to true.
http://download.oracle.com/javase/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setValidating%28boolean%29
The Android docs don't mention this, so maybe it doesn't apply? If it does
apply, that may explain why those lines have no effect.
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 3:47
I also read that, but the parsing works for me, no matter how many line-breaks
I insert. Which system are you on? Could you share the complete code you use
and the original file you try to parse?
Original comment by daniel.dreibrodt
on 4 Aug 2011 at 3:52
Perhaps this applies?
http://stackoverflow.com/questions/229310/how-to-ignore-whitespace-while-reading-a-file-to-produce-an-xml-dom
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 4:00
I'm using OSX 10.6.7. Eclipse 3.5.2. SDK Platform 3.1 (API 12). Build target
2.3.3. Test device Nexus One with 2.3.4.
I verified the file I posted above fails. Within the plist element there's a
dict element sandwiched by a single '\n' (hex 0A).
The only other code is how the file is instantiated:
File file = new File(filenameWithPath);
NSDictionary testDict = (NSDictionary)PropertyListParser.parse(file);
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 4:56
I'm not entirely sure this is related, but I was trying to make the plist
library work offline by not loading the DTD. I added
docBuilderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
and with that I get the error you do (#text whitespace only nodes returned by
the parser). Perhaps your xml parser isn't loading the DTD?
Original comment by kei...@alum.mit.edu
on 4 Aug 2011 at 6:14
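A minimal sketch of what the comment above describes, assuming a desktop JDK whose built-in (Xerces-derived) parser accepts the `load-external-dtd` feature. It parses the sample plist from the report without touching the network and lists the children of `<plist>`. The class and method names are hypothetical and not part of the plist library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class OfflineDtdDemo {
    // The sample plist from the original report, including its external DOCTYPE.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE plist PUBLIC \"-//Apple Computer//DTD PLIST 1.0//EN\" "
        + "\"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n"
        + "<plist version=\"1.0\">\n"
        + "<dict>\n"
        + "<key>id</key>\n<string>foo</string>\n"
        + "<key>description</key>\n<string>bar</string>\n"
        + "</dict>\n"
        + "</plist>\n";

    /** Returns the node names of the root (plist) element's children. */
    static List<String> plistChildNames(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Xerces feature (accepted by the JDK's built-in parser): never fetch
        // the external DTD, so parsing works without network access.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        // Without the DTD there is no content model, so the parser cannot
        // classify the line breaks as ignorable and this setting has no effect.
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> names = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }
}
```

On a JDK this should yield `#text`, `dict`, `#text`, the same sequence reported from the debugger on the Nexus One.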
Sounds like it could be related. I'll look into it.
FWIW, I also hope this works offline.
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 6:26
OK, so you're testing it on the Android simulator, which AFAIK does not yet
simulate an Internet connection. Thus DTD loading fails and whitespace is no
longer ignored.
So what we have to do is make the parser work offline.
Original comment by daniel.dreibrodt
on 4 Aug 2011 at 7:28
- Changed state: Accepted
- Added labels: OpSys-All, Usability
Just to clarify, I'm using the test device I mentioned in comment 11: a Nexus One
with OS 2.3.4. I am connected to the Internet, but I'm not sure it's looking
up the DTD.
Based on comment 12, I wouldn't be surprised if the DTD lookup is failing, or
isn't being attempted.
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 7:35
Should be fixed in r32. All #TEXT nodes between elements are skipped. Offline
parsing should now be possible w/o problems.
Original comment by daniel.dreibrodt
on 4 Aug 2011 at 8:28
- Changed state: Started
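Skipping the whitespace nodes amounts to iterating over a node's children and keeping only elements. A sketch of such a helper, assuming plain DOM (`org.w3c.dom`); the names `ElementChildren`, `of`, and `parseRoot` are hypothetical, and this is not necessarily how r32/r33 implement it. As an extra assumption, it also drops comments and processing instructions, not just #TEXT nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ElementChildren {
    /** Returns only the element children of a node, skipping #text and other node types. */
    static List<Element> of(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        // Single pass by index; the DOM is never modified, so no node is revisited.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    /** Parses a DOCTYPE-less XML string and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
    }
}
```

Calling `ElementChildren.of(root)` on the sample's `<plist>` gives just the `dict` element, whether or not the parser kept the line breaks.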
Oops, r32 leads to nasty endless loops. Please wait for r33.
Original comment by daniel.dreibrodt
on 4 Aug 2011 at 8:44
r33 committed. Works for me. But I noticed that the performance is lower because
of all the #TEXT skipping. That needs to be improved.
Original comment by daniel.dreibrodt
on 4 Aug 2011 at 9:01
Got an error due to line 73 in XMLPropertyListParser, where
http://apache.org/xml/features/nonvalidating/load-external-dtd is set. If I
comment it out, it works. Perhaps this is not supported in Android?
08-04 14:03:14.290: WARN/System.err(4201): javax.xml.parsers.ParserConfigurationException: http://apache.org/xml/features/nonvalidating/load-external-dtd
08-04 14:03:14.330: WARN/System.err(4201): at org.apache.harmony.xml.parsers.DocumentBuilderFactoryImpl.setFeature(DocumentBuilderFactoryImpl.java:101)
08-04 14:03:14.330: WARN/System.err(4201): at com.dd.plist.XMLPropertyListParser.parse(XMLPropertyListParser.java:73)
08-04 14:03:14.330: WARN/System.err(4201): at com.dd.plist.XMLPropertyListParser.parse(XMLPropertyListParser.java:61)
08-04 14:03:14.340: WARN/System.err(4201): at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:78)
08-04 14:03:14.340: WARN/System.err(4201): at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:91)
08-04 14:03:14.340: WARN/System.err(4201): at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:64)
Original comment by twigbra...@gmail.com
on 4 Aug 2011 at 9:29
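The Android log above shows `setFeature` throwing `ParserConfigurationException` for the `load-external-dtd` feature. One option, sketched here as an assumption rather than the library's actual fix, is to treat the feature as optional: try it, and if the parser rejects it, report that so the caller keeps skipping #text nodes and keeps the DTD away by other means (for example an `EntityResolver`). `DtdLoadingSwitch` is a hypothetical name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class DtdLoadingSwitch {
    // Xerces-specific feature name, as used in XMLPropertyListParser line 73.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /**
     * Tries to turn off external DTD loading. Returns true if the factory
     * accepted the feature and false if it rejected it (as the Harmony
     * parser on Android does), leaving the factory otherwise unchanged.
     */
    static boolean tryDisableExternalDtd(DocumentBuilderFactory factory) {
        try {
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            return true;
        } catch (ParserConfigurationException e) {
            // Feature unknown to this parser: the caller must avoid the DTD another way.
            return false;
        }
    }
}
```

Feature detection like this avoids hard-coding which platforms support the flag, at the cost of one exception on platforms that don't.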
Here's another option for offline parsing, including the DTD in the jar file.
Try it and let me know if it works for you.
I'm not sure if it is kosher copyright-wise to include Apple's DTD, but there
isn't a copyright notice on it, and I don't think Apple would care much even if
there were.
Original comment by kei...@alum.mit.edu
on 5 Aug 2011 at 1:20
Attachments:
I also thought about that but was worried about copyright. I'll send an
email to Apple Dev Support and ask them about their stance on it.
Original comment by daniel.dreibrodt
on 5 Aug 2011 at 7:22
The performance of my solution with skipping the #TEXT nodes is really bad. I
compared r29 vs. r34 on my 20 MB iTunes library.
r29 needed 5.5 s
r34 needed 32.3 s
This is not bearable.
Original comment by daniel.dreibrodt
on 5 Aug 2011 at 7:58
Oh, forget what I just said. The huge performance difference does not come from
the #TEXT skipping but from changing how files are parsed: passing the file to
the parser as a FileInputStream rather than as a File is what causes the
performance drop.
Original comment by daniel.dreibrodt
on 5 Aug 2011 at 8:15
Perhaps wrapping FileInputStream with BufferedInputStream might speed things up?
Original comment by twigbra...@gmail.com
on 5 Aug 2011 at 9:01
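A sketch of the buffering suggestion, assuming the input is a `File` and the plain JAXP `DocumentBuilder`; `BufferedParse` is a hypothetical name. Whether buffering alone recovers the r29 timings is not established in this thread, so it would need measuring on the same 20 MB library.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class BufferedParse {
    /** Parses a file through a buffered stream and always closes the stream. */
    static Document parse(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // BufferedInputStream turns many small reads into fewer large reads of the file.
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            return factory.newDocumentBuilder().parse(in);
        }
    }
}
```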
I fixed the performance issues already. I'm currently working on separating the
parsing process into online (DTD available) and offline (DTD not available).
Original comment by daniel.dreibrodt
on 5 Aug 2011 at 9:06
Alex, could you please test r37 on your Android device? Also check how it works
in airplane mode (no Internet).
Original comment by daniel.dreibrodt
on 5 Aug 2011 at 9:17
Getting the following error with r37:
08-05 11:04:07.790: WARN/System.err(23739): javax.xml.parsers.ParserConfigurationException: No validating DocumentBuilder implementation available
08-05 11:04:07.800: WARN/System.err(23739): at org.apache.harmony.xml.parsers.DocumentBuilderFactoryImpl.newDocumentBuilder(DocumentBuilderFactoryImpl.java:61)
08-05 11:04:07.800: WARN/System.err(23739): at com.dd.plist.XMLPropertyListParser.initDocBuilder(XMLPropertyListParser.java:84)
08-05 11:04:07.800: WARN/System.err(23739): at com.dd.plist.XMLPropertyListParser.parse(XMLPropertyListParser.java:96)
08-05 11:04:07.800: WARN/System.err(23739): at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:69)
Rearranging (XMLProperty... : line 84) like this works for me (Android 2.3.4 on
Nexus One - can't speak for the others):
if (System.getProperty("java.vendor").toLowerCase().contains("android")) {
    // Is there an error if the DTD could not be loaded as on desktop VMs?
    docBuilderFactory.setValidating(false);
    skipTextNodes = true;
} else {
    if (offline) {
        // Strangely this does not work on Android (tested on Nexus One with Android 2.3.4)
        docBuilderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        docBuilderFactory.setValidating(false);
        skipTextNodes = true;
    } else {
        docBuilderFactory.setValidating(true);
        docBuilderFactory.setIgnoringElementContentWhitespace(true);
        skipTextNodes = false;
    }
}
Original comment by twigbra...@gmail.com
on 5 Aug 2011 at 6:37
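The r37 failure comes from asking Android for a validating `DocumentBuilder`. Instead of checking `java.vendor`, a caller could try the validating builder and fall back to a non-validating one when `newDocumentBuilder()` throws. The sketch assumes that fallback is acceptable and covers only the online branch of the snippet above; `BuilderSelection` and `Choice` are hypothetical names. On a desktop JVM the validating builder still fetches the DTD when parsing, so this is not an offline solution by itself.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class BuilderSelection {
    /** The chosen builder plus whether #text nodes must be skipped by hand. */
    static final class Choice {
        final DocumentBuilder builder;
        final boolean skipTextNodes;

        Choice(DocumentBuilder builder, boolean skipTextNodes) {
            this.builder = builder;
            this.skipTextNodes = skipTextNodes;
        }
    }

    static Choice choose() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(true);
        try {
            // Validating builder available: the parser drops element-content whitespace.
            return new Choice(factory.newDocumentBuilder(), false);
        } catch (ParserConfigurationException e) {
            // e.g. "No validating DocumentBuilder implementation available" on Android
            factory.setValidating(false);
            return new Choice(factory.newDocumentBuilder(), true);
        }
    }
}
```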
Does this error come up both when you are connected to the Internet and when
not?
Original comment by daniel.dreibrodt
on 6 Aug 2011 at 8:23
Original comment by daniel.dreibrodt
on 6 Aug 2011 at 11:29
- Changed title: Problems with XML parsing when offline and or on Android devices
- Removed labels: OpSys-All
Here's a patch for another take on avoiding network traffic. We just patch in
an empty DTD and filter out the #text nodes. Could someone try this on the
Android simulator and see if it works?
Original comment by kei...@alum.mit.edu
on 23 Aug 2011 at 11:31
Attachments:
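Patching in an empty DTD maps onto SAX's `EntityResolver`: when the parser asks for the external DTD, it is handed an empty `InputSource` instead of opening the URL. A minimal sketch, assuming Java 8+ (for the lambda) and a non-validating builder; `EmptyDtdParse` is a hypothetical name and this is not the attached patch itself. The whitespace #text nodes are still present afterwards, so they still have to be filtered.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EmptyDtdParse {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external entity request (including the DOCTYPE's DTD)
        // with an empty document, so nothing is fetched from the network.
        builder.setEntityResolver((publicId, systemId) ->
            new InputSource(new StringReader("")));
        return builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```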
Regarding your patch changing the public ID:
In 2007 "Apple Computer Inc." was renamed to "Apple Inc.".
Thus I would guess the "-//Apple Computer//..." version is the outdated one.
The other one, just with "-//Apple//..." is definitely correct and also used in
recent Apple documents. See https://support.apple.com/kb/HT3765.
Also Apple's Property List Editor generates XML property lists with the
"-//Apple//..." public id.
Original comment by daniel.dreibrodt
on 24 Aug 2011 at 7:09
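Since both public IDs occur in real files, a resolver that recognizes plist DTDs should accept both spellings. A sketch, assuming the full IDs `-//Apple Computer//DTD PLIST 1.0//EN` (as in the sample in this report) and `-//Apple//DTD PLIST 1.0//EN` (the current form); `PlistDtdResolver` is a hypothetical name. Returning `null` for any other entity lets the parser fall back to its default resolution, which may go to the network.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class PlistDtdResolver implements EntityResolver {
    // The pre-2007 "Apple Computer" spelling and the current "Apple" spelling.
    static final Set<String> PLIST_PUBLIC_IDS = new HashSet<>(Arrays.asList(
        "-//Apple Computer//DTD PLIST 1.0//EN",
        "-//Apple//DTD PLIST 1.0//EN"));

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (publicId != null && PLIST_PUBLIC_IDS.contains(publicId)) {
            return new InputSource(new StringReader(""));  // empty stand-in DTD
        }
        return null;  // any other entity: use the parser's default lookup
    }
}
```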
Ah, I didn't realize there were two versions. I'll revert that part of the
change.
Original comment by keith.ra...@gmail.com
on 24 Aug 2011 at 4:03
I committed my modified changes. Now the library should never request the DTD.
Original comment by kei...@alum.mit.edu
on 14 Sep 2011 at 11:48
- Changed state: Fixed