erdemolkun / plist

Automatically exported from code.google.com/p/plist

Problems with XML parsing when offline and or on Android devices

GoogleCodeExporter opened this issue.

What steps will reproduce the problem?
1. Run:
NSDictionary testDict = (NSDictionary)PropertyListParser.parse(file);

where file is the attached temp.pst and contains: 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" 
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>id</key>
        <string>foo</string>
        <key>description</key>
        <string>bar</string>
</dict>
</plist>


What is the expected output? What do you see instead?

Expected: a dictionary object with two key-value pairs.

Instead, XMLPropertyListParser.parseObject (line 86) sees '#text' as the type of
the first child node and, as a result, returns null.


What version of the product are you using? On what operating system?
The latest source as of 8/3/11

Original issue reported on code.google.com by twigbra...@gmail.com on 3 Aug 2011 at 11:12

Attachments:

I cannot reproduce this bug: your input parses just fine for me (at revision
26).

Original comment by kei...@alum.mit.edu on 4 Aug 2011 at 2:04

Thanks for checking. I continue to get the same result. I edited my original
file to create the test; perhaps something was changed unintentionally.

Did you try the attached file, or did you copy and paste the text?

Any idea why the parser would interpret the first child as type '#text'? 

Original comment by twigbra...@gmail.com on 4 Aug 2011 at 2:55

I took a closer look at the results with the debugger. The first child node
I'm getting is #text, and its string is '\n'. The next node is a 'dict'
element, and the last node is again #text with '\n'.

Original comment by twigbra...@gmail.com on 4 Aug 2011 at 3:31

That is strange; whitespace should normally be ignored rather than reported as
#text nodes.

See:

XMLPropertyList.java (lines 42 & 74)
  docBuilderFactory.setIgnoringElementContentWhitespace(true);

Which revision do you use?

Original comment by daniel.dreibrodt on 4 Aug 2011 at 8:54

I'm using revision 26.

According to the Javadoc, that setting only takes effect for "element-only
content" models, and only when the parser is validating.

http://download.oracle.com/javase/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setValidating%28boolean%29

The Android docs don't mention this, so maybe it doesn't apply?  If it does 
apply, that may explain why those lines have no effect.
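To see why the flag has no effect without a DTD, the sketch below parses a cut-down copy of the sample plist with no DOCTYPE, so the parser has nothing telling it that plist has element-only content. It assumes a standard JAXP DocumentBuilderFactory (for example the JDK's built-in one); the class and method names are hypothetical and not part of the plist library. Even with setIgnoringElementContentWhitespace(true), the newlines around dict come back as #text nodes, matching what the debugger showed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the plist library.
class WhitespaceDemo {
    /** Returns the node names of the root element's direct children. */
    static List<String> rootChildNames(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Requested here, but it can only take effect when a DTD declares
        // element-only content; the input below has no DTD at all.
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<plist version=\"1.0\">\n"
                + "<dict>\n<key>id</key>\n<string>foo</string>\n</dict>\n"
                + "</plist>";
        // Prints [#text, dict, #text]: the newlines survive as text nodes.
        System.out.println(rootChildNames(xml));
    }
}
```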

Original comment by twigbra...@gmail.com on 4 Aug 2011 at 3:47

I also read that, but parsing works for me no matter how many line breaks I
insert. Which system are you on? Could you share the complete code you use
and the original file you're trying to parse?

Original comment by daniel.dreibrodt on 4 Aug 2011 at 3:52

Perhaps this applies?
http://stackoverflow.com/questions/229310/how-to-ignore-whitespace-while-reading-a-file-to-produce-an-xml-dom

Original comment by twigbra...@gmail.com on 4 Aug 2011 at 4:00

I'm using OS X 10.6.7, Eclipse 3.5.2 and SDK Platform 3.1 (API 12), with build
target 2.3.3. My test device is a Nexus One running Android 2.3.4.

I verified that the file I posted above fails. Inside the plist element there's
a dict element with a single '\n' (hex 0A) on each side.

The only other code is how the file is instantiated:
File file = new File(filenameWithPath);
NSDictionary testDict = (NSDictionary)PropertyListParser.parse(file);

Original comment by twigbra...@gmail.com on 4 Aug 2011 at 4:56

I'm not entirely sure this is related, but I was trying to make the plist 
library work offline by not loading the DTD.  I added

     docBuilderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

and with that I get the error you do (whitespace-only #text nodes returned by
the parser). Perhaps your XML parser isn't loading the DTD?
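A minimal sketch of that offline configuration, assuming a Xerces-based JAXP implementation such as the JDK's built-in one (the feature URI is Xerces-specific; the class and method names are hypothetical). The DTD named in the DOCTYPE is never fetched, and, as described above, the newlines around dict then come back as #text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; the feature URI below is understood by Xerces-based parsers.
class OfflineParse {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parseOffline(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The feature only governs non-validating parses.
        factory.setValidating(false);
        // Do not fetch the DTD named in the DOCTYPE. Without it the parser cannot
        // tell that whitespace between elements is ignorable.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE plist PUBLIC \"-//Apple Computer//DTD PLIST 1.0//EN\" "
                + "\"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n"
                + "<plist version=\"1.0\">\n<dict>\n</dict>\n</plist>";
        Document doc = parseOffline(xml);
        // Three children of <plist>: #text, dict, #text
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```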

Original comment by kei...@alum.mit.edu on 4 Aug 2011 at 6:14

Sounds like it could be related. I'll look into it.  

FWIW, I also hope this works offline.

Original comment by twigbra...@gmail.com on 4 Aug 2011 at 6:26

OK, so you're testing on the Android emulator, which as far as I know does not
yet simulate an Internet connection. DTD loading therefore fails, and
whitespace is no longer ignored.
So we have to make the parser work offline.

Original comment by daniel.dreibrodt on 4 Aug 2011 at 7:28

  • Changed state: Accepted
  • Added labels: OpSys-All, Usability
Just to clarify, I'm using the test device I mentioned in comment 11: a Nexus
One with Android 2.3.4. I am connected to the Internet, but I'm not sure
whether it's looking up the DTD yet.

Based on comment 12, I wouldn't be surprised if the DTD lookup is failing or
isn't being attempted at all.


Original comment by twigbra...@gmail.com on 4 Aug 2011 at 7:35

Should be fixed in r32. All #TEXT nodes between elements are now skipped, so
offline parsing should work without problems.
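Skipping the #TEXT nodes can be done with a small helper that keeps only element children. This is a hypothetical sketch, not the r32/r33 code; note that it also drops comments and processing instructions, which may or may not match what the library does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the plist library.
class ChildElements {
    /** Returns only the element children of parent, skipping #text and other non-element nodes. */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<plist version=\"1.0\">\n<dict>\n"
                + "  <key>id</key>\n  <string>foo</string>\n"
                + "  <key>description</key>\n  <string>bar</string>\n"
                + "</dict>\n</plist>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // The whitespace around <dict> no longer gets in the way.
        Element dict = childElements(doc.getDocumentElement()).get(0);
        for (Element e : childElements(dict)) {
            System.out.println(e.getTagName() + " = " + e.getTextContent());
        }
    }
}
```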

Original comment by daniel.dreibrodt on 4 Aug 2011 at 8:28

  • Changed state: Started
Oops, r32 leads to nasty endless loops. Please wait for r33.

Original comment by daniel.dreibrodt on 4 Aug 2011 at 8:44

r33 committed. Works for me. But I noticed that performance is lower because
of all the #TEXT skipping. That needs to be improved.

Original comment by daniel.dreibrodt on 4 Aug 2011 at 9:01

Got an error due to line 73 in XMLPropertyListParser, where 
http://apache.org/xml/features/nonvalidating/load-external-dtd is set.  If I 
comment it out, it works.  Perhaps this is not supported in Android?

08-04 14:03:14.290: WARN/System.err(4201): javax.xml.parsers.ParserConfigurationException: http://apache.org/xml/features/nonvalidating/load-external-dtd
08-04 14:03:14.330: WARN/System.err(4201):     at org.apache.harmony.xml.parsers.DocumentBuilderFactoryImpl.setFeature(DocumentBuilderFactoryImpl.java:101)
08-04 14:03:14.330: WARN/System.err(4201):     at com.dd.plist.XMLPropertyListParser.parse(XMLPropertyListParser.java:73)
08-04 14:03:14.330: WARN/System.err(4201):     at com.dd.plist.XMLPropertyListParser.parse(XMLPropertyListParser.java:61)
08-04 14:03:14.340: WARN/System.err(4201):     at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:78)
08-04 14:03:14.340: WARN/System.err(4201):     at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:91)
08-04 14:03:14.340: WARN/System.err(4201):     at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:64)
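Since DocumentBuilderFactory.setFeature reports an unsupported feature by throwing ParserConfigurationException, as in the trace above, one option is to attempt the feature and fall back instead of failing. This is a hypothetical sketch (the helper name is made up); it relies only on the standard JAXP setFeature contract, and it keeps skipping whitespace #text nodes in both branches because the thread observed them both with the feature set and on Android.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper class, not part of the plist library.
class FeatureFallback {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Tries to set a parser feature; returns false instead of throwing if it is unsupported. */
    static boolean trySetFeature(DocumentBuilderFactory factory, String name, boolean value) {
        try {
            factory.setFeature(name, value);
            return true;
        } catch (ParserConfigurationException e) {
            // The factory does not recognize this feature (as Harmony's did on Android 2.3).
            return false;
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        boolean dtdLoadingDisabled = trySetFeature(factory, LOAD_EXTERNAL_DTD, false);
        // Whitespace-only #text nodes were seen in both situations, so skip them regardless.
        boolean skipTextNodes = true;
        System.out.println("load-external-dtd disabled: " + dtdLoadingDisabled
                + ", skipTextNodes: " + skipTextNodes);
    }
}
```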

Original comment by twigbra...@gmail.com on 4 Aug 2011 at 9:29

Here's another option for offline parsing, including the DTD in the jar file.  
Try it and let me know if it works for you.

I'm not sure whether it's kosher copyright-wise to include Apple's DTD, but
there isn't a copyright notice on it, and I don't think Apple would care much
even if there were.

Original comment by kei...@alum.mit.edu on 5 Aug 2011 at 1:20

Attachments:

I also thought about that, but I was worried about copyright. I'll email Apple
Dev Support and ask about their stance on it.

Original comment by daniel.dreibrodt on 5 Aug 2011 at 7:22

The performance of my solution with skipping the #TEXT nodes is really bad. I 
compared r29 vs. r34 on my 20 MB iTunes library. 

r29 needed 5.5 s
r34 needed 32.3 s

This is not bearable.

Original comment by daniel.dreibrodt on 5 Aug 2011 at 7:58

Oh, forget what I just said. The huge performance difference does not come
from the #TEXT skipping but from changing how files are handed to the parser:
passing a FileInputStream rather than a File is what causes the performance
drop.

Original comment by daniel.dreibrodt on 5 Aug 2011 at 8:15

Perhaps wrapping FileInputStream with BufferedInputStream might speed things up?
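Buffering the stream as suggested might look like the sketch below. It illustrates the suggestion only and is not necessarily how the library was changed; the class name is hypothetical, and the timing in main is a rough wall-clock measurement. Note that a file carrying the Apple DOCTYPE would still make this default factory try to fetch the DTD.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of the plist library.
class BufferedParse {
    /** Parses an XML file through a buffered stream instead of a bare FileInputStream. */
    static Document parse(File file) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // BufferedInputStream serves the parser's reads from an in-memory buffer
        // (8192 bytes by default; a larger size can be passed as a second argument).
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);
        long start = System.nanoTime();
        Document doc = parse(file);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(doc.getDocumentElement().getTagName() + " parsed in " + millis + " ms");
    }
}
```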

Original comment by twigbra...@gmail.com on 5 Aug 2011 at 9:01

I fixed the performance issues already. I'm currently working on separating the 
parsing process into online (DTD available) and offline (DTD not available).

Original comment by daniel.dreibrodt on 5 Aug 2011 at 9:06

Alex, could you please test r37 on your Android device? Also check how it works 
in airplane mode (no Internet).

Original comment by daniel.dreibrodt on 5 Aug 2011 at 9:17

Getting the following error with r37:

08-05 11:04:07.790: WARN/System.err(23739): javax.xml.parsers.ParserConfigurationException: No validating DocumentBuilder implementation available
08-05 11:04:07.800: WARN/System.err(23739):     at org.apache.harmony.xml.parsers.DocumentBuilderFactoryImpl.newDocumentBuilder(DocumentBuilderFactoryImpl.java:61)
08-05 11:04:07.800: WARN/System.err(23739):     at com.dd.plist.XMLPropertyListParser.initDocBuilder(XMLPropertyListParser.java:84)
08-05 11:04:07.800: WARN/System.err(23739):     at com.dd.plist.XMLPropertyListParser.parse(XMLPropertyListParser.java:96)
08-05 11:04:07.800: WARN/System.err(23739):     at com.dd.plist.PropertyListParser.parse(PropertyListParser.java:69)


Rearranging (XMLProperty... : line 84) like this works for me (Android 2.3.4 on 
Nexus One - can't speak for the others):

        if(System.getProperty("java.vendor").toLowerCase().contains("android")) {
            //Is there an error if the DTD could not be loaded as on desktop VMs?       
            docBuilderFactory.setValidating(false);
            skipTextNodes = true;
        } else {
            if(offline) {
                //Strangely this does not work on Android (Tested on Nexus One with Android 2.3.4)
                docBuilderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
                docBuilderFactory.setValidating(false);
                skipTextNodes = true;
            } else {         
                docBuilderFactory.setValidating(true);
                docBuilderFactory.setIgnoringElementContentWhitespace(true);
                skipTextNodes = false;          
            }
        }  


Original comment by twigbra...@gmail.com on 5 Aug 2011 at 6:37

Does this error come up both when you are connected to the Internet and when 
not?

Original comment by daniel.dreibrodt on 6 Aug 2011 at 8:23

Original comment by daniel.dreibrodt on 6 Aug 2011 at 11:29

  • Changed title: Problems with XML parsing when offline and or on Android devices
  • Removed labels: OpSys-All
Here's a patch for another take on avoiding network traffic.  We just patch in 
an empty DTD and filter out the #text nodes.  Could someone try this on the 
Android simulator and see if it works?
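The empty-DTD idea can be expressed with a standard SAX EntityResolver registered on the DocumentBuilder: when the parser asks for Apple's plist DTD, the resolver hands back an empty document, so nothing is downloaded. This is a hypothetical sketch, not the attached patch; it matches the DTD by its system URL or by either public ID Apple has used ("-//Apple Computer//DTD PLIST 1.0//EN" and "-//Apple//DTD PLIST 1.0//EN"). Because an empty DTD declares no content models, whitespace still arrives as #text nodes and has to be filtered, as the comment says.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: answers requests for the plist DTD with an empty document.
class EmptyDtdResolver implements EntityResolver {
    static final String DTD_URL = "http://www.apple.com/DTDs/PropertyList-1.0.dtd";

    static boolean isPlistDtd(String publicId, String systemId) {
        return DTD_URL.equals(systemId)
                || "-//Apple Computer//DTD PLIST 1.0//EN".equals(publicId)
                || "-//Apple//DTD PLIST 1.0//EN".equals(publicId);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (isPlistDtd(publicId, systemId)) {
            // An empty external subset: nothing is downloaded and nothing is declared.
            return new InputSource(new StringReader(""));
        }
        return null; // let the parser resolve any other entity normally
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EmptyDtdResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" "
                + "\"" + DTD_URL + "\">\n"
                + "<plist version=\"1.0\">\n<dict/>\n</plist>";
        Document doc = parse(xml);
        // The empty DTD declares no content models, so the newlines stay as #text nodes.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```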

Original comment by kei...@alum.mit.edu on 23 Aug 2011 at 11:31

Attachments:

Regarding your patch changing the public ID:
In 2007 "Apple Computer Inc." was renamed to "Apple Inc.".
Thus I would guess the "-//Apple Computer//..." version is the outdated one.
The other one, with just "-//Apple//...", is definitely correct and is also
used in recent Apple documents. See https://support.apple.com/kb/HT3765.
Also Apple's Property List Editor generates XML property lists with the 
"-//Apple//..." public id.

Original comment by daniel.dreibrodt on 24 Aug 2011 at 7:09

Ah, I didn't realize there were two versions.  I'll revert that part of the 
change.

Original comment by keith.ra...@gmail.com on 24 Aug 2011 at 4:03

I committed my modified changes.  Now the library should never request the DTD.

Original comment by kei...@alum.mit.edu on 14 Sep 2011 at 11:48

  • Changed state: Fixed