w3c / epubcheck

The conformance checker for EPUB publications

Home Page: https://www.w3.org/publishing/epubcheck/

Null exception on checking data URL - EpubCheck 5.0.1 and 5.1.0

rachanak-dk opened this issue · comments

Hi,

When I check a fixed-layout EPUB with EpubCheck 5.0.1 or EpubCheck 5.1.0, I get the following exception. EpubCheck 5.0.0 and earlier versions report no errors. The OPF file also validates and has no resource path issues:

java.lang.NullPointerException: null input
at io.mola.galimatias.URLParser.parse(URLParser.java:215)
at io.mola.galimatias.URL.withPath(URL.java:397)
at io.mola.galimatias.canonicalize.DecodeUnreservedCanonicalizer.canonicalize(DecodeUnreservedCanonicalizer.java:41)
at org.w3c.epubcheck.util.url.URLUtils.normalize(URLUtils.java:188)
at com.adobe.epubcheck.ocf.OCFContainer.contains(OCFContainer.java:88)
at com.adobe.epubcheck.ocf.OCFContainer.isRemote(OCFContainer.java:134)
at org.w3c.epubcheck.core.references.ResourceReferencesChecker.checkReference(ResourceReferencesChecker.java:120)
at org.w3c.epubcheck.core.references.ResourceReferencesChecker.check(ResourceReferencesChecker.java:102)
at com.adobe.epubcheck.opf.OPFChecker.checkPackage(OPFChecker.java:149)
at com.adobe.epubcheck.opf.OPFChecker30.checkPackage(OPFChecker30.java:67)
at com.adobe.epubcheck.opf.OPFChecker.check(OPFChecker.java:94)
at com.adobe.epubcheck.ocf.OCFChecker.check(OCFChecker.java:173)
at com.adobe.epubcheck.api.EpubCheck.doValidate(EpubCheck.java:218)
at com.adobe.epubcheck.tool.EpubChecker.validateFile(EpubChecker.java:250)
at com.adobe.epubcheck.tool.EpubChecker.processFile(EpubChecker.java:325)
at com.adobe.epubcheck.tool.EpubChecker.run(EpubChecker.java:150)
at com.adobe.epubcheck.tool.Checker.main(Checker.java:31)

commented

I can confirm the issue. I also get:

java.lang.NullPointerException: null input
        at io.mola.galimatias.URLParser.parse(URLParser.java:215)
        at io.mola.galimatias.URL.withPath(URL.java:397)
        at io.mola.galimatias.canonicalize.DecodeUnreservedCanonicalizer.canonicalize(DecodeUnreservedCanonicalizer.java:41)
        at org.w3c.epubcheck.util.url.URLUtils.normalize(URLUtils.java:188)
        at com.adobe.epubcheck.ocf.OCFContainer.contains(OCFContainer.java:88)
        at com.adobe.epubcheck.ocf.OCFContainer.isRemote(OCFContainer.java:134)
        at org.w3c.epubcheck.core.references.ResourceReferencesChecker.checkReference(ResourceReferencesChecker.java:120)
        at org.w3c.epubcheck.core.references.ResourceReferencesChecker.check(ResourceReferencesChecker.java:102)
        at com.adobe.epubcheck.opf.OPFChecker.checkPackage(OPFChecker.java:149)
        at com.adobe.epubcheck.opf.OPFChecker30.checkPackage(OPFChecker30.java:67)
        at com.adobe.epubcheck.opf.OPFChecker.check(OPFChecker.java:94)
        at com.adobe.epubcheck.ocf.OCFChecker.check(OCFChecker.java:173)
        at com.adobe.epubcheck.api.EpubCheck.doValidate(EpubCheck.java:218)
        at com.adobe.epubcheck.tool.EpubChecker.validateFile(EpubChecker.java:250)
        at com.adobe.epubcheck.tool.EpubChecker.processFile(EpubChecker.java:325)
        at com.adobe.epubcheck.tool.EpubChecker.run(EpubChecker.java:150)
        at com.adobe.epubcheck.tool.Checker.main(Checker.java:31)

Reproduce with the attached sample EPUB:
nullpointer_epub.zip

Looks like the data URL in the background-image declaration in the Style.css file is causing it.

@mattgarrish: Thank you for your response. We are checking the URL you pointed out, which is indeed the one causing the issue.
Thanks

Reopening this until @rdeltour has a chance to look into it. It seems to be symptomatic of a bigger issue with data URLs, as you shouldn't get a null exception error. Even if I put the data URL into an img tag, or remove the FXL metadata, I still get the exception.

Off the top of my head, the only requirement on their use is that they define a core media type, which appears to be the case here.

commented

As far as I can tell, this affects 100% of our FXL files, and OverDrive is now bouncing them. I am trying to figure out why the original producers bothered to add the background, as it seems to be a 1-pixel dot:

.trn_link {
	background-image:url('??AEAOw==');
	position:absolute;
}

1 pixel images are often used for clickable links that do not wrap any text. It seems that the Data URL is problematic. If you replace it with  it works fine.

Can confirm all our FXL getting same null pointer exception.

Failure case:
??AEAOw==

When replaced with the URL Titusz gave above, no issues.

File is ANCIENT and produced by some old supplier. Offhand I don't actually know what that string after base64 actually means/does!

Ya, I think I see what's going on. Per the data URL syntax, characters outside the safe range of URL characters have to be escaped. If you percent encode the question marks in the original then you no longer get a null exception from epubcheck.

Of course, epubcheck should report the URL as invalid, not throw a null exception error, so that still needs fixing.

(It also doesn't look like the original data URL is properly base64 encoded, as even after fixing the problem I still get an invalid image.)
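To illustrate the percent-encoding workaround described above, here is a minimal sketch in Java. The class and method names are hypothetical, the sample input is made up rather than taken from the reported file, and the code only escapes question marks in the payload after the first comma. It is not a general data URL sanitizer and is not part of EPUBCheck.

```java
// Hypothetical helper, not an EPUBCheck API: percent-encodes '?' characters
// in the payload of a data URL (the part after the first comma).
final class DataUrlEscaper {

    static String escapeQuestionMarks(String dataUrl) {
        int comma = dataUrl.indexOf(',');
        // Assumption: only lowercase "data:" URLs with a comma are handled;
        // anything else is returned unchanged.
        if (!dataUrl.startsWith("data:") || comma < 0) {
            return dataUrl;
        }
        String header = dataUrl.substring(0, comma + 1);
        String payload = dataUrl.substring(comma + 1);
        // %3F is the percent-encoded form of '?'.
        return header + payload.replace("?", "%3F");
    }

    public static void main(String[] args) {
        // Made-up sample input, not the URL from the reported EPUB.
        System.out.println(escapeQuestionMarks("data:text/plain,a?b"));
    }
}
```

Escaping only makes the URL syntactically acceptable. As noted above, a payload that is not valid base64 will still decode to an invalid image.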

+1 on @mattgarrish's analysis: the URL looks non-conforming so should be reported, but EPUBCheck should definitely not throw a NullPointerException in that case. I'll look into it for the next milestone. Thanks all for the report!
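The behaviour asked for here (report a bad URL instead of crashing) could look roughly like the sketch below. This is not EPUBCheck's code: `SafeReferenceCheck`, `tryParse`, and `checkReference` are hypothetical names, and `java.net.URI` stands in for the galimatias parser seen in the stack trace. The two parsers follow different rules, so the inputs that `java.net.URI` rejects are not necessarily the ones that triggered the exception in this issue.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical sketch: turn a URL parsing failure into a reportable message
// rather than letting an exception escape from the checker.
final class SafeReferenceCheck {

    // Returns an empty Optional for null or syntactically invalid input.
    static Optional<URI> tryParse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(new URI(raw));
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    // Produces a message string; a real checker would use its own reporting mechanism.
    static String checkReference(String raw) {
        return tryParse(raw)
                .map(uri -> "OK: " + uri.getScheme())
                .orElse("ERROR: invalid URL: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(checkReference("data:image/gif;base64,R0lGODlh"));
        System.out.println(checkReference("data:image/gif;base64,AB CD"));
        System.out.println(checkReference(null));
    }
}
```

The point of the design is that the caller always gets a value it can report on, so a malformed reference becomes a validation message rather than a stack trace.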

The latest ePUBChecker 5.1.0 ignores the Thumbs.db error. The presence of Thumbs.db inside the image folder needs to be checked manually before hosting or delivering the ePUB files to clients.

The same issue is also present in the earlier version 5.0.1.
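For anyone who wants to automate that manual check, the sketch below lists Thumbs.db entries inside an EPUB using only the standard `java.util.zip` classes. The class and method names are hypothetical and it is not part of any checker. It assumes the file name is matched case-insensitively and may sit in any folder of the archive.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: finds Thumbs.db entries in an EPUB (ZIP) archive.
final class ThumbsDbScanner {

    // True if the last path segment of the ZIP entry name is "Thumbs.db" (any case).
    static boolean isThumbsDb(String entryName) {
        String fileName = entryName.substring(entryName.lastIndexOf('/') + 1);
        return fileName.equalsIgnoreCase("Thumbs.db");
    }

    // Returns the names of all matching entries in the given archive.
    static List<String> findThumbsDb(Path epub) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipFile zip = new ZipFile(epub.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isThumbsDb(name)) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is treated as a path to an EPUB file.
        for (String arg : args) {
            for (String hit : findThumbsDb(Paths.get(arg))) {
                System.out.println(arg + ": " + hit);
            }
        }
    }
}
```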