Spine items with URL-encoded hrefs are not handled correctly
kevinboone opened this issue · comments
Kevin Boone commented
Although unusual, it's legitimate for the XHTML documents in an EPUB to have filenames containing whitespace and punctuation characters. When these files are referenced in the manifest/spine in content.opf, the hrefs should be URL-encoded (percent-encoded). Often this isn't the case but, when it is, epub2txt fails because it doesn't decode the href before looking for the file. So if we have
<item href="foo%20bar.xhtml"/>
the program ends up looking for a file that is actually called "foo%20bar.xhtml" instead of decoding it to "foo bar.xhtml".
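For illustration, here is a minimal sketch of the kind of decoding step that seems to be missing. It is not taken from the epub2txt source: `decode_href` and `hex_value` are hypothetical names, and the sketch assumes the href is a plain relative path with no fragment or query part. It only turns `%XX` escapes back into bytes, leaves malformed escapes (and `%00`) untouched, and does not treat `+` as a space, since that convention applies to form data rather than paths.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper, not part of epub2txt: returns a newly allocated
   copy of href with %XX escapes decoded, or NULL if allocation fails.
   The caller must free() the result. */
char *decode_href(const char *href)
{
  size_t len = strlen(href);
  /* The decoded string is never longer than the input. */
  char *out = malloc(len + 1);
  if (!out) return NULL;

  size_t j = 0;
  for (size_t i = 0; i < len; i++)
  {
    if (href[i] == '%' && i + 2 < len)
    {
      int hi = hex_value(href[i + 1]);
      int lo = hex_value(href[i + 2]);
      /* Decode only well-formed escapes; skip %00 so the result
         cannot contain an embedded terminator. */
      if (hi >= 0 && lo >= 0 && (hi * 16 + lo) != 0)
      {
        out[j++] = (char)(hi * 16 + lo);
        i += 2;
        continue;
      }
    }
    /* Anything else, including a malformed escape, is copied as-is. */
    out[j++] = href[i];
  }
  out[j] = '\0';
  return out;
}
```

With something like this applied to each manifest href before the file is opened, "foo%20bar.xhtml" would be looked up as "foo bar.xhtml", while hrefs that were never encoded pass through unchanged.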