Missing last chapter
amjith opened this issue · comments
I borrowed The Hitchhiker's Guide to the Galaxy and downloaded the mp3 files. There are 5 chapters in the odm file (copied below), but the script only downloaded 4 files. I suspect this is due to the missing newline at the end of the output from the extract_filenames function.
Here is the odm file:
<?xml version="1.0"?>
<OverDriveMedia id="1c96507d-f48d-4fef-b190-130ce15c8fa8-425" ODMVersion="3.0.0.0" OMCVersion="3.0.0.0">
<License>
<AcquisitionUrl>https://ofs.contentreserve.com/MP3LicenseAcquisitionService.svc/a9368c50-6b29-44d3-bed2-ab8900003648</AcquisitionUrl>
</License><![CDATA[<Metadata>
<ContentType>MP3 Audio Book</ContentType>
<Title>The Hitchhiker's Guide to the Galaxy</Title>
<SubTitle>The Hitchhiker's Guide to the Galaxy Series, Book 1</SubTitle>
<SortTitle>Hitchhikers Guide to the Galaxy The Hitchhikers Guide to the Galaxy Series Book 01</SortTitle>
<Publisher>Penguin Random House Audio Publishing Group</Publisher>
<Series>The Hitchhiker's Guide to the Galaxy</Series>
<ThumbnailUrl>https://images.contentreserve.com/ImageType-200/1191-1/{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Img200.jpg</ThumbnailUrl>
<CoverUrl>https://images.contentreserve.com/ImageType-100/1191-1/{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Img100.jpg</CoverUrl>
<Creators>
<Creator role="Author" file-as="Adams, Douglas">Douglas Adams</Creator>
<Creator role="Narrator" file-as="Fry, Stephen">Stephen Fry</Creator>
</Creators>
<Subjects>
<Subject id="26">Fiction</Subject>
<Subject id="80">Science Fiction</Subject>
<Subject id="98">Science Fiction & Fantasy</Subject>
</Subjects>
<Languages>
<Language code="en">English</Language>
</Languages>
<Description><b><i>NEW YORK TIMES </i>BESTSELLER • "Extremely funny . . . inspired lunacy . . . [and] over much too soon."—<i>The Washington Post Book World</i></b><br><b>Nominated as one of America's best-loved novels by PBS's <i>The Great American Read</i></b><br>Seconds before Earth is demolished to make way for a galactic freeway, Arthur Dent is plucked off the planet by his friend Ford Prefect, a researcher for the revised edition of <i>The Hitchhiker's Guide to the Galaxy </i>who, for the last fifteen years, has been posing as an out-of-work actor.<br>Together, this dynamic pair began a journey through space aided by a galaxyful of fellow travelers: Zaphod Beeblebrox—the two-headed, three-armed ex-hippie and totally out-to-lunch president of the galaxy; Trillian (formerly Tricia McMillan), Zaphod's girlfriend, whom Arthur tried to pick up at a cocktail party once upon a time zone; Marvin, a paranoid, brilliant, and chronically depressed robot; and Veet Voojagig, a former...</Description>
</Metadata>]]><DrmInfo>
<PlayOnPC>1</PlayOnPC>
<PlayOnPCCount>-1</PlayOnPCCount>
<BurnToCD>1</BurnToCD>
<BurnToCDCount>-1</BurnToCDCount>
<PlayOnPM>1</PlayOnPM>
<TransferToSDMI>1</TransferToSDMI>
<TransferToNonSDMI>1</TransferToNonSDMI>
<TransferCount>-1</TransferCount>
<CollaborativePlay>0</CollaborativePlay>
<PublicPerformance>0</PublicPerformance>
<TranscodeToAAC>1</TranscodeToAAC>
<ExpirationDate>2020-04-15T04:00:37Z</ExpirationDate><Hash>psH5vw10Ee2JHYeevPcyybsMdGk=</Hash><Hash2>7N3xlqEPZMb2WghFVIRuRBkn1Ak=</Hash2></DrmInfo><Formats><Format name="Medium Quality"><Quality level="Medium" /><Protocols><Protocol method="download" baseurl="https://mp3audio-gk.cdn.overdrive.com/MP3AudioStore1" /></Protocols><Parts count="5"><Part number="1" filesize="36500043" name="Part 1" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part01.mp3" duration="75:54" /><Part number="2" filesize="34837193" name="Part 2" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part02.mp3" duration="72:26" /><Part number="3" filesize="33988318" name="Part 3" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part03.mp3" duration="70:40" /><Part number="4" filesize="31996742" name="Part 4" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part04.mp3" duration="66:31" /><Part number="5" filesize="31771462" name="Part 5" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part05.mp3" duration="66:03" /></Parts></Format></Formats><Source id="SanJose"><Name>San José Digital Library</Name><WebsiteUrl>http://overdrive.sjlibrary.org</WebsiteUrl><BannerUrl>http://overdrive.sjlibrary.org/ODMBanner.gif</BannerUrl><AccentColor>#eeeeee</AccentColor></Source><TransactionID>022-1693196-00045</TransactionID><EarlyReturnURL>https://notifications-ofs.contentreserve.com/EarlyReturn/SanJose/022-1693196-00045/1c96507d-f48d-4fef-b190-130ce15c8fa8-425?h=3sMV0nvFR5zf%2fqEj9DvuFbXnRbf9Lbz2vdouyzTJ5t8%3d</EarlyReturnURL><DownloadSuccessURL>https://notifications-ofs.contentreserve.com/DownloadSuccess/SanJose/022-1693196-00045/1c96507d-f48d-4fef-b190-130ce15c8fa8-425?h=3sMV0nvFR5zf%2fqEj9DvuFbXnRbf9Lbz2vdouyzTJ5t8%3d</DownloadSuccessURL></OverDriveMedia>```
I'm guessing this is some cross-platform inconsistency, and that your sed isn't adding a newline at EOF (in the extract_filenames function). E.g., on my macOS Mojave, where sed = /usr/bin/sed and gsed comes from Homebrew's gnu-sed package, I observe the following difference:
$ printf 'L1\nL2' | od -c
0000000 L 1 \n L 2
0000005
$ printf 'L1\nL2' | sed 's/a/z/' | od -c # no-op substitution via sed
0000000 L 1 \n L 2 \n
0000006
$ printf 'L1\nL2' | gsed 's/a/z/' | od -c # no-op substitution via gsed
0000000 L 1 \n L 2
0000005
Which is surprising. And obviously it's not ideal for that particular POSIX vs. GNU vagary to break my overdrive.sh bash script :(
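For anyone following along, here is a minimal sketch of how a missing trailing newline can cost exactly one item. It assumes the filenames are consumed by a plain `while read` loop, which is a guess about the failure mode and not a quote from overdrive.sh; count_lines_read is a hypothetical helper and the filenames are shortened placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from overdrive.sh): counts the lines that a plain
# `while read` loop actually processes from stdin.
count_lines_read() {
  local n=0 line
  # `read` returns non-zero when it reaches EOF before seeing a newline,
  # so the loop body never runs for an unterminated final line.
  while IFS= read -r line; do
    n=$((n + 1))
  done
  echo "$n"
}

# Same two filenames, with and without a newline after the last one.
with_newline=$(printf 'Part01.mp3\nPart02.mp3\n' | count_lines_read)
without_newline=$(printf 'Part01.mp3\nPart02.mp3' | count_lines_read)

echo "terminated: $with_newline, unterminated: $without_newline"
```

With GNU sed passing the unterminated last line through unchanged, a loop like this would see one line fewer than the odm file lists, which would match the 4-of-5 symptom.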
I don't recall why I wrote _xmllint_iter_xpath to omit the trailing newline, but there must have been a reason, because the alternative is much easier to implement. I'll see what I can do to factor out the divergence of behavior due to which sed gets called, and I'll update here when I arrive at a decent solution.
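In the meantime, a rough sketch of two ways to make the result independent of which sed is on the PATH. This is not necessarily what the eventual fix does; ensure_trailing_newline and count_lines_tolerant are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Option 1 (hypothetical filter): pass the stream through `awk 1`, which
# prints every record followed by a newline, including an unterminated last one.
ensure_trailing_newline() {
  awk '1'
}

# Option 2 (hypothetical loop): also run the body when `read` hits EOF
# but still managed to fill "$line" with a partial final line.
count_lines_tolerant() {
  local n=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
  done
  echo "$n"
}

# Both inputs lack the final newline, like the gsed output above.
normalized_count=$(printf 'L1\nL2' | ensure_trailing_newline | wc -l | tr -d ' ')
tolerant_count=$(printf 'L1\nL2' | count_lines_tolerant)

echo "normalized: $normalized_count, tolerant: $tolerant_count"
```

Either approach should leave already-terminated input unchanged, so it would be safe with both the BSD and GNU sed behavior shown earlier.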
Fixed by #16