chbrown / overdrive

Bash script to download mp3s from the OverDrive audiobook service

Missing last chapter

amjith opened this issue

I borrowed The Hitchhiker's Guide to the Galaxy and downloaded the mp3 files. There are 5 chapters in the odm file (copied below) but the script only downloaded 4 files. I suspect it is due to the missing newline at the end of the output from the extract_filenames function.

Here is the odm file:

<?xml version="1.0"?>
<OverDriveMedia id="1c96507d-f48d-4fef-b190-130ce15c8fa8-425" ODMVersion="3.0.0.0" OMCVersion="3.0.0.0">
	<License>
		<AcquisitionUrl>https://ofs.contentreserve.com/MP3LicenseAcquisitionService.svc/a9368c50-6b29-44d3-bed2-ab8900003648</AcquisitionUrl>
	</License><![CDATA[<Metadata>
	<ContentType>MP3 Audio Book</ContentType>
	<Title>The Hitchhiker's Guide to the Galaxy</Title>
	<SubTitle>The Hitchhiker's Guide to the Galaxy Series, Book 1</SubTitle>
	<SortTitle>Hitchhikers Guide to the Galaxy The Hitchhikers Guide to the Galaxy Series Book 01</SortTitle>
	<Publisher>Penguin Random House Audio Publishing Group</Publisher>
	<Series>The Hitchhiker's Guide to the Galaxy</Series>
	<ThumbnailUrl>https://images.contentreserve.com/ImageType-200/1191-1/{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Img200.jpg</ThumbnailUrl>
	<CoverUrl>https://images.contentreserve.com/ImageType-100/1191-1/{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Img100.jpg</CoverUrl>
	<Creators>
		<Creator role="Author" file-as="Adams, Douglas">Douglas Adams</Creator>
		<Creator role="Narrator" file-as="Fry, Stephen">Stephen Fry</Creator>
	</Creators>
	<Subjects>
		<Subject id="26">Fiction</Subject>
		<Subject id="80">Science Fiction</Subject>
		<Subject id="98">Science Fiction &amp; Fantasy</Subject>
	</Subjects>
	<Languages>
		<Language code="en">English</Language>
	</Languages>
<Description>&lt;b&gt;&lt;i&gt;NEW YORK TIMES &lt;/i&gt;BESTSELLER &#8226; "Extremely funny . . . inspired lunacy . . . [and] over much too soon."&#8212;&lt;i&gt;The Washington Post Book World&lt;/i&gt;&lt;/b&gt;&lt;br&gt;&lt;b&gt;Nominated as one of America's best-loved novels by PBS's &lt;i&gt;The Great American Read&lt;/i&gt;&lt;/b&gt;&lt;br&gt;Seconds before Earth is demolished to make way for a galactic freeway, Arthur Dent is plucked off the planet by his friend Ford Prefect, a researcher for the revised edition of &lt;i&gt;The Hitchhiker's Guide to the Galaxy &lt;/i&gt;who, for the last fifteen years, has been posing as an out-of-work actor.&lt;br&gt;Together, this dynamic pair began a journey through space aided by a galaxyful of fellow travelers: Zaphod Beeblebrox&#8212;the two-headed, three-armed ex-hippie and totally out-to-lunch president of the galaxy; Trillian (formerly Tricia McMillan), Zaphod's girlfriend, whom Arthur tried to pick up at a cocktail party once upon a time zone; Marvin, a paranoid, brilliant, and chronically depressed robot; and Veet Voojagig, a former...</Description>
</Metadata>]]><DrmInfo>
	<PlayOnPC>1</PlayOnPC>
	<PlayOnPCCount>-1</PlayOnPCCount>
	<BurnToCD>1</BurnToCD>
	<BurnToCDCount>-1</BurnToCDCount> 
	<PlayOnPM>1</PlayOnPM>
	<TransferToSDMI>1</TransferToSDMI> 
	<TransferToNonSDMI>1</TransferToNonSDMI> 
	<TransferCount>-1</TransferCount>
	<CollaborativePlay>0</CollaborativePlay>
	<PublicPerformance>0</PublicPerformance>
	<TranscodeToAAC>1</TranscodeToAAC>
<ExpirationDate>2020-04-15T04:00:37Z</ExpirationDate><Hash>psH5vw10Ee2JHYeevPcyybsMdGk=</Hash><Hash2>7N3xlqEPZMb2WghFVIRuRBkn1Ak=</Hash2></DrmInfo><Formats><Format name="Medium Quality"><Quality level="Medium" /><Protocols><Protocol method="download" baseurl="https://mp3audio-gk.cdn.overdrive.com/MP3AudioStore1" /></Protocols><Parts count="5"><Part number="1" filesize="36500043" name="Part 1" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part01.mp3" duration="75:54" /><Part number="2" filesize="34837193" name="Part 2" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part02.mp3" duration="72:26" /><Part number="3" filesize="33988318" name="Part 3" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part03.mp3" duration="70:40" /><Part number="4" filesize="31996742" name="Part 4" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part04.mp3" duration="66:31" /><Part number="5" filesize="31771462" name="Part 5" filename="1191-1\1C9\650\7D\{1C96507D-F48D-4FEF-B190-130CE15C8FA8}Fmt425-Part05.mp3" duration="66:03" /></Parts></Format></Formats><Source id="SanJose"><Name>San José Digital Library</Name><WebsiteUrl>http://overdrive.sjlibrary.org</WebsiteUrl><BannerUrl>http://overdrive.sjlibrary.org/ODMBanner.gif</BannerUrl><AccentColor>#eeeeee</AccentColor></Source><TransactionID>022-1693196-00045</TransactionID><EarlyReturnURL>https://notifications-ofs.contentreserve.com/EarlyReturn/SanJose/022-1693196-00045/1c96507d-f48d-4fef-b190-130ce15c8fa8-425?h=3sMV0nvFR5zf%2fqEj9DvuFbXnRbf9Lbz2vdouyzTJ5t8%3d</EarlyReturnURL><DownloadSuccessURL>https://notifications-ofs.contentreserve.com/DownloadSuccess/SanJose/022-1693196-00045/1c96507d-f48d-4fef-b190-130ce15c8fa8-425?h=3sMV0nvFR5zf%2fqEj9DvuFbXnRbf9Lbz2vdouyzTJ5t8%3d</DownloadSuccessURL></OverDriveMedia>

I'm guessing this is some cross-platform inconsistency, and that your sed isn't adding a newline at EOF (in the extract_filenames function). E.g., on my macOS Mojave where sed = /usr/bin/sed and gsed comes from Homebrew's gnu-sed package, I observe the following difference:

$ printf 'L1\nL2' | od -c
0000000    L   1  \n   L   2
0000005
$ printf 'L1\nL2' | sed 's/a/z/' | od -c  # no-op substitution via sed
0000000    L   1  \n   L   2  \n
0000006
$ printf 'L1\nL2' | gsed 's/a/z/' | od -c  # no-op substitution via gsed
0000000    L   1  \n   L   2
0000005

Which is surprising. And obviously it's not ideal for that particular POSIX vs. GNU vagary to break my overdrive.sh bash script :(
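For anyone wondering why a missing final newline would cost exactly one file: a plain `while read` loop skips a last line that is not newline-terminated, because `read` returns non-zero at EOF even though it did fill the variable. The sketch below only illustrates that shell behavior. It assumes the filename list is consumed by a loop of this shape, which may not match what overdrive.sh actually does, and the function names `count_naive` and `count_guarded` are made up for the demo.

```shell
# Hypothetical demo functions, not taken from overdrive.sh.

# Count the lines a plain `while read` loop actually processes.
count_naive() {
  naive_n=0
  while IFS= read -r naive_line; do
    naive_n=$((naive_n + 1))
  done
  printf '%s\n' "$naive_n"
}

# Same loop, but also process a final line that lacks a trailing newline:
# when read fails at EOF with a non-empty variable, run the body once more.
count_guarded() {
  guarded_n=0
  while IFS= read -r guarded_line || [ -n "$guarded_line" ]; do
    guarded_n=$((guarded_n + 1))
  done
  printf '%s\n' "$guarded_n"
}

printf 'L1\nL2' | count_naive      # prints 1: the unterminated L2 is skipped
printf 'L1\nL2' | count_guarded    # prints 2
printf 'L1\nL2\n' | count_naive    # prints 2
```

With five `Part` entries and no newline after the fifth, a loop like `count_naive` would see only four, which matches the symptom reported above.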

I don't recall why I wrote _xmllint_iter_xpath to omit the trailing newline, but there must have been a reason, because the alternative is much easier to implement. I'll see what I can do to factor out the divergence of behavior due to which sed gets called and update here when I arrive at a decent solution.
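One possible way to take sed's behavior out of the equation is to normalize the list on the producing side, so that it always ends with a newline no matter which sed ran before it. This is only a sketch of the idea, not the fix that ended up in the script. `ensure_trailing_newline` is a hypothetical helper name, and where it would sit in the pipeline is an assumption.

```shell
# Hypothetical helper (not part of overdrive.sh): pass stdin through unchanged,
# but make sure the final line is newline-terminated.
# `awk 1` prints every input record followed by a newline, so an unterminated
# last line gains one, while input that already ends in a newline is not
# given a second one.
ensure_trailing_newline() {
  awk 1
}

# Both inputs should come out as two newline-terminated lines.
printf 'L1\nL2' | ensure_trailing_newline | od -c
printf 'L1\nL2\n' | ensure_trailing_newline | od -c
```

Appending such a filter after the sed call would make the output identical under BSD sed and GNU sed, at the cost of one extra process per call.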