ruby / rss

RSS reading and writing

Podcasts - Many <duration> values are not accepted

jarvisjohnson opened this issue

Running a script against various podcast feeds raises RSS::NotAvailableValueError for quite a few of them, for example: value <2238> of tag <duration> is not available. The error comes from the time format validation at https://github.com/ruby/rss/blob/master/lib/rss/itunes.rb#L283.

Podcasts with duration values like 22:38 are accepted, which makes sense.

But it seems lots of the feeds (for podcasts that are listed in iTunes) have values like 2238.

Ex: https://feeds.megaphone.fm/HSW2732644812

I wonder if a time of 22:38 should be assumed from something like above, as it seems a few RSS creators are outputting times this way? Thanks!

Validation in rss is based on the spec, not on real-world use cases.
For example, this is based on https://github.com/simplepie/simplepie-ng/wiki/Spec:-iTunes-Podcast-RSS#itunesduration .

BTW, what does 2238 mean? 2238 seconds? 22:38?

If you specify a single number as a value (without colons), the iTunes Store displays the value as seconds.

2238 seconds?

Apologies, I missed your replies.

OK, so in that case I would think a number without colons should be converted to minutes:seconds rather than raising an error.

Yes, in this case that would be 37:38.

I can submit a PR if you like?

Do you mean that RSS::Parser should accept 2238 as a valid value? It's not acceptable because it's not a valid format.

You can parse 2238 with RSS::Parser.parse(..., validate: false). In that case, rss.items.first.itunes_duration.second returns 2238.

If you mean that itunes_duration.minute is 37 and itunes_duration.second is 38 for 2238, it's acceptable.

Yeah - I think 2238 should be accepted as valid - any number without colons should be treated as seconds. Your quote from the spec above implies that it is valid.

My mistake above - it's actually 37.3 minutes, or 37:18 in this case. Whether the parsing returns 2238 seconds or 37:18, it just shouldn't fail based on this as the spec implies a whole number is a valid duration.
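The arithmetic behind 37:18 can be checked with a short sketch. The helper name format_seconds is hypothetical and is not part of the rss library; it only illustrates treating a colon-free value as a number of seconds.

```ruby
# Hypothetical helper (not part of the rss library): turn a plain number of
# seconds into a minutes:seconds string.
def format_seconds(total_seconds)
  # divmod returns the quotient (whole minutes) and the remainder (seconds).
  minutes, seconds = total_seconds.divmod(60)
  format("%d:%02d", minutes, seconds)
end

puts format_seconds(2238) # prints "37:18"
```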

My point is that a large number of the podcast feeds I tried to parse failed because of this error, and I don't think it should be an error, as it appears to be valid in the spec.

Umm. OK. We'll accept the seconds-only case.

The spec says the following formats are valid:

  • HH:MM:SS
  • H:MM:SS
  • MM:SS
  • M:SS

But the spec describes the seconds-only case in the following passage:

If you specify a single number as a value (without colons), the iTunes Store displays the value as seconds.

The seconds-only case isn't included in the list of valid formats above, but it will be treated as valid.
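As a rough illustration of the rule described above, the sketch below accepts the four colon formats plus a digits-only value read as seconds. The names DURATION_PATTERN and parse_duration, and the exact regular expression, are assumptions made for this example and are not taken from the rss library's code.

```ruby
# Hypothetical pattern (an assumption, not the rss library's own):
# HH:MM:SS, H:MM:SS, MM:SS, M:SS, or digits only (seconds).
DURATION_PATTERN = /\A(?:\d{1,2}:[0-5]\d:[0-5]\d|[0-5]?\d:[0-5]\d|\d+)\z/

# Returns [hours, minutes, seconds], or raises ArgumentError for any
# value that matches none of the accepted formats.
def parse_duration(value)
  unless DURATION_PATTERN.match?(value)
    raise ArgumentError, "invalid duration: #{value.inspect}"
  end

  if value.include?(":")
    parts = value.split(":").map(&:to_i)
    # Pad missing leading fields so MM:SS becomes [0, MM, SS].
    parts.unshift(0) while parts.size < 3
    parts
  else
    # No colons: treat the whole value as a number of seconds.
    minutes, seconds = value.to_i.divmod(60)
    hours, minutes = minutes.divmod(60)
    [hours, minutes, seconds]
  end
end

p parse_duration("22:38") # prints [0, 22, 38]
p parse_duration("2238")  # prints [0, 37, 18]
```

Returning the same three-element array for both input styles means a caller does not need to know which format the feed used.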

@kou @jarvisjohnson Just got bitten by this yesterday. Check out the PR and let me know if there is anything you think could be improved.

BTW the official spec (https://help.apple.com/itc/podcasts_connect/#/itcb54353390) now mentions this too:

If you specify more than two colons, Apple Podcasts ignores the numbers farthest to the right.

So following the same idea, values like 01:34:45:26 or even 01:34:45:26.423 should be valid and treated as 01:34:45. Since I haven't found these cases in the wild, I haven't prepared a PR for them but let me know if you want to include them to stay true to the spec.
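One possible way to apply that rule, shown only as a sketch: keep the first three colon-separated fields and drop the rest. The helper name truncate_duration is hypothetical, and this is not something the PR implements.

```ruby
# Hypothetical helper (not part of the rss library): keep at most the first
# three colon-separated fields, dropping the ones farthest to the right.
def truncate_duration(value)
  value.split(":").first(3).join(":")
end

puts truncate_duration("01:34:45:26")     # prints "01:34:45"
puts truncate_duration("01:34:45:26.423") # prints "01:34:45"
```

Values with two or fewer colons pass through unchanged, so the helper could run before any stricter format check.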

Nice work, thanks for the PR @aitor!

@jarvisjohnson you're welcome 🙇

Done by #5.