quodlibet / mutagen

Python module for handling audio metadata

Home Page: https://mutagen.readthedocs.io

Most lightweight way to read audio length from a URL

banagale opened this issue

Thank you for the terrific package.

I need the length of an MP3 available at a URL, and am trying to retrieve it with the lowest possible overhead using mutagen.

I referenced this snippet and arrived at:

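import io

import requests
from mutagen import mp3

url = 'https://example.com/some_valid_mp3.mp3'
# Download the whole file into memory; mutagen needs a seekable file-like object.
r = requests.get(url, timeout=30)
r.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
filelike_obj = io.BytesIO(r.content)
audio = mp3.MP3(filelike_obj)
audio_length = audio.info.length  # duration in seconds (float)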

Is this the "cheapest" way to derive this information using mutagen?

Is there a way to get the length information without requesting the complete audio file?

Tricky... mutagen expects random access on all input files. For MP3 you can probably get away with reading only a small part, so you could stream the file in chunks and retry parsing until there is enough data for it to succeed. There is one catch, though: if all other parsing fails, mutagen falls back to estimating the length from the file size, which is of course wrong for a partial file, so this approach can give wrong results in those edge cases.
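A minimal sketch of the chunked approach described above, not a tested recipe. The names `probe_length` and `mp3_length` and the `max_bytes` cap are hypothetical. The stability check is an assumption of this sketch rather than anything mutagen provides: a length that keeps changing as the buffer grows was evidently derived from the buffer size, so a value is accepted only once two consecutive parses agree.

```python
import io


def probe_length(chunks, parse, max_bytes=1 << 20):
    """Feed chunks to `parse` until two consecutive parses agree on a length.

    chunks: iterable of bytes, e.g. a streamed HTTP response body.
    parse:  callable taking a file-like object and returning a length in
            seconds, or None if the data cannot be parsed yet.
    Returns the length, or None if nothing stable shows up within max_bytes.
    """
    buf = bytearray()
    previous = None
    for chunk in chunks:
        buf += chunk
        # Re-parse everything received so far (simple, but repeats work).
        length = parse(io.BytesIO(bytes(buf)))
        if length is not None and length == previous:
            return length  # unchanged after more data arrived: accept it
        previous = length
        if len(buf) >= max_bytes:
            return None  # give up; the caller can fall back to a full download
    # The stream ended, so the buffer holds the whole file and the last
    # parse saw the real file size.
    return previous


def mp3_length(fileobj):
    """Parse with mutagen; imported lazily so probe_length stays stdlib-only."""
    from mutagen import MutagenError
    from mutagen.mp3 import MP3
    try:
        return MP3(fileobj).info.length
    except MutagenError:
        return None


# Possible usage with requests (assumes the server streams the body):
#   resp = requests.get(url, stream=True, timeout=30)
#   length = probe_length(resp.iter_content(chunk_size=16384), mp3_length)
```

A file with a large tag at the front (embedded cover art, for instance) may need more than `max_bytes` before the first audio frame appears, so the cap is a trade-off and not a safe default.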

Something like https://github.com/barneygale/httpio might work, though it seems unmaintained.

Another option is https://mutagen.readthedocs.io/en/latest/user/filelike.html#gio-example-implementation, but that requires pygobject and glib.
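For illustration, the general idea behind such a wrapper (a seekable file-like object that turns reads into HTTP range requests) can be sketched with the standard library alone. This is not httpio's actual code. `RangeReader`, `http_range_fetcher` and `block_size` are hypothetical names, and the sketch assumes the server honours `Range` headers and that `read`, `seek` and `tell` are all mutagen needs for loading, as its file-like documentation suggests.

```python
import urllib.request


class RangeReader:
    """Read-only, seekable file-like view of a remote resource (hypothetical helper)."""

    def __init__(self, size, fetch, block_size=64 * 1024):
        self._size = size            # total size in bytes, e.g. from Content-Length
        self._fetch = fetch          # callable(start, end_inclusive) -> bytes
        self._block_size = block_size
        self._blocks = {}            # block index -> bytes already downloaded
        self._pos = 0
        self.requests = 0            # number of range fetches issued so far

    def tell(self):
        return self._pos

    def seek(self, offset, whence=0):
        base = {0: 0, 1: self._pos, 2: self._size}[whence]
        self._pos = max(0, base + offset)  # simplification: clamp instead of raising
        return self._pos

    def _block(self, index):
        # Download a block at most once; later reads are served from memory.
        if index not in self._blocks:
            start = index * self._block_size
            end = min(start + self._block_size, self._size) - 1
            self._blocks[index] = self._fetch(start, end)
            self.requests += 1
        return self._blocks[index]

    def read(self, size=-1):
        if size is None or size < 0:
            size = max(0, self._size - self._pos)
        end = min(self._pos + size, self._size)
        out = bytearray()
        while self._pos < end:
            index, offset = divmod(self._pos, self._block_size)
            piece = self._block(index)[offset:offset + end - self._pos]
            if not piece:  # server returned less than expected; stop, do not loop
                break
            out += piece
            self._pos += len(piece)
        return bytes(out)


def http_range_fetcher(url, timeout=10):
    """Build a fetch(start, end) callable backed by HTTP Range requests."""
    def fetch(start, end):
        request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if response.status != 206:  # server ignored the Range header
                raise OSError("server does not support range requests")
            return response.read()
    return fetch
```

With the total size taken from the `Content-Length` of a HEAD request, something like `mp3.MP3(RangeReader(size, http_range_fetcher(url)))` should then download only the blocks mutagen actually touches. Because the reader reports the real file size, a size-based length estimate should come out the same as it would for a full download.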

Hey Christoph, thanks for that feedback.

It sounds like the sequence I described above is the most direct way to get audio length if the entire file is downloaded.

If I want to do it using as little bandwidth as possible:

  • Using the standard library only
    I'd take the chunk-based approach, perhaps discarding results that match the file-size fallback length.

  • Allowing for pygobject and glib
    Use PyGObject and Gio incantations (whatever those may be)

I'm doing this to check the length of TTS-synthesized audio, which, surprisingly, the synthesis engine does not return. I need it in order to properly decrement the user's quota.

I think for now I'll just accept the extra bandwidth: at my scale and with my file access permissions, TTS synthesis is much more costly, and more important to control for, than bandwidth.

I'll revisit the two options above if my quota system still depends on the length of the synthesis artifact and I need to minimize the resources used to obtain it.

Thanks again for the package and for your feedback, let me know if you have any further thoughts--otherwise I'll close this issue in the next few days.

One more item: Would you be interested in a docs PR for the Getting Started section demonstrating use of mutagen to read from an mp3 at the end of a URI? Happy to craft an issue / offer that if you think it is a common enough use case or otherwise helpful to illustrate potential use cases.

Closing this for now, thanks again.