quodlibet / mutagen

Python module for handling audio metadata

Home Page:https://mutagen.readthedocs.io

IOError (not enough bytes) in read_full() on WAV file with extended ID3 header

dolkow opened this issue

Mutagen fails to open the attached silence.wav.gz (after gunzipping it, of course; GitHub wouldn't allow the plain .wav to be uploaded). With just a tiny log print added, we can see that the read comes up 12 bytes short:

mutagen$ git diff
diff --git a/mutagen/_util.py b/mutagen/_util.py
index b99c7c7..ff14537 100644
--- a/mutagen/_util.py
+++ b/mutagen/_util.py
@@ -654,6 +654,7 @@ def read_full(fileobj, size: int) -> None:
 
     data = fileobj.read(size)
     if len(data) != size:
+        print('tried to read %d bytes, but got %d' % (size, len(data)), file=sys.stderr)
         raise IOError
     return data

mutagen$ python3
Python 3.11.5 (main, Aug 31 2023, 07:57:41) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mutagen
>>> mutagen
<module 'mutagen' from '/tmp/mutagen/mutagen/__init__.py'>
>>> mutagen.File('/tmp/silence.wav')
tried to read 184 bytes, but got 172
Traceback (most recent call last):
  File "/tmp/mutagen/mutagen/_util.py", line 185, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/_util.py", line 156, in wrapper
    return func(self, h, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/id3/_file.py", line 169, in load
    data = read_full(fileobj, self.size - 10)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/_util.py", line 658, in read_full
    raise IOError
OSError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/mutagen/mutagen/wave.py", line 200, in load
    self.tags = _WaveID3(fileobj, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/id3/_file.py", line 76, in __init__
    super(ID3, self).__init__(*args, **kwargs)
  File "/tmp/mutagen/mutagen/id3/_tags.py", line 175, in __init__
    super(ID3Tags, self).__init__(*args, **kwargs)
  File "/tmp/mutagen/mutagen/_util.py", line 534, in __init__
    super(DictProxy, self).__init__(*args, **kwargs)
  File "/tmp/mutagen/mutagen/_tags.py", line 110, in __init__
    self.load(*args, **kwargs)
  File "/tmp/mutagen/mutagen/_util.py", line 189, in wrapper
    reraise(exc_dest, err, sys.exc_info()[2])
  File "/tmp/mutagen/mutagen/_util.py", line 43, in reraise
    raise tp(value).with_traceback(tb)
  File "/tmp/mutagen/mutagen/_util.py", line 185, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/_util.py", line 156, in wrapper
    return func(self, h, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/id3/_file.py", line 169, in load
    data = read_full(fileobj, self.size - 10)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/_util.py", line 658, in read_full
    raise IOError
mutagen.id3._util.error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/mutagen/mutagen/_util.py", line 164, in wrapper_func
    return func(h, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/_file.py", line 302, in File
    return Kind(fileobj, filename=filething.filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/_file.py", line 48, in __init__
    self.load(*args, **kwargs)
  File "/tmp/mutagen/mutagen/_util.py", line 185, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/_util.py", line 156, in wrapper
    return func(self, h, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/mutagen/mutagen/wave.py", line 204, in load
    raise error(e)
mutagen.wave.error

>>> mutagen.version_string
'1.47.1'

My distro's 1.46.0 release has the same behavior.

The file and tags were generated by export from Audacity. ffprobe handles it (or at least recognizes that there's an id3 tag -- but ignores it for other metadata?):

tmp$ ffprobe silence.wav
ffprobe version 4.4.4 Copyright (c) 2007-2023 the FFmpeg developers
  built with gcc 13 (SUSE Linux)
  configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --incdir=/usr/include/ffmpeg --extra-cflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --optflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --enable-version3 --disable-openssl --enable-avresample --enable-gnutls --enable-ladspa --enable-vulkan --enable-libglslang --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcelt --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdc1394 --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librav1e --enable-librubberband --enable-libsvtav1 --enable-libsoxr --enable-libspeex --enable-libssh --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libv4l2 --enable-libvpx --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lto --enable-lv2 --enable-libmfx --enable-vaapi --enable-vdpau --enable-version3 --enable-libfdk-aac-dlopen --enable-nonfree --enable-libvo-amrwbenc --enable-libx264 --enable-libx265 --enable-librtmp --enable-libxvid
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
[wav @ 0x55ffa9bf3d00] Discarding ID3 tags because more suitable tags were found.
Input #0, wav, from 'silence.wav':
  Metadata:
    title           : One Second of Silence
    album           : Mutagen Bug Reports
    artist          : Snild Dolkow
    comment         : This is a comment!
    date            : 2023
    genre           : Relaxation..? :)
    track           : 1
  Duration: 00:00:01.00, bitrate: 708 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels, s16, 705 kb/s
tmp$ hexdump -C 'silence.wav'
00000000  52 49 46 46 20 5a 01 00  57 41 56 45 66 6d 74 20  |RIFF Z..WAVEfmt |
00000010  10 00 00 00 01 00 01 00  44 ac 00 00 88 58 01 00  |........D....X..|
00000020  02 00 10 00 64 61 74 61  88 58 01 00 00 00 00 00  |....data.X......|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000158b0  00 00 00 00 4c 49 53 54  a2 00 00 00 49 4e 46 4f  |....LIST....INFO|
000158c0  49 4e 41 4d 16 00 00 00  4f 6e 65 20 53 65 63 6f  |INAM....One Seco|
000158d0  6e 64 20 6f 66 20 53 69  6c 65 6e 63 65 00 49 50  |nd of Silence.IP|
000158e0  52 44 14 00 00 00 4d 75  74 61 67 65 6e 20 42 75  |RD....Mutagen Bu|
000158f0  67 20 52 65 70 6f 72 74  73 00 49 41 52 54 0e 00  |g Reports.IART..|
00015900  00 00 53 6e 69 6c 64 20  44 6f 6c 6b 6f 77 00 00  |..Snild Dolkow..|
00015910  49 43 4d 54 14 00 00 00  54 68 69 73 20 69 73 20  |ICMT....This is |
00015920  61 20 63 6f 6d 6d 65 6e  74 21 00 00 49 43 52 44  |a comment!..ICRD|
00015930  06 00 00 00 32 30 32 33  00 00 49 47 4e 52 12 00  |....2023..IGNR..|
00015940  00 00 52 65 6c 61 78 61  74 69 6f 6e 2e 2e 3f 20  |..Relaxation..? |
00015950  3a 29 00 00 49 54 52 4b  02 00 00 00 31 00 69 64  |:)..ITRK....1.id|
00015960  33 20 c2 00 00 00 49 44  33 04 00 40 00 00 01 38  |3 ....ID3..@...8|
00015970  00 00 00 0c 01 20 05 0f  47 0f 54 14 43 4f 4d 4d  |..... ..G.T.COMM|
00015980  00 00 00 17 00 00 00 00  00 00 00 54 68 69 73 20  |...........This |
00015990  69 73 20 61 20 63 6f 6d  6d 65 6e 74 21 54 43 4f  |is a comment!TCO|
000159a0  4e 00 00 00 11 00 00 00  52 65 6c 61 78 61 74 69  |N.......Relaxati|
000159b0  6f 6e 2e 2e 3f 20 3a 29  54 44 52 43 00 00 00 05  |on..? :)TDRC....|
000159c0  00 00 00 32 30 32 33 54  52 43 4b 00 00 00 02 00  |...2023TRCK.....|
000159d0  00 00 31 54 41 4c 42 00  00 00 14 00 00 00 4d 75  |..1TALB.......Mu|
000159e0  74 61 67 65 6e 20 42 75  67 20 52 65 70 6f 72 74  |tagen Bug Report|
000159f0  73 54 49 54 32 00 00 00  16 00 00 00 4f 6e 65 20  |sTIT2.......One |
00015a00  53 65 63 6f 6e 64 20 6f  66 20 53 69 6c 65 6e 63  |Second of Silenc|
00015a10  65 54 50 45 31 00 00 00  0d 00 00 00 53 6e 69 6c  |eTPE1.......Snil|
00015a20  64 20 44 6f 6c 6b 6f 77                           |d Dolkow|
00015a28

Based on my (very newly-acquired) understanding of id3v2.4, the header says:

  • version 04 00
  • flags byte 0x40, meaning only the "extended header" bit is set
  • size is 00 00 01 38 (synchsafe), translating to 0xb8 = 184

Then comes the extended header:

  • size is 00 00 00 0c = 0xc = 12 -- a very interesting number!

I suspect that the extsize_data = read_full(fileobj, 4) and self._extdata = read_full(fileobj, extsize) lines in ID3Header.__init__() are the culprits -- they have already eaten those 12 bytes (and they have not been subtracted from the total size value of the header).
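The arithmetic above can be double-checked with a short sketch. This is not mutagen's code: `synchsafe` and `parse_header` are hypothetical helper names, and the input is just the first 14 bytes of the tag copied from the hexdump (offset 0x15966).

```python
def synchsafe(data: bytes) -> int:
    """Decode a synchsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def parse_header(tag: bytes) -> dict:
    """Parse the 10-byte ID3v2.4 header and, if flagged, the extended header size."""
    if tag[:3] != b"ID3":
        raise ValueError("not an ID3v2 tag")
    flags = tag[5]
    size = synchsafe(tag[6:10])  # number of bytes after the 10-byte header
    ext_size = 0
    if flags & 0x40:  # "extended header" bit
        # In v2.4 this size is synchsafe and counts the whole extended header.
        ext_size = synchsafe(tag[10:14])
    return {
        "version": (tag[3], tag[4]),
        "flags": flags,
        "size": size,
        "ext_size": ext_size,
        "frame_bytes": size - ext_size,
    }


# Tag header (10 bytes) plus extended header size (4 bytes) from silence.wav.
head = bytes.fromhex("49443304004000000138" "0000000c")
info = parse_header(head)
print(info)
```

With these bytes, `size` is 184, `ext_size` is 12, and the difference is 172, which matches the "tried to read 184 bytes, but got 172" log line.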

Thanks for the detailed report. Yes, indeed. Not considering the extended header size seems to be the issue. Looks like ID3 tags with extended header are extremely rare, otherwise we would have seen more reports.

I can provide a patch later today

> Looks like ID3 tags with extended header are extremely rare, otherwise we would have seen more reports.

Or it's maybe just that ID3 tags at the end of the file are rare? Audacity writes it at the beginning when exporting mp3:

tmp$ hexdump -C silence.mp3 
00000000  49 44 33 04 00 40 00 00  01 68 00 00 00 0c 01 20  |ID3..@...h..... |
00000010  05 06 03 1a 03 7f 54 52  43 4b 00 00 00 02 00 00  |......TRCK......|
00000020  00 31 54 43 4f 4e 00 00  00 11 00 00 00 52 65 6c  |.1TCON.......Rel|
00000030  61 78 61 74 69 6f 6e 2e  2e 3f 20 3a 29 54 59 45  |axation..? :)TYE|
00000040  52 00 00 00 05 00 00 00  32 30 32 33 54 44 52 43  |R.......2023TDRC|
00000050  00 00 00 05 00 00 00 32  30 32 33 43 4f 4d 4d 00  |.......2023COMM.|
00000060  00 00 17 00 00 00 00 00  00 00 54 68 69 73 20 69  |..........This i|
00000070  73 20 61 20 63 6f 6d 6d  65 6e 74 21 43 4f 4d 4d  |s a comment!COMM|
00000080  00 00 00 17 00 00 00 58  58 58 00 54 68 69 73 20  |.......XXX.This |
00000090  69 73 20 61 20 63 6f 6d  6d 65 6e 74 21 54 50 45  |is a comment!TPE|
000000a0  31 00 00 00 0d 00 00 00  53 6e 69 6c 64 20 44 6f  |1.......Snild Do|
000000b0  6c 6b 6f 77 54 41 4c 42  00 00 00 14 00 00 00 4d  |lkowTALB.......M|
000000c0  75 74 61 67 65 6e 20 42  75 67 20 52 65 70 6f 72  |utagen Bug Repor|
000000d0  74 73 54 49 54 32 00 00  00 16 00 00 00 4f 6e 65  |tsTIT2.......One|
000000e0  20 53 65 63 6f 6e 64 20  6f 66 20 53 69 6c 65 6e  | Second of Silen|
000000f0  63 65 ff fb 90 c4 00 00  00 00 00 00 00 00 00 00  |ce..............|
00000100  00 00 00 00 00 00 00 58  69 6e 67 00 00 00 0f 00  |.......Xing.....|
00000110  00 00 28 00 00 11 e1 00  06 06 0c 0c 13 13 13 19  |..(.............|
00000120  19 20 20 20 26 26 2c 2c  2c 33 33 39 39 39 40 40  |.   &&,,,33999@@|
00000130  46 46 46 4c 4c 53 53 53  59 59 60 60 60 66 66 6c  |FFFLLSSSYY```ffl|

In that case, the overly-long read will not be noticed.
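A small simulation of that difference, assuming the bug is what was suspected above (the extended header is consumed but not subtracted from the size). `load_body` is a hypothetical stand-in for the loading logic, not mutagen's actual function, and the tag contents are filler bytes.

```python
import io


def read_full(fileobj, size: int) -> bytes:
    """Like the helper in the diff above: fail unless exactly `size` bytes come back."""
    data = fileobj.read(size)
    if len(data) != size:
        raise IOError("tried to read %d bytes, but got %d" % (size, len(data)))
    return data


def load_body(fileobj, declared_size: int, ext_size: int) -> bytes:
    """Mimic the suspected bug: consume the extended header, then still read declared_size."""
    fileobj.seek(10)              # skip the 10-byte tag header
    read_full(fileobj, ext_size)  # the extended header is consumed here
    return read_full(fileobj, declared_size)  # should be declared_size - ext_size


# Filler tag: 10-byte header, 12-byte extended header, 172 bytes of frames.
tag = b"H" * 10 + b"E" * 12 + b"F" * 172

# Tag at the start of the file (MP3-like): audio follows, so the over-read
# succeeds and quietly swallows 12 bytes of audio.
body = load_body(io.BytesIO(tag + b"A" * 100), 184, 12)
print(len(body), body[-12:])

# Tag as the last thing in the file (as in silence.wav): the read falls short.
try:
    load_body(io.BytesIO(tag), 184, 12)
    short_read = False
except IOError as exc:
    short_read = True
    print(exc)
```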

I wonder what'd happen if you tried to save the modified headers, though. Maybe writes aren't based on that same size variable so it's fine?

As I understand it, adding the ID3 tag to the start of the file is not possible in WAVs. It's probably also uncommon to add ID3 tags to WAVs, which is why I seem to be the first to have stumbled upon this. :)
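For reference, the chunk layout in the hexdump can be walked with a few lines. This is only a sketch over a synthetic file built in memory; `iter_riff_chunks` and `chunk` are hypothetical helpers, and the `id3 ` payload is filler sized like the real one (0xc2 = 194 bytes, i.e. the 10-byte ID3 header plus the declared 184).

```python
import io
import struct


def iter_riff_chunks(fileobj):
    """Yield (chunk_id, data_offset, size) for each top-level chunk of a RIFF file."""
    riff, _total, _form = struct.unpack("<4sI4s", fileobj.read(12))
    if riff != b"RIFF":
        raise ValueError("not a RIFF file")
    while True:
        header = fileobj.read(8)
        if len(header) < 8:
            return
        chunk_id, size = struct.unpack("<4sI", header)
        offset = fileobj.tell()
        yield chunk_id, offset, size
        # Chunks are word-aligned: an odd-sized chunk is followed by one pad byte.
        fileobj.seek(offset + size + (size & 1))


def chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Serialize one chunk: id, little-endian size, data, optional pad byte."""
    pad = b"\x00" if len(data) % 2 else b""
    return chunk_id + struct.pack("<I", len(data)) + data + pad


# Synthetic WAV: mono 16-bit PCM at 44100 Hz, a few silent samples, then an id3 chunk.
fmt = chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 44100, 88200, 2, 16))
data = chunk(b"data", b"\x00" * 8)
id3 = chunk(b"id3 ", b"ID3" + b"\x00" * 191)  # 194 bytes of filler
body = b"WAVE" + fmt + data + id3
wav = io.BytesIO(b"RIFF" + struct.pack("<I", len(body)) + body)

chunks = [(cid, size) for cid, _offset, size in iter_riff_chunks(wav)]
print(chunks)
```

In silence.wav the `id3 ` chunk is the last chunk, so the tag ends exactly at end of file and there is nothing after it for an over-long read to consume.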

> I can provide a patch later today

To be clear, I'm not in that much of a hurry; whatever time/day that's convenient for you is more than fine.

> Or it's maybe just that ID3 tags at the end of the file are rare? Audacity writes it at the beginning when exporting mp3:

Yes, looks like this is what happens. Also, ID3 tags in WAVE are non-standard and only supported by a few tools (e.g. MP3Tag, foobar2000 and a few more). The tags aren't necessarily at the end, but the file often ends up like that.

But still I think those extended headers are rare. Which tool did you use to tag this WAVE file with ID3?

> I wonder what'd happen if you tried to save the modified headers, though. Maybe writes aren't based on that same size variable so it's fine?

It "works" in the sense that it generates a valid file with a proper ID3 tag block, as the size gets recalculated. But actually mutagen does not support extended header and does not write it, so it gets lost when saving. In this particular example the extended header contained the CRC checksum, which the newly written tag will not have.
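To illustrate what is in that extended header, here is a sketch that decodes the 12 bytes from the hexdump according to the ID3v2.4 structure document. `parse_ext_header_v24` is a hypothetical name, nothing like it is claimed to exist in mutagen, and it assumes the usual single flag byte.

```python
def synchsafe(data: bytes) -> int:
    """Decode a synchsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def parse_ext_header_v24(ext: bytes) -> dict:
    """Decode an ID3v2.4 extended header (assumes exactly one flag byte)."""
    size = synchsafe(ext[0:4])  # whole extended header, including these 4 bytes
    flag_bytes = ext[4]         # number of flag bytes, 1 in v2.4
    flags = ext[5]
    pos = 5 + flag_bytes
    result = {"size": size, "update": False, "crc": None, "restrictions": None}
    # Each set flag is followed by a length byte and that many data bytes,
    # in the order the flags appear.
    if flags & 0x40:            # tag is an update, no data
        pos += 1 + ext[pos]
        result["update"] = True
    if flags & 0x20:            # CRC-32 present: 5 synchsafe bytes (35 bits)
        length = ext[pos]
        result["crc"] = synchsafe(ext[pos + 1:pos + 1 + length])
        pos += 1 + length
    if flags & 0x10:            # tag restrictions: 1 byte
        length = ext[pos]
        result["restrictions"] = ext[pos + 1]
        pos += 1 + length
    return result


# The 12 extended header bytes from silence.wav (offset 0x15970).
ext = bytes.fromhex("0000000c" "01" "20" "05" "0f470f5414")
ext_info = parse_ext_header_v24(ext)
print(ext_info["size"], hex(ext_info["crc"]))
```

Only the CRC flag (0x20) is set here, so the 12 bytes break down as 4 size bytes, 1 flag count byte, 1 flag byte, 1 length byte, and 5 CRC bytes.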

Extending mutagen to support the extended ID3 header would be a separate story. At the very least, preserving existing headers could be considered. But even then it must be considered how each flag is handled (and maybe not all flags are supported). E.g. the CRC needs to be recalculated, of course. I'm not sure how to deal with the tag size restriction flags then; probably drop them.

> To be clear, I'm not in that much of a hurry; whatever time/day that's convenient for you is more than fine.

Ha, no. All good. It was just that I was investigating this and already had tests and a fix ready, but then had no time to finish. I just wanted to comment so nobody else wastes time doing the same.

> Which tool did you use to tag this WAVE file with ID3?

Audacity 3.3.3 -- just the "Export as WAV" option in the menu, which pops up a metadata dialog after choosing the output location. To be very specific, this is what the Build Information tab in About says:

The Build
  Commit Id: Official openSUSE BuildSTRING:3.3.3|STRING]] of 2023-07-12T00:00:00Z
  Build type: CMake Release build (debug level 1), 64 bits
  Compiler: GCC 13.2.1
  Installation Prefix: /usr
  Cache folder: /home/snild/.cache/audacity
  Settings folder: /home/snild/.config/audacity
  Data folder: /home/snild/.local/share/audacity
  State folder: /home/snild/.local/state/audacity
Core Libraries
  wxWidgets (Cross-platform GUI library): 3.2.2
  PortAudio (Audio playback and recording): v19
  libsoxr (Sample rate conversion): Enabled
File Format Support
  libmpg123 (MP3 Importing): Enabled
  libvorbis (Ogg Vorbis Import and Export): Enabled
  libid3tag (ID3 tag support): Enabled
  libflac (FLAC import and export): Enabled
  libtwolame (MP2 export): Enabled
  QuickTime (Import via QuickTime): Disabled
  ffmpeg (FFmpeg Import/Export): Enabled
  gstreamer (Import via GStreamer): Disabled
Features
  Nyquist (Plug-in support): Enabled
  LADSPA (Plug-in support): Enabled
  Vamp (Plug-in support): Enabled
  Audio Units (Plug-in support): Disabled
  VST (Plug-in support): Enabled
  LV2 (Plug-in support): Enabled
  PortMixer (Sound card mixer support): Enabled
  SoundTouch (Pitch and Tempo Change support): Enabled
  SBSMS (Extreme Pitch and Tempo Change support): Enabled

So... libid3tag maybe?

> mutagen does not support extended header and does not write it

That's fine by me. I just want to be able to read the "normal" tags out of WAV files.

> even then it must be considered how each flag is handled (and maybe not all flags are supported)

https://mutagen-specs.readthedocs.io/en/latest/id3/id3v2.4.0-structure.html says "All unknown flags MUST be unset and their corresponding data removed when a tag is modified", which seems like a reasonable strategy (and the only one that could work for e.g. checksums).
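A rough sketch of what that strategy could look like when applied to the whole extended header: clear the flag bit, drop the extended header bytes, and recompute the size. This is an illustration under simplifying assumptions (v2.4 only, no footer, no unsynchronisation handling), not what mutagen does; `drop_extended_header` and the synchsafe helpers are hypothetical names, and the frame data is filler.

```python
def synchsafe_decode(data: bytes) -> int:
    """Decode a synchsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def synchsafe_encode(value: int, width: int = 4) -> bytes:
    """Encode an integer as `width` synchsafe bytes, big-endian."""
    return bytes((value >> (7 * i)) & 0x7F for i in reversed(range(width)))


def drop_extended_header(tag: bytes) -> bytes:
    """Return a copy of an ID3v2.4 tag with the extended header removed."""
    if tag[:3] != b"ID3" or tag[3] != 4:
        raise ValueError("expected an ID3v2.4 tag")
    flags = tag[5]
    if not flags & 0x40:
        return tag  # no extended header, nothing to do
    size = synchsafe_decode(tag[6:10])
    ext_size = synchsafe_decode(tag[10:14])  # v2.4: includes its own 4 size bytes
    frames = tag[10 + ext_size:10 + size]
    # Clear the extended header bit and shrink the declared size accordingly.
    new_flags = flags & ~0x40 & 0xFF
    header = tag[:5] + bytes([new_flags]) + synchsafe_encode(size - ext_size)
    return header + frames


# Header and extended header bytes from silence.wav, followed by filler frame data.
old_tag = (
    bytes.fromhex("49443304004000000138")      # ID3 header: flags 0x40, size 184
    + bytes.fromhex("0000000c0120050f470f5414")  # 12-byte extended header with CRC
    + b"F" * 172
)
new_tag = drop_extended_header(old_tag)
print(new_tag[:10].hex(), len(new_tag))
```

Since the CRC lives in the extended header, dropping the header also drops the checksum, which sidesteps the need to recalculate it.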