IOError (not enough bytes) in read_full() on WAV file with extended ID3 header
dolkow opened this issue · comments
Mutagen fails to open the attached silence.wav.gz (after gunzipping it, of course; GitHub wouldn't allow the plain .wav to be uploaded). With a tiny log print added, we can see that the read comes up 12 bytes short:
mutagen$ git diff
diff --git a/mutagen/_util.py b/mutagen/_util.py
index b99c7c7..ff14537 100644
--- a/mutagen/_util.py
+++ b/mutagen/_util.py
@@ -654,6 +654,7 @@ def read_full(fileobj, size: int) -> None:
     data = fileobj.read(size)
     if len(data) != size:
+        print('tried to read %d bytes, but got %d' % (size, len(data)), file=sys.stderr)
         raise IOError
     return data
mutagen$ python3
Python 3.11.5 (main, Aug 31 2023, 07:57:41) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mutagen
>>> mutagen
<module 'mutagen' from '/tmp/mutagen/mutagen/__init__.py'>
>>> mutagen.File('/tmp/silence.wav')
tried to read 184 bytes, but got 172
Traceback (most recent call last):
File "/tmp/mutagen/mutagen/_util.py", line 185, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/_util.py", line 156, in wrapper
return func(self, h, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/id3/_file.py", line 169, in load
data = read_full(fileobj, self.size - 10)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/_util.py", line 658, in read_full
raise IOError
OSError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/mutagen/mutagen/wave.py", line 200, in load
self.tags = _WaveID3(fileobj, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/id3/_file.py", line 76, in __init__
super(ID3, self).__init__(*args, **kwargs)
File "/tmp/mutagen/mutagen/id3/_tags.py", line 175, in __init__
super(ID3Tags, self).__init__(*args, **kwargs)
File "/tmp/mutagen/mutagen/_util.py", line 534, in __init__
super(DictProxy, self).__init__(*args, **kwargs)
File "/tmp/mutagen/mutagen/_tags.py", line 110, in __init__
self.load(*args, **kwargs)
File "/tmp/mutagen/mutagen/_util.py", line 189, in wrapper
reraise(exc_dest, err, sys.exc_info()[2])
File "/tmp/mutagen/mutagen/_util.py", line 43, in reraise
raise tp(value).with_traceback(tb)
File "/tmp/mutagen/mutagen/_util.py", line 185, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/_util.py", line 156, in wrapper
return func(self, h, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/id3/_file.py", line 169, in load
data = read_full(fileobj, self.size - 10)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/_util.py", line 658, in read_full
raise IOError
mutagen.id3._util.error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/mutagen/mutagen/_util.py", line 164, in wrapper_func
return func(h, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/_file.py", line 302, in File
return Kind(fileobj, filename=filething.filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/_file.py", line 48, in __init__
self.load(*args, **kwargs)
File "/tmp/mutagen/mutagen/_util.py", line 185, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/_util.py", line 156, in wrapper
return func(self, h, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/mutagen/mutagen/wave.py", line 204, in load
raise error(e)
mutagen.wave.error
>>> mutagen.version_string
'1.47.1'
My distro's 1.46.0 release has the same behavior.
The file and tags were generated by exporting from Audacity. ffprobe handles it (or at least recognizes that there's an ID3 tag, though it seems to ignore it in favor of other metadata?):
tmp$ ffprobe silence.wav
ffprobe version 4.4.4 Copyright (c) 2007-2023 the FFmpeg developers
built with gcc 13 (SUSE Linux)
configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --incdir=/usr/include/ffmpeg --extra-cflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --optflags='-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -ffat-lto-objects -g' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --enable-version3 --disable-openssl --enable-avresample --enable-gnutls --enable-ladspa --enable-vulkan --enable-libglslang --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcelt --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libdc1394 --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librav1e --enable-librubberband --enable-libsvtav1 --enable-libsoxr --enable-libspeex --enable-libssh --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libv4l2 --enable-libvpx --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lto --enable-lv2 --enable-libmfx --enable-vaapi --enable-vdpau --enable-version3 --enable-libfdk-aac-dlopen --enable-nonfree --enable-libvo-amrwbenc --enable-libx264 --enable-libx265 --enable-librtmp --enable-libxvid
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
[wav @ 0x55ffa9bf3d00] Discarding ID3 tags because more suitable tags were found.
Input #0, wav, from 'silence.wav':
Metadata:
title : One Second of Silence
album : Mutagen Bug Reports
artist : Snild Dolkow
comment : This is a comment!
date : 2023
genre : Relaxation..? :)
track : 1
Duration: 00:00:01.00, bitrate: 708 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels, s16, 705 kb/s
tmp$ hexdump -C 'silence.wav'
00000000 52 49 46 46 20 5a 01 00 57 41 56 45 66 6d 74 20 |RIFF Z..WAVEfmt |
00000010 10 00 00 00 01 00 01 00 44 ac 00 00 88 58 01 00 |........D....X..|
00000020 02 00 10 00 64 61 74 61 88 58 01 00 00 00 00 00 |....data.X......|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000158b0 00 00 00 00 4c 49 53 54 a2 00 00 00 49 4e 46 4f |....LIST....INFO|
000158c0 49 4e 41 4d 16 00 00 00 4f 6e 65 20 53 65 63 6f |INAM....One Seco|
000158d0 6e 64 20 6f 66 20 53 69 6c 65 6e 63 65 00 49 50 |nd of Silence.IP|
000158e0 52 44 14 00 00 00 4d 75 74 61 67 65 6e 20 42 75 |RD....Mutagen Bu|
000158f0 67 20 52 65 70 6f 72 74 73 00 49 41 52 54 0e 00 |g Reports.IART..|
00015900 00 00 53 6e 69 6c 64 20 44 6f 6c 6b 6f 77 00 00 |..Snild Dolkow..|
00015910 49 43 4d 54 14 00 00 00 54 68 69 73 20 69 73 20 |ICMT....This is |
00015920 61 20 63 6f 6d 6d 65 6e 74 21 00 00 49 43 52 44 |a comment!..ICRD|
00015930 06 00 00 00 32 30 32 33 00 00 49 47 4e 52 12 00 |....2023..IGNR..|
00015940 00 00 52 65 6c 61 78 61 74 69 6f 6e 2e 2e 3f 20 |..Relaxation..? |
00015950 3a 29 00 00 49 54 52 4b 02 00 00 00 31 00 69 64 |:)..ITRK....1.id|
00015960 33 20 c2 00 00 00 49 44 33 04 00 40 00 00 01 38 |3 ....ID3..@...8|
00015970 00 00 00 0c 01 20 05 0f 47 0f 54 14 43 4f 4d 4d |..... ..G.T.COMM|
00015980 00 00 00 17 00 00 00 00 00 00 00 54 68 69 73 20 |...........This |
00015990 69 73 20 61 20 63 6f 6d 6d 65 6e 74 21 54 43 4f |is a comment!TCO|
000159a0 4e 00 00 00 11 00 00 00 52 65 6c 61 78 61 74 69 |N.......Relaxati|
000159b0 6f 6e 2e 2e 3f 20 3a 29 54 44 52 43 00 00 00 05 |on..? :)TDRC....|
000159c0 00 00 00 32 30 32 33 54 52 43 4b 00 00 00 02 00 |...2023TRCK.....|
000159d0 00 00 31 54 41 4c 42 00 00 00 14 00 00 00 4d 75 |..1TALB.......Mu|
000159e0 74 61 67 65 6e 20 42 75 67 20 52 65 70 6f 72 74 |tagen Bug Report|
000159f0 73 54 49 54 32 00 00 00 16 00 00 00 4f 6e 65 20 |sTIT2.......One |
00015a00 53 65 63 6f 6e 64 20 6f 66 20 53 69 6c 65 6e 63 |Second of Silenc|
00015a10 65 54 50 45 31 00 00 00 0d 00 00 00 53 6e 69 6c |eTPE1.......Snil|
00015a20 64 20 44 6f 6c 6b 6f 77 |d Dolkow|
00015a28
Based on my (very newly-acquired) understanding of id3v2.4, the header says:
- version 04 00
- flags byte 0x40, meaning only the "extended header" bit is set
- size is 00 00 01 38 (synchsafe), translating to 0xb8 = 184
Then comes the extended header:
- size is 00 00 00 0c = 0xc = 12 -- a very interesting number!
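The header arithmetic above can be checked with a few lines of Python. This is a minimal sketch rather than mutagen code: the synchsafe helper is a hypothetical name, and the bytes are copied by hand from the hexdump at offsets 0x15966 (tag header) and 0x15970 (extended header size).

```python
def synchsafe(data: bytes) -> int:
    """Decode a synchsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


# Copied by hand from the hexdump of silence.wav.
header = bytes.fromhex("49 44 33 04 00 40 00 00 01 38")
ext_size_field = bytes.fromhex("00 00 00 0c")

version = (header[3], header[4])            # major, revision
has_extended_header = bool(header[5] & 0x40)
tag_size = synchsafe(header[6:10])          # bytes following the 10-byte header
ext_size = synchsafe(ext_size_field)        # in v2.4 this counts the whole extended header

print(version, has_extended_header, tag_size, ext_size)
print("left for frames:", tag_size - ext_size)
```

The difference, 184 - 12 = 172, matches the "got 172" in the log print.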
I suspect that these two lines in ID3Header.__init__() are the culprits:
    extsize_data = read_full(fileobj, 4)
    self._extdata = read_full(fileobj, extsize)
They have already eaten those 12 bytes, which have not been subtracted from the header's total size value.
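To make the suspicion concrete, here is a small self-contained reproduction with an in-memory stream. It is only a sketch under assumptions: load_frames and its subtract_ext switch are made up for illustration, the tag body is zero-filled stand-in data, and the "fixed" branch shows one possible adjustment, not necessarily the patch mutagen ends up with.

```python
import io


def read_full(fileobj, size):
    """Read exactly `size` bytes or raise, like the helper shown in the diff."""
    data = fileobj.read(size)
    if len(data) != size:
        raise IOError("tried to read %d bytes, but got %d" % (size, len(data)))
    return data


TAG_SIZE = 184  # from the ID3 header: everything after the 10-byte header
EXT_SIZE = 12   # whole extended header, including its own 4-byte size field

# Stand-in tag body: 12 bytes of extended header, then 172 bytes of frames.
body = bytes(EXT_SIZE) + bytes(TAG_SIZE - EXT_SIZE)


def load_frames(fileobj, subtract_ext):
    """Hypothetical loader: consume the extended header, then read the frames."""
    read_full(fileobj, EXT_SIZE)
    to_read = TAG_SIZE - EXT_SIZE if subtract_ext else TAG_SIZE
    return read_full(fileobj, to_read)


try:
    load_frames(io.BytesIO(body), subtract_ext=False)
except IOError as exc:
    print("unadjusted:", exc)

print("adjusted:", len(load_frames(io.BytesIO(body), subtract_ext=True)), "bytes")
```

The unadjusted call fails with the same 184 versus 172 mismatch as in the report, while the adjusted call reads the remaining 172 bytes.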
Thanks for the detailed report. Yes, indeed. Not considering the extended header size seems to be the issue. Looks like ID3 tags with extended header are extremely rare, otherwise we would have seen more reports.
I can provide a patch later today
> Looks like ID3 tags with extended header are extremely rare, otherwise we would have seen more reports.
Or it's maybe just that ID3 tags at the end of the file are rare? Audacity writes it at the beginning when exporting mp3:
tmp$ hexdump -C silence.mp3
00000000 49 44 33 04 00 40 00 00 01 68 00 00 00 0c 01 20 |ID3..@...h..... |
00000010 05 06 03 1a 03 7f 54 52 43 4b 00 00 00 02 00 00 |......TRCK......|
00000020 00 31 54 43 4f 4e 00 00 00 11 00 00 00 52 65 6c |.1TCON.......Rel|
00000030 61 78 61 74 69 6f 6e 2e 2e 3f 20 3a 29 54 59 45 |axation..? :)TYE|
00000040 52 00 00 00 05 00 00 00 32 30 32 33 54 44 52 43 |R.......2023TDRC|
00000050 00 00 00 05 00 00 00 32 30 32 33 43 4f 4d 4d 00 |.......2023COMM.|
00000060 00 00 17 00 00 00 00 00 00 00 54 68 69 73 20 69 |..........This i|
00000070 73 20 61 20 63 6f 6d 6d 65 6e 74 21 43 4f 4d 4d |s a comment!COMM|
00000080 00 00 00 17 00 00 00 58 58 58 00 54 68 69 73 20 |.......XXX.This |
00000090 69 73 20 61 20 63 6f 6d 6d 65 6e 74 21 54 50 45 |is a comment!TPE|
000000a0 31 00 00 00 0d 00 00 00 53 6e 69 6c 64 20 44 6f |1.......Snild Do|
000000b0 6c 6b 6f 77 54 41 4c 42 00 00 00 14 00 00 00 4d |lkowTALB.......M|
000000c0 75 74 61 67 65 6e 20 42 75 67 20 52 65 70 6f 72 |utagen Bug Repor|
000000d0 74 73 54 49 54 32 00 00 00 16 00 00 00 4f 6e 65 |tsTIT2.......One|
000000e0 20 53 65 63 6f 6e 64 20 6f 66 20 53 69 6c 65 6e | Second of Silen|
000000f0 63 65 ff fb 90 c4 00 00 00 00 00 00 00 00 00 00 |ce..............|
00000100 00 00 00 00 00 00 00 58 69 6e 67 00 00 00 0f 00 |.......Xing.....|
00000110 00 00 28 00 00 11 e1 00 06 06 0c 0c 13 13 13 19 |..(.............|
00000120 19 20 20 20 26 26 2c 2c 2c 33 33 39 39 39 40 40 |. &&,,,33999@@|
00000130 46 46 46 4c 4c 53 53 53 59 59 60 60 60 66 66 6c |FFFLLSSSYY```ffl|
In that case, the overly-long read will not be noticed.
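A quick sketch of why the MP3 case stays silent, assuming the same unadjusted read as in the WAV case. The stream is synthetic: zero-filled tag bytes followed by a few stand-in audio bytes, with the sizes taken from the MP3 hexdump (synchsafe 00 00 01 68 = 232, and a 12-byte extended header).

```python
import io

TAG_SIZE = 232  # synchsafe 00 00 01 68 from the MP3 hexdump
EXT_SIZE = 12

# Synthetic file: 10-byte header, 232-byte tag body, then some "audio" bytes.
audio = b"\xff\xfb\x90\xc4" + bytes(60)
stream = io.BytesIO(bytes(10) + bytes(TAG_SIZE) + audio)

stream.seek(10 + EXT_SIZE)    # position after the header and extended header
data = stream.read(TAG_SIZE)  # unadjusted read: 12 bytes too many
overrun = stream.tell() - (10 + TAG_SIZE)

print(len(data), overrun)
print(data[-overrun:][:4].hex())  # first bytes that really belong to the audio
```

The read returns the full 232 bytes because the file continues past the tag, so no length check trips; the last 12 bytes handed to the tag parser are actually the start of the audio data.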
I wonder what'd happen if you tried to save the modified headers, though. Maybe writes aren't based on that same size variable so it's fine?
As I understand it, adding the ID3 tag to the start of the file is not possible in WAVs. It's probably also uncommon to add ID3 tags to WAVs, which is why I seem to be the first to have stumbled upon this. :)
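For context on where the tag lives in a WAV: the hexdump shows it inside an "id3 " RIFF chunk after the "data" and "LIST" chunks. Below is a rough sketch of walking the top-level chunks with only the standard library. The names iter_riff_chunks and make_chunk are hypothetical, and the test file is a tiny synthetic one, not silence.wav.

```python
import io
import struct


def iter_riff_chunks(fileobj):
    """Yield (chunk_id, data_offset, size) for each top-level RIFF/WAVE chunk."""
    riff, _riff_size, form = struct.unpack("<4sI4s", fileobj.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        head = fileobj.read(8)
        if len(head) < 8:
            return
        chunk_id, size = struct.unpack("<4sI", head)
        data_offset = fileobj.tell()
        yield chunk_id, data_offset, size
        # Chunk data is padded to an even length; seek absolutely so the
        # caller may move the file position between iterations.
        fileobj.seek(data_offset + size + (size & 1))


def make_chunk(chunk_id, payload):
    """Build one chunk for the synthetic test file below."""
    pad = b"\x00" if len(payload) & 1 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


body = (b"WAVE" + make_chunk(b"fmt ", bytes(16))
        + make_chunk(b"data", bytes(3))
        + make_chunk(b"id3 ", b"ID3xx"))
wav = b"RIFF" + struct.pack("<I", len(body)) + body

chunks = list(iter_riff_chunks(io.BytesIO(wav)))
for chunk_id, offset, size in chunks:
    print(chunk_id, offset, size)
```

Run against the real file, a walker like this should report the "id3 " chunk as the last one, which is why a read past the end of the tag hits end-of-file there.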
> I can provide a patch later today
To be clear, I'm not in that much of a hurry; whatever time/day that's convenient for you is more than fine.
> Or it's maybe just that ID3 tags at the end of the file are rare? Audacity writes it at the beginning when exporting mp3:
Yes, looks like this is what happens. Also, ID3 tags in WAVE are non-standard and only supported by a few tools (e.g. MP3Tag, foobar2000 and a few more). The tags aren't necessarily at the end, but the file often ends up like that.
But still I think those extended headers are rare. Which tool did you use to tag this WAVE file with ID3?
> I wonder what'd happen if you tried to save the modified headers, though. Maybe writes aren't based on that same size variable so it's fine?
It "works" in the sense that it generates a valid file with a proper ID3 tag block, as the size gets recalculated. But actually mutagen does not support extended header and does not write it, so it gets lost when saving. In this particular example the extended header contained the CRC checksum, which the newly written tag will not have.
Extending mutagen to support the extended ID3 header would be a separate story. At the very least, preserving existing headers could be considered. But even then it must be considered how each flag is handled (and maybe not all flags are supported). E.g. the CRC would of course need to be recalculated. I'm not sure how to deal with the tag size restriction flags then; probably drop them.
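For reference, this is how the 12 extended header bytes from the WAV hexdump decode under the ID3v2.4 layout (size, number of flag bytes, flags, then per-flag data). It is a hand-written sketch, not mutagen code. It assumes only the CRC flag is set, so the first flag-data block is the CRC; the stored value is not verified against the frame data here.

```python
def synchsafe(data: bytes) -> int:
    """Decode a synchsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


# Extended header bytes copied by hand from offset 0x15970 of silence.wav.
ext = bytes.fromhex("00 00 00 0c 01 20 05 0f 47 0f 54 14")

ext_size = synchsafe(ext[0:4])  # whole extended header
num_flag_bytes = ext[4]
flags = ext[5]

is_update = bool(flags & 0x40)         # "tag is an update"
has_crc = bool(flags & 0x20)           # "CRC data present"
has_restrictions = bool(flags & 0x10)  # "tag restrictions"

crc = None
if has_crc and not is_update:
    crc_len = ext[6]                   # flag data length, 5 for the CRC
    crc = synchsafe(ext[7:7 + crc_len])

print(ext_size, num_flag_bytes, hex(flags))
print(is_update, has_crc, has_restrictions)
print(hex(crc))
```

So the only extended flag set is the CRC one, with a stored value of 0xf8e3ea14; the update and restriction flags are clear.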
> To be clear, I'm not in that much of a hurry; whatever time/day that's convenient for you is more than fine.
Ha, no. All good. It was just that I was investigating this and already had tests and a fix ready, but then had no time to finish. I just wanted to comment so nobody else would waste time doing the same.
> Which tool did you use to tag this WAVE file with ID3?
Audacity 3.3.3 -- just the "Export as WAV" option in the menu, which pops up a metadata dialog after choosing the output location. To be very specific, this is what the Build Information tab in About says:
The Build
Commit Id: Official openSUSE BuildSTRING:3.3.3|STRING]] of 2023-07-12T00:00:00Z
Build type: CMake Release build (debug level 1), 64 bits
Compiler: GCC 13.2.1
Installation Prefix: /usr
Cache folder: /home/snild/.cache/audacity
Settings folder: /home/snild/.config/audacity
Data folder: /home/snild/.local/share/audacity
State folder: /home/snild/.local/state/audacity
Core Libraries
wxWidgets (Cross-platform GUI library): 3.2.2
PortAudio (Audio playback and recording): v19
libsoxr (Sample rate conversion): Enabled
File Format Support
libmpg123 (MP3 Importing): Enabled
libvorbis (Ogg Vorbis Import and Export): Enabled
libid3tag (ID3 tag support): Enabled
libflac (FLAC import and export): Enabled
libtwolame (MP2 export): Enabled
QuickTime (Import via QuickTime): Disabled
ffmpeg (FFmpeg Import/Export): Enabled
gstreamer (Import via GStreamer): Disabled
Features
Nyquist (Plug-in support): Enabled
LADSPA (Plug-in support): Enabled
Vamp (Plug-in support): Enabled
Audio Units (Plug-in support): Disabled
VST (Plug-in support): Enabled
LV2 (Plug-in support): Enabled
PortMixer (Sound card mixer support): Enabled
SoundTouch (Pitch and Tempo Change support): Enabled
SBSMS (Extreme Pitch and Tempo Change support): Enabled
So... libid3tag maybe?
> mutagen does not support extended header and does not write it
That's fine by me. I just want to be able to read the "normal" tags out of WAV files.
> even then it must be considered how each flag is handled (and maybe not all flags are supported)
https://mutagen-specs.readthedocs.io/en/latest/id3/id3v2.4.0-structure.html says "All unknown flags MUST be unset and their corresponding data removed when a tag is modified", which seems like a reasonable strategy (and the only one that could work for e.g. checksums).
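One way to read that rule for a writer that does not understand the extended header at all: clear the 0x40 flag, drop the extended header bytes, and re-encode the size. The sketch below illustrates that reading only. strip_extended_header is a hypothetical helper, not a mutagen API, and it assumes an ID3v2.4 tag without a footer whose bytes are exactly the 10-byte header plus the declared size.

```python
def from_synchsafe(data: bytes) -> int:
    """Decode a synchsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def to_synchsafe(value: int) -> bytes:
    """Encode an integer as a 4-byte synchsafe value."""
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def strip_extended_header(tag: bytes) -> bytes:
    """Return the tag with its extended header and the 0x40 flag removed."""
    if tag[:3] != b"ID3" or tag[3] != 4:
        raise ValueError("expected an ID3v2.4 tag")
    flags = tag[5]
    if not flags & 0x40:
        return tag  # nothing to strip
    body = tag[10:]
    ext_size = from_synchsafe(body[:4])  # v2.4: size of the whole extended header
    frames = body[ext_size:]
    new_flags = flags & ~0x40 & 0xFF
    return tag[:5] + bytes([new_flags]) + to_synchsafe(len(frames)) + frames


# Header and extended header from the WAV hexdump, with zero-filled stand-in frames.
tag = (bytes.fromhex("49 44 33 04 00 40 00 00 01 38")
       + bytes.fromhex("00 00 00 0c 01 20 05 0f 47 0f 54 14")
       + bytes(172))
stripped = strip_extended_header(tag)
print(stripped[:10].hex(" "), len(stripped))
```

The rewritten header carries flags 0x00 and size 00 00 01 2c (172), so the CRC is simply gone rather than left stale.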