beetbox / audioread

cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Incorrect number of channels with ffmpeg

simon816 opened this issue · comments

Version: 2.1.9

It is possible to fool audioread into determining there are 0 channels in the audio file when it does actually have an audio channel.

This occurs when metadata in the file contains the string "audio:"

Test case:

$ ffmpeg -i test/data/test-2.mp3 -metadata description="audio: broken" out.mp3
$ python -c 'import audioread; print(audioread.audio_open("out.mp3", backends=[audioread.ffdec.FFmpegAudioFile]).channels)'
0

audioread assumes the first line on stderr containing "audio:" is ffmpeg outputting stream information

elif 'audio:' in line:

As seen in the following output, the description containing "audio: broken" occurs before "Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s"

$ ffmpeg -i out.mp3 -f s16le - > /dev/null
ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
  configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mp3, from 'out.mp3':
  Metadata:
    description     : audio: broken
    encoder         : Lavf58.29.100
  Duration: 00:00:02.04, start: 0.025057, bitrate: 129 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.54
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'pipe:':
  Metadata:
    description     : audio: broken
    encoder         : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
size=     345kB time=00:00:02.00 bitrate=1411.2kbits/s speed= 410x    

Wow; thanks for the detailed report! This looks like a tricky edge case; we can certainly make our parsing more robust.

One simple option would be to also require the line (after stripping) to start with Stream . I'm not familiar enough with FFmpeg's output format to be certain that this is always how the output looks, but it would certainly rule out erroneous confusion with the description field.