little-core-labs / little-media-box

Convenient atomicized classes for representing digital multimedia assets in distributed Node.js DSP pipelines.

`Source.probe` sends `file://` URIs to `ffprobe` as a `ReadableStream`, degrading data quality

agrathwohl opened this issue

Currently, all `file://` URIs appear to be sent to ffprobe as a `ReadableStream` rather than as a local file path string. Although this is necessary when sending anything other than a local file to ffprobe, it degrades the quality of the container-level data we can collect when the file is local.

little-media-box/source.js, lines 221 to 234 at commit febdee8:

    this.ready((err) => {
      if (err) { return callback(err) }
      const { uri } = this
      const stream = this.createReadStream(opts)
      this.active()
      ffmpeg(stream).ffprobe((err, info) => {
        this.inactive()
        info.format.filename = info.format.filename === 'pipe:0' ?
          path.basename(this.uri) :
          info.format.filename
        callback(err, info)
      })
    })

With the code above, ffprobe reports 'N/A' for `info.format.duration`, `info.format.bit_rate`, `info.format.size`, and other fields. This also affects the `probe_score` value, a very important statistic returned by ffprobe.
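To spot which container-level fields were lost, a small check over `info.format` can help. This is a minimal sketch, assuming the missing values arrive as the literal string 'N/A' as described above; `missingFormatFields` and the sample `pipedFormat` object are hypothetical and illustrative only, not real ffprobe output.

```javascript
// Hypothetical helper: return the keys of an ffprobe `format` object whose
// value is the string 'N/A' (how missing values show up in this report).
function missingFormatFields (format) {
  return Object.keys(format).filter((key) => format[key] === 'N/A')
}

// Illustrative, made-up `info.format` from a probe that read from a pipe.
const pipedFormat = {
  filename: 'pipe:0',
  nb_streams: 2,
  duration: 'N/A',
  size: 'N/A',
  bit_rate: 'N/A'
}

console.log(missingFormatFields(pipedFormat)) // → [ 'duration', 'size', 'bit_rate' ]
```

Running the same check on the result of probing the file path directly should return an empty array if every field is populated.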

When the URI is a `file://` address and the following change is applied, every key in `info.format` is populated with the correct value.

    this.ready((err) => {
      if (err) { return callback(err) }
      // Previously: const stream = this.createReadStream(opts)
      // Pass the URI itself so ffprobe opens the local file directly.
      this.active()
      ffmpeg(this.uri).ffprobe((err, info) => {
        this.inactive()
        // `info` is undefined when ffprobe fails, so bail out first.
        if (err) { return callback(err) }
        info.format.filename = info.format.filename === 'pipe:0' ?
          path.basename(this.uri) :
          info.format.filename
        callback(null, info)
      })
    })

So ffmpeg can take a stream or a file URI?

fluent-ffmpeg offers the ability to provide:

  • one ReadableStream
  • one or more file URIs

as input to ffmpeg. ffprobe, by contrast, takes exactly one input: either a stream or a file URI. The documentation states:

Warning: ffprobe may be called with an input stream, but in this case it will consume data from the stream, and this data will no longer be available for ffmpeg. Using both ffprobe and a transcoding command on the same input stream will most likely fail unless the stream is a live stream. Only do this if you know what you're doing.

Indeed, running ffprobe on the stream consumes it; that's another downside of this approach for file URIs that I hadn't thought of.
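The consumption problem is a general property of Node readable streams, not something specific to ffprobe: whatever reader pulls the buffered bytes first takes them. A minimal sketch using only Node's built-in `stream` module, where a hand-built `Readable` stands in (as an assumption) for the stream `createReadStream` would return:

```javascript
const { Readable } = require('stream')

// A paused readable holding one chunk, standing in for a media stream.
const stream = new Readable({ read () {} })
stream.push('media bytes')
stream.push(null) // signal end of data

// First consumer (think: ffprobe) reads everything that is buffered.
const first = stream.read()
// Second consumer (think: a later ffmpeg transcode) finds nothing left.
const second = stream.read()

console.log(first.toString()) // → media bytes
console.log(second)           // → null
```

Probing a file path avoids this, because ffprobe opens and reads the file on its own and leaves any stream the `Source` hands out untouched.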

great catch! PR is welcome ;)

Certainly, @jwerle! I just have a few points I could use guidance on:

  1. Should we be creating the ReadableStream anyway to store in the returned Source object? Or was it always the intent to consume the stream created in this code?

  2. Is this an issue we ought to document for users (that non-`file://` URIs may not return all info)? If so, what's the most appropriate place to do so: the GitHub wiki or README.md? :)

I think we should just modify the probe function to use `this.uri` (if non-null), falling back to a stream returned from `this.createReadStream()`. Documenting this in README.md seems fine for now, until we decide whether to use the wiki.
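The suggested fallback can be sketched as a small decision function. This is a hypothetical helper, not little-media-box's API, and it narrows "use `this.uri` if non-null" to `file://` URIs only (an assumption, since that is the case this issue covers); any other URI, or none, falls back to a fresh stream from a caller-supplied factory.

```javascript
const { fileURLToPath } = require('url')

// Hypothetical helper: pick the input to hand to ffprobe.
// `uri` is a string or null; `createReadStream` is assumed to return a
// fresh readable stream each time it is called.
function chooseProbeInput (uri, createReadStream) {
  if (typeof uri === 'string' && uri.startsWith('file://')) {
    // Local file: give ffprobe a plain path so it can read the file itself.
    return fileURLToPath(uri)
  }
  // Anything else: fall back to piping the bytes through a stream.
  return createReadStream()
}
```

Inside `probe()`, the result could then be passed as `ffmpeg(chooseProbeInput(this.uri, () => this.createReadStream(opts)))`, since fluent-ffmpeg accepts either a path string or a readable stream as input. Using `fileURLToPath` rather than stripping the `file://` prefix by hand also decodes percent-encoded characters such as `%20`.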