Is it true we have to combine the video and audio files using ffmpeg or the python or JS ffmpeg port?

nonopolarity opened this issue a year ago · comments

Often we have to download the 1080p video and the audio separately, as 2 files. Is it true we just have to use ffmpeg, or the Python or JS port of ffmpeg, to combine the 2 files into one .mp4 file? ytdl-core probably doesn't have this feature?

(example:
https://zulko.github.io/moviepy/
https://github.com/ffmpegwasm/ffmpeg.wasm )

Richard Anthony B. Abear · Answer 1 · Mon May 01 2023 18:31:55 GMT+0800 (China Standard Time)

YouTube separates the audio and video streams for higher-resolution videos.

You will have to use ffmpeg to combine these streams. Thankfully, this repo has an example: https://github.com/fent/node-ytdl-core/blob/master/example/ffmpeg.js

nonopolarity · Answer 2 · Wed May 03 2023 13:15:30 GMT+0800 (China Standard Time)

Does ffmpeg combine the video and audio in just a few seconds? I could also use Final Cut to combine them, but that is basically a re-encode and takes a long time. VLC Player can also combine the video and audio, and it takes only a few seconds even if the video is an hour long.

Richard Anthony B. Abear · Answer 3 · Wed May 03 2023 16:54:05 GMT+0800 (China Standard Time)

I think that will depend on your hardware and use case.

I have found that the performance of ffmpeg is quite impressive when it comes to muxing just the 2 streams. Also, it technically wouldn't be a "re-encoding" as the documentation describes it, because with the "-c:v copy" flag you are just copying the video stream.

nonopolarity · Answer 4 · Wed May 03 2023 18:39:20 GMT+0800 (China Standard Time)

What I am more concerned about is this: when doing it this way using ffmpeg,

does it involve re-encoding (which usually takes quite long; for a 10-minute video, it will take 2 to 5 minutes), or
does it only involve putting the two files into one file (usually just copying two data chunks into one file, which is super fast; for a 10-minute video, it will take 2 seconds)?

Which one is it?

Christian Genco · Answer 5 · Thu May 04 2023 09:15:55 GMT+0800 (China Standard Time)

You only have to combine the video and audio files if you download video-only and audio-only streams.

If you don't care about downloading the absolute highest quality you can just download the highest quality stream that already contains audio and video with something like this:

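// Fetch the video metadata, including the list of available formats.
const info = await ytdl.getInfo(url, {});
// Pick the best format that already contains both audio and video.
const format = ytdl.chooseFormat(info.formats, {
  filter: "audioandvideo",
  quality: "highest",
});
// Download that exact format by its itag; this returns a readable stream.
ytdl.downloadFromInfo(info, {
  quality: format.itag
})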

nonopolarity · Answer 6 · Thu May 04 2023 15:24:22 GMT+0800 (China Standard Time)

> You only have to combine the video and audio files if you download video-only and audio-only streams.
>
> If you don't care about downloading the absolute highest quality you can just download the highest quality stream that already contains audio and video with something like this

Right. In the past that often meant 360p, which is vastly different from 720p or 1080p.

Richard Anthony B. Abear · Answer 7 · Thu May 04 2023 15:26:33 GMT+0800 (China Standard Time)

> I am more concerned about, doing it this way using ffmpeg,
>
> does it involve reencoding (usually takes quite long. For a 10 minute video, it will take 2 to 5 minutes), or
>
> does it only involve putting the two files into one file (usually just copy two data chunks into one file and is super fast. For a 10 minute video, it will take 2 seconds).
>
> Which one is it?

That completely depends on your use case.

In my use case I just use the second option (copy); I don't re-encode.

nonopolarity · Answer 8 · Thu May 04 2023 15:42:56 GMT+0800 (China Standard Time)

> I am more concerned about, doing it this way using ffmpeg,
>
> does it involve reencoding (usually takes quite long. For a 10 minute video, it will take 2 to 5 minutes), or
>
> does it only involve putting the two files into one file (usually just copy two data chunks into one file and is super fast. For a 10 minute video, it will take 2 seconds).
>
> Which one is it?

The question is not about which one you use. The question is about how ffmpeg does it. Naturally, if a job can be done in 2 seconds, I don't want to spend 2 to 5 minutes on it.

Richard Anthony B. Abear · Answer 9 · Thu May 04 2023 15:44:30 GMT+0800 (China Standard Time)

Pass the -c copy flag to the ffmpeg command and it won't re-encode.
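
For anyone who wants to see roughly what that looks like, here is a minimal sketch that builds the argument list for a stream-copy merge. The helper name buildCopyArgs and the file names are made up for illustration; -i, -map and -c copy are standard ffmpeg options. It only builds the arguments, so actually running ffmpeg (for example with child_process.spawn) is left to the caller.

```javascript
// Hypothetical helper: builds ffmpeg arguments that mux one video file and
// one audio file into a single output without re-encoding either stream.
function buildCopyArgs(videoPath, audioPath, outputPath) {
  return [
    "-i", videoPath, // input 0: the video-only file
    "-i", audioPath, // input 1: the audio-only file
    "-map", "0:v:0", // take the first video stream from input 0
    "-map", "1:a:0", // take the first audio stream from input 1
    "-c", "copy",    // copy both streams as-is, no re-encode
    outputPath,
  ];
}

// The file names below are placeholders.
const copyArgs = buildCopyArgs("video.mp4", "audio.m4a", "merged.mp4");
console.log(["ffmpeg", ...copyArgs].join(" "));
```

Because nothing is decoded or encoded, the run time is mostly limited by disk speed, which is why it can finish in seconds. It will fail, or produce an unplayable file, if the output container cannot hold one of the copied codecs.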

Christian Genco · Answer 10 · Thu May 04 2023 23:42:52 GMT+0800 (China Standard Time)

-c:v copy and -c:a copy will only work if you're merging two compatible streams (or if you're merging them into an mkv wrapper that basically supports streams of any type).

If your video is encoded with h264 (.mp4) your audio needs to be encoded with aac to copy both streams into a new .mp4 without re-encoding.

If your video is encoded with vp8 or vp9 (.webm) your audio needs to be encoded with either opus or vorbis to copy both streams into a new .webm without re-encoding.
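
The rules above can be written down as a small lookup. This is only a sketch: the function name canCopyBoth and the codec labels are my own, and the table encodes just the combinations mentioned in this thread, not everything each container can technically hold.

```javascript
// Codec combinations named in this thread that can be stream-copied into
// each container. This is not an exhaustive list of what containers support.
const COPY_RULES = new Map([
  ["mp4", { video: ["h264"], audio: ["aac"] }],
  ["webm", { video: ["vp8", "vp9"], audio: ["opus", "vorbis"] }],
]);

// Hypothetical helper: returns true if both streams can be copied into the
// given container without re-encoding, according to the table above.
function canCopyBoth(container, videoCodec, audioCodec) {
  // mkv is treated as accepting streams of basically any type.
  if (container === "mkv") return true;
  const rule = COPY_RULES.get(container);
  if (!rule) return false;
  return rule.video.includes(videoCodec) && rule.audio.includes(audioCodec);
}

console.log(canCopyBoth("mp4", "h264", "aac"));  // true
console.log(canCopyBoth("mp4", "h264", "opus")); // false
console.log(canCopyBoth("mkv", "vp9", "aac"));   // true
```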

The technique the example ffmpeg.js script uses to merge audio and video is to always copy the video stream and always re-encode the audio (it includes -c:v copy but doesn't specify the audio encoding, which means ffmpeg will always re-encode the audio to a compatible format).

This isn't a terrible strategy because:

It will produce a playable video every time.
Re-encoding audio takes an order of magnitude less time than re-encoding video.
It's simple. You don't need a first pass of ffprobe to check that the streams are compatible.

You could make sure you never re-encode by selecting compatible video and audio streams at download time.
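
One way to do that selection, as a rough sketch: pick the video-only and audio-only formats from the same container family. The helper name pickCompatiblePair is made up, the sample data is invented, and the field names (container, hasVideo, hasAudio, height, audioBitrate) are assumed to match the format objects ytdl-core returns in info.formats.

```javascript
// Hypothetical helper: picks the tallest video-only format and the
// highest-bitrate audio-only format that share the given container, so both
// can be stream-copied into that container. Returns null if either is missing.
function pickCompatiblePair(formats, container) {
  const sameContainer = formats.filter((f) => f.container === container);
  const videos = sameContainer.filter((f) => f.hasVideo && !f.hasAudio);
  const audios = sameContainer.filter((f) => f.hasAudio && !f.hasVideo);
  if (videos.length === 0 || audios.length === 0) return null;

  // Sort a copy in descending order of the given numeric field.
  const byDesc = (key) => (a, b) => (b[key] || 0) - (a[key] || 0);
  const video = [...videos].sort(byDesc("height"))[0];
  const audio = [...audios].sort(byDesc("audioBitrate"))[0];
  return { video, audio };
}

// Invented sample data standing in for info.formats (itags are not real).
const sampleFormats = [
  { itag: 1, container: "mp4", hasVideo: true, hasAudio: false, height: 1080 },
  { itag: 2, container: "webm", hasVideo: true, hasAudio: false, height: 2160 },
  { itag: 3, container: "mp4", hasVideo: false, hasAudio: true, audioBitrate: 128 },
  { itag: 4, container: "webm", hasVideo: false, hasAudio: true, audioBitrate: 160 },
  { itag: 5, container: "mp4", hasVideo: true, hasAudio: true, height: 360, audioBitrate: 96 },
];

const pair = pickCompatiblePair(sampleFormats, "mp4");
console.log(pair.video.itag, pair.audio.itag); // 1 3
```

The trade-off is that the best stream overall may live in the other container (the 2160p webm entry in the sample data is skipped when asking for mp4), so this favors a fast copy over maximum quality.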

Richard Anthony B. Abear · Answer 11 · Fri May 05 2023 09:25:33 GMT+0800 (China Standard Time)

To add to @christiangenco's answer: in my experience, or at least the way I understand it, YouTube takes your input video (the file you upload) and re-encodes it into those exact formats (h264/h265 for video and aac for audio). Therefore, when using the ffmpeg method, you are able to just use copy all the time (at least in my experience).

Christian Genco · Answer 12 · Fri May 05 2023 23:25:22 GMT+0800 (China Standard Time)

Yup 👆

The trouble is that YouTube also re-encodes your video into webm and opus so often when I ask node-ytdl-core for bestaudio and bestvideo it gives me two incompatible formats.

Kinuseka · Answer 13 · Mon Jul 24 2023 21:24:33 GMT+0800 (China Standard Time)

I recommend avoiding opus for audio and using mp4a.40.2 (AAC-LC) instead if you are planning to use the .mp4 format.

MP4 players usually do not support the 48 kHz sample rate that opus uses.

Luciano Repetti · Answer 14 · Wed Jan 03 2024 13:38:17 GMT+0800 (China Standard Time)

How can I merge video and audio to output an mp4?

My code:

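        // URL and qualityOption are defined elsewhere in my code.
        // Audio-only stream at the highest available audio quality.
        const audioStream = ytdl(URL as string, {
          filter: 'audioonly',
          quality: 'highestaudio',
        });

        // Any format with a video track in an mp4 or webm container.
        const videoStream = ytdl(URL as string, {
          filter: (format) => format.hasVideo && (format.container === 'mp4' || format.container === 'webm'),
          quality: qualityOption,
        });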