fent / node-ytdl-core

YouTube video downloader in JavaScript.

Is it true we have to combine the video and audio files using ffmpeg or the Python or JS ffmpeg port?

nonopolarity opened this issue · comments

Often we have to download the 1080p video and the audio separately, as 2 files. Is it true we just have to use ffmpeg, or the Python or JS port of ffmpeg, to combine the 2 files into one .mp4 file? ytdl-core probably doesn't have this feature?

(example:
https://zulko.github.io/moviepy/
https://github.com/ffmpegwasm/ffmpeg.wasm )

YouTube separates the audio and video streams for higher resolution videos.
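To see which case a given video falls into, the format list can be grouped by what each entry carries. This is only a rough sketch: the `hasVideo` and `hasAudio` flags are assumed to match the fields on ytdl-core's format objects, and `groupFormats` and the sample entries are made up for illustration.

```javascript
// Group format objects by which kinds of stream they carry.
// `hasVideo` / `hasAudio` are assumed to match the fields on ytdl-core's
// format objects (for example, the entries of `info.formats`).
function groupFormats(formats) {
  const groups = { muxed: [], videoOnly: [], audioOnly: [] };
  for (const f of formats) {
    if (f.hasVideo && f.hasAudio) groups.muxed.push(f);
    else if (f.hasVideo) groups.videoOnly.push(f);
    else if (f.hasAudio) groups.audioOnly.push(f);
  }
  return groups;
}

// Illustrative sample data; real lists would come from ytdl.getInfo().
const sampleFormats = [
  { itag: 18, qualityLabel: '360p', hasVideo: true, hasAudio: true },
  { itag: 137, qualityLabel: '1080p', hasVideo: true, hasAudio: false },
  { itag: 140, qualityLabel: null, hasVideo: false, hasAudio: true },
];

const grouped = groupFormats(sampleFormats);
console.log('muxed:', grouped.muxed.map((f) => f.itag));
console.log('video only:', grouped.videoOnly.map((f) => f.itag));
console.log('audio only:', grouped.audioOnly.map((f) => f.itag));
```

If the resolution you want only shows up under `videoOnly`, a separate audio download and a merge step are needed.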

You will have to use ffmpeg to combine these streams. Thankfully, this repo has an example: https://github.com/fent/node-ytdl-core/blob/master/example/ffmpeg.js

Does ffmpeg combine the video and audio in just a few seconds? I could also use Final Cut to combine them, but that is basically a re-encode and takes a long time. VLC Player can also combine the video and audio, and it takes only a few seconds even if the video is an hour long.

I think that will depend on your hardware and use case.

I have found that ffmpeg's performance is quite impressive when it comes to mixing just the 2 streams. Also, it wouldn't technically be "re-encoding" the way the documentation describes it, because you are just copying the video stream with the "-c:v copy" flag.

What I am more concerned about is this: when doing it this way using ffmpeg,

  1. does it involve re-encoding (which usually takes quite long; for a 10 minute video, it will take 2 to 5 minutes), or
  2. does it only involve putting the two files into one file (usually just copying two data chunks into one file, which is super fast; for a 10 minute video, it will take 2 seconds)?

Which one is it?

You only have to combine the video and audio files if you download video-only and audio-only streams.

If you don't care about downloading the absolute highest quality, you can just download the highest quality stream that already contains both audio and video with something like this:

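const ytdl = require('ytdl-core');

// Fetch metadata, then pick the best format that has both audio and video.
const info = await ytdl.getInfo(url);
const format = ytdl.chooseFormat(info.formats, {
  filter: "audioandvideo",
  quality: "highest",
});
// Download exactly that format by its itag.
ytdl.downloadFromInfo(info, {
  quality: format.itag
})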

Right. In the past that often meant 360p, which is vastly different from 720p or 1080p.

Whether ffmpeg re-encodes depends entirely on your use case.

In my use case I just use the second option (copy); I don't re-encode.

The question is not about which one to choose. The question is how ffmpeg does it, and naturally, if a job can be done in 2 seconds, I don't want to spend 2 to 5 minutes on it.

Pass the -c copy flag to the ffmpeg command and it won't re-encode.
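As a rough sketch of what that looks like from Node, assuming the two downloads are already saved as files and an `ffmpeg` binary is on the PATH (the helper names and file names here are hypothetical):

```javascript
const { spawn } = require('child_process');

// Build ffmpeg arguments that mux one video file and one audio file
// without re-encoding either stream.
function buildCopyArgs(videoPath, audioPath, outPath) {
  return [
    '-i', videoPath,
    '-i', audioPath,
    '-map', '0:v:0', // first video stream of the first input
    '-map', '1:a:0', // first audio stream of the second input
    '-c', 'copy',    // copy every mapped stream as-is
    outPath,
  ];
}

// Run ffmpeg and resolve once it exits successfully.
// Assumes `ffmpeg` is installed and on the PATH; `-y` overwrites outPath.
function muxWithCopy(videoPath, audioPath, outPath) {
  return new Promise((resolve, reject) => {
    const args = ['-y', ...buildCopyArgs(videoPath, audioPath, outPath)];
    const proc = spawn('ffmpeg', args, { stdio: 'inherit' });
    proc.on('error', reject);
    proc.on('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}
```

Calling `muxWithCopy('video.mp4', 'audio.m4a', 'out.mp4')` would then only rewrite the container, which is why it tends to finish in seconds rather than minutes.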

-c:v copy and -c:a copy will only work if you're merging two compatible streams (or if you're merging them into an mkv container, which supports streams of basically any type).

If your video is encoded with h264 (.mp4) your audio needs to be encoded with aac to copy both streams into a new .mp4 without re-encoding.

If your video is encoded with vp8 or vp9 (.webm) your audio needs to be encoded with either opus or vorbis to copy both streams into a new .webm without re-encoding.
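Those rules can be written down as a small lookup, shown here as a sketch only. The codec prefixes are an assumption about how the codecs are named in practice (for example `avc1...` for h264 and `mp4a...` for aac), and the table is a simplification rather than a full list of what each container accepts.

```javascript
// Codec families each container accepts for stream copy, per the rules above.
// `null` means "anything goes" (the mkv case). Simplified on purpose.
const CONTAINER_RULES = {
  mp4: { video: ['h264', 'avc1'], audio: ['aac', 'mp4a'] },
  webm: { video: ['vp8', 'vp9', 'vp09'], audio: ['opus', 'vorbis'] },
  mkv: null,
};

// True if `codec` starts with any of the given prefixes (case-insensitive).
function matchesAny(codec, prefixes) {
  const lower = String(codec).toLowerCase();
  return prefixes.some((prefix) => lower.startsWith(prefix));
}

// Can these two codecs be copied into `container` without re-encoding?
function canCopyInto(container, videoCodec, audioCodec) {
  if (!Object.prototype.hasOwnProperty.call(CONTAINER_RULES, container)) {
    throw new Error(`Unknown container: ${container}`);
  }
  const rule = CONTAINER_RULES[container];
  if (rule === null) return true;
  return matchesAny(videoCodec, rule.video) && matchesAny(audioCodec, rule.audio);
}

console.log(canCopyInto('mp4', 'avc1.640028', 'mp4a.40.2'));
console.log(canCopyInto('mp4', 'avc1.640028', 'opus'));
```

A check like this could decide between `-c copy` and re-encoding the audio before ffmpeg is started.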

The technique the example ffmpeg.js script uses to merge audio and video is to always copy the video stream and always re-encode the audio (it includes -c:v copy but doesn't specify the audio encoding, which means ffmpeg will always re-encode the audio to a compatible format).

This isn't a terrible strategy because:

  1. It will produce a playable video every time.
  2. Re-encoding audio takes an order of magnitude less time than re-encoding video.
  3. It's simple. You don't need a first pass of ffprobe to check that the streams are compatible.
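A minimal sketch in the spirit of that strategy (not the repo's actual script) might pipe both downloads into one ffmpeg process. The pipe numbers and helper names below are choices made for this illustration, and an `ffmpeg` binary on the PATH is assumed.

```javascript
const { spawn } = require('child_process');

// Arguments for reading video from fd 3 and audio from fd 4.
// Only the video codec is set to copy. Because no audio codec is given,
// ffmpeg encodes the audio with its default encoder for the output container.
function buildPipedArgs(outPath) {
  return [
    '-i', 'pipe:3',
    '-i', 'pipe:4',
    '-map', '0:v',
    '-map', '1:a',
    '-c:v', 'copy',
    outPath,
  ];
}

// Pipe two readable streams (for example, the results of two ytdl() calls)
// into a single ffmpeg process. Assumes `ffmpeg` is on the PATH.
function mergeStreams(videoStream, audioStream, outPath) {
  const proc = spawn('ffmpeg', ['-y', ...buildPipedArgs(outPath)], {
    // stdin, stdout, stderr are inherited; fds 3 and 4 are extra input pipes.
    stdio: ['inherit', 'inherit', 'inherit', 'pipe', 'pipe'],
  });
  videoStream.pipe(proc.stdio[3]);
  audioStream.pipe(proc.stdio[4]);
  return proc;
}
```

Adding `'-c:a', 'copy'` to the argument list would skip the audio re-encode too, but then the result is only valid when the two streams are compatible with the output container.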

You could make sure you never re-encode by selecting compatible video and audio streams at download time.
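One way to do that, sketched under assumptions: pick the video-only and audio-only formats from the same container, on the theory that formats sharing a container will also be copy-compatible. The field names (`container`, `hasVideo`, `hasAudio`, `height`, `audioBitrate`) are assumed to match ytdl-core's format objects; `pickCompatiblePair` and the sample list are hypothetical.

```javascript
// Return the item with the largest numeric value under `key`, or null.
function maxBy(list, key) {
  let best = null;
  for (const item of list) {
    if (best === null || (item[key] || 0) > (best[key] || 0)) best = item;
  }
  return best;
}

// Pick the tallest video-only format and the highest-bitrate audio-only
// format that share the given container, or null if either is missing.
function pickCompatiblePair(formats, container) {
  const sameContainer = formats.filter((f) => f.container === container);
  const video = maxBy(sameContainer.filter((f) => f.hasVideo && !f.hasAudio), 'height');
  const audio = maxBy(sameContainer.filter((f) => f.hasAudio && !f.hasVideo), 'audioBitrate');
  return video && audio ? { video, audio } : null;
}

// Illustrative sample data; real lists would come from ytdl.getInfo().
const candidates = [
  { itag: 18, container: 'mp4', hasVideo: true, hasAudio: true, height: 360, audioBitrate: 96 },
  { itag: 136, container: 'mp4', hasVideo: true, hasAudio: false, height: 720 },
  { itag: 137, container: 'mp4', hasVideo: true, hasAudio: false, height: 1080 },
  { itag: 248, container: 'webm', hasVideo: true, hasAudio: false, height: 1080 },
  { itag: 140, container: 'mp4', hasVideo: false, hasAudio: true, audioBitrate: 128 },
  { itag: 251, container: 'webm', hasVideo: false, hasAudio: true, audioBitrate: 160 },
];

const mp4Pair = pickCompatiblePair(candidates, 'mp4');
console.log(mp4Pair.video.itag, mp4Pair.audio.itag);
```

Each itag from the pair could then be passed as the `quality` option of a separate download, the same way `format.itag` is used earlier in this thread.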

To add to @christiangenco's answer: in my experience, or at least as I understand it, YouTube takes your input video (the file you upload) and re-encodes it into those exact formats (h264/h265 for video, aac for audio). So when using the ffmpeg method, you can just use copy encoding all the time (at least in my experience).

Yup 👆

The trouble is that YouTube also re-encodes your video into webm and opus so often when I ask node-ytdl-core for bestaudio and bestvideo it gives me two incompatible formats.

I recommend avoiding opus for audio and using mp4a.40.2 instead if you are planning to output the .mp4 format.

MP4 players usually do not support 48 kHz audio, which is what opus uses.
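A filter along those lines could look like the sketch below. It assumes the `hasAudio`, `hasVideo` and `audioCodec` fields match ytdl-core's format objects, and `isAacAudioOnly` is a hypothetical name.

```javascript
// Predicate for audio-only formats whose codec string starts with "mp4a"
// (for example "mp4a.40.2"), i.e. AAC audio that fits an .mp4 output.
function isAacAudioOnly(format) {
  return Boolean(
    format.hasAudio &&
    !format.hasVideo &&
    typeof format.audioCodec === 'string' &&
    format.audioCodec.startsWith('mp4a')
  );
}

// Illustrative sample data; real lists would come from ytdl.getInfo().
const audioCandidates = [
  { itag: 140, hasAudio: true, hasVideo: false, audioCodec: 'mp4a.40.2' },
  { itag: 251, hasAudio: true, hasVideo: false, audioCodec: 'opus' },
  { itag: 18, hasAudio: true, hasVideo: true, audioCodec: 'mp4a.40.2' },
];

console.log(audioCandidates.filter(isAacAudioOnly).map((f) => f.itag));
```

Since ytdl-core's `filter` option accepts a function, something like `ytdl(url, { filter: isAacAudioOnly, quality: 'highestaudio' })` should restrict the audio download to AAC.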

How can I merge video and audio to output an mp4?

My code:

        const audioStream = ytdl(URL as string, {
          filter: 'audioonly',
          quality: 'highestaudio',
        });

        const videoStream = ytdl(URL as string, {
          filter: (format) => format.hasVideo && (format.container === 'mp4' || format.container === 'webm'),
          quality: qualityOption,
        });