akamhy / videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

Home Page: https://pypi.org/project/videohash

Video paths containing spaces break ffmpeg calls

CaileanMParker opened this issue

When using the from_path method of hashing a video, if the path to the video contains any number of spaces, it will break the ffmpeg commands given to subprocess.Popen. This is because:

  1. The paths within the command are not encapsulated by quotation marks, causing ffmpeg to interpret only the part of the path prior to the first whitespace as the target path, and the rest of the given path as additional, invalid arguments
  2. The command given to subprocess.Popen is split on spaces before being interpreted, again forcing the system to interpret different parts of the path as new arguments
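
The two failure modes above can be reproduced without running ffmpeg at all. The snippet below is only a sketch: the video path is a made-up example, and shlex.split() is used here merely to show how a quote-aware parser would tokenise the same string, not as what the package does.

```python
import shlex

video_path = "/tmp/my holiday video.mp4"  # hypothetical path containing spaces

# Unquoted path followed by a plain whitespace split: the path is torn apart.
naive = "ffmpeg -i {} -r 1 out_%07d.jpeg".format(video_path).split()

# Adding quotation marks does not help while .split() is still applied,
# because str.split() knows nothing about quoting.
quoted_but_split = 'ffmpeg -i "{}" -r 1 out_%07d.jpeg'.format(video_path).split()

# A quote-aware split keeps the path together as a single argument.
quote_aware = shlex.split('ffmpeg -i "{}" -r 1 out_%07d.jpeg'.format(video_path))

print(naive[2:5])             # three fragments of the path
print(quoted_but_split[2:5])  # still three fragments, now with stray quotes
print(quote_aware[2])         # the whole path
```

This is why both changes are needed together: quoting alone is undone by the .split(), and removing the .split() alone still leaves the shell to break the unquoted path.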

This is easily fixed by wrapping every path in the ffmpeg commands in escaped quotation marks, dropping the .split() call on the command string, and setting shell=True in Popen.

I've taken the liberty of including the updated functions here:

from os.path import join
from subprocess import DEVNULL, STDOUT, Popen


def frames(input_file, output_prefix):
    """Extract the frames of the video.
    Export frames as images at output_prefix as a 7 digit padded jpeg file.
    """
    command = "ffmpeg -i \"{input_file}\" -r 1 \"{output_prefix}_%07d.jpeg\"".format(
        input_file=input_file, output_prefix=output_prefix
    )
    process = Popen(command, shell=True, stdout=DEVNULL, stderr=STDOUT)
    output, error = process.communicate()


def compressor(input_file, task_dir, task_uid):
    # APPLY : ffmpeg -i input.webm -s 64x64 -r 30  output.mp4

    output_file = join(task_dir, task_uid + "compressed.mp4")
    command = "ffmpeg -i \"{input_file}\" -s 64x64 -r 30 \"{output_file}\"".format(
        input_file=input_file, output_file=output_file
    )
    process = Popen(command, shell=True, stdout=DEVNULL, stderr=STDOUT)
    output, error = process.communicate()

    return output_file
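
For comparison, an alternative to quoting by hand is to pass Popen (or subprocess.run) a list of arguments and leave shell at its default of False, so each path reaches ffmpeg as one argument regardless of spaces. This is a sketch of that alternative, not the fix adopted in the package; the helper names frames_command, compressor_command and run_quietly are hypothetical.

```python
from os.path import join
from subprocess import DEVNULL, STDOUT, run


def frames_command(input_file, output_prefix):
    """Build the frame-extraction command as a list of arguments."""
    return ["ffmpeg", "-i", input_file, "-r", "1", output_prefix + "_%07d.jpeg"]


def compressor_command(input_file, task_dir, task_uid):
    """Build the compression command; return it with the output path."""
    output_file = join(task_dir, task_uid + "compressed.mp4")
    command = ["ffmpeg", "-i", input_file, "-s", "64x64", "-r", "30", output_file]
    return command, output_file


def run_quietly(command):
    """Run the command without a shell, discard its output, return the exit code.

    Raises FileNotFoundError if the ffmpeg executable is not on the PATH.
    """
    return run(command, stdout=DEVNULL, stderr=STDOUT).returncode


# The path stays a single list element, so no quoting is required.
command = frames_command("/tmp/my holiday video.mp4", "/tmp/out dir/frame")
print(command)
```

Because no shell parses the command, there is nothing to escape, and the same code should behave the same way on Windows and POSIX systems.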

Hope you find this useful! Thanks for the great module!

commented

This issue should be addressed in FramesExtractor as well. Because the Windows installation instructions for ffmpeg place the executable in the "Program Files" directory, the existing ffmpeg call fails on the space in the directory name. It can be fixed in the same simple way @CaileanMParker recommended.

Recommended change below:

    def extract(self):
        """
        Extract the frames at every n seconds where n is the
        integer set to self.interval.
        """
        ffmpeg_path = self.ffmpeg_path
        video_path = self.video_path
        output_dir = self.output_dir
        if os.name == "posix":
            ffmpeg_path = shlex.quote(self.ffmpeg_path)
            video_path = shlex.quote(self.video_path)
            output_dir = shlex.quote(self.output_dir)

        command = (
            f'"{ffmpeg_path}"'
            + " -i "
            + '"'
            + video_path
            + '"'
            + " -s 144x144 "
            + " -r "
            + str(self.interval)
            + " "
            + '"'
            + output_dir
            + "video_frame_%07d.jpeg"
            + '"'
        )

        process = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)
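
One thing that may be worth checking on POSIX systems: shlex.quote() already wraps a path containing spaces in single quotes, so surrounding the result with double quotes again could leave literal quote characters in the path the shell hands to ffmpeg. The sketch below uses a made-up path and shlex.split() as a stand-in for POSIX shell parsing; it does not run ffmpeg and is not taken from the package.

```python
import shlex

video_path = "/tmp/my video.mp4"  # hypothetical path containing a space

quoted_once = shlex.quote(video_path)   # single quotes added by shlex.quote
quoted_twice = '"' + quoted_once + '"'  # double quotes added on top

# shlex.split() parses roughly the way a POSIX shell would.
args_once = shlex.split("ffmpeg -i " + quoted_once)
args_twice = shlex.split("ffmpeg -i " + quoted_twice)

print(args_once[2])   # the original path
print(args_twice[2])  # the path with literal single quotes around it
```

On Windows the os.name == "posix" branch is skipped, so only the double quotes apply and this concern does not arise there.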

Hope this helps, and I echo the comments on a great package!

Just for future reference: I thought I had fixed the issue by copying all files to the temporary storage directory, but I forgot that the package now locates the FFmpeg executable itself, and its installation path can also contain whitespace. Thanks, @cwberardi, for using the package on Windows and reporting the issue. Also, RIP Python 3.5 support (doesn't matter cuz it's retired).