AlfredoSequeida / fvid

fvid is a project that aims to encode any file as a video using 1-bit color images to survive compression algorithms for data retrieval.

PR Compatibility

Theelx opened this issue · comments

@dtaivpp @dobrosketchkun
Currently, we together have 4 different PRs open. Do you guys/gals think we should combine them into one PR to make it easier and faster for Alfredo to review? Plus, if we all have different versions of the code, there will be redundancies and missing pieces when it's all merged.

(Also, dobro, I'm actually still having an issue with your cryptography stuff, so if other people have the same issue yours may need to be left as a separate PR).

Well, I don't really know how pull requests work in detail, so if you tell me how, we can certainly do this.
About the crypto stuff: yeah, I don't really know what the matter is; I have checked it on two machines (Windows and Arch) and on multiple files of various sizes, with and without passwords.

Ok, well in that case can you explain how each function works so I can try to understand what I'm doing wrong? Comments in the code itself would also be nice

First of all, the cryptography modules:

pip install cryptography
pip install pycryptodome

Let us check the encoding procedure first:

py fvid.py -i Lenna.png -e

As the "--password" flag is not used, the program uses the default key (the DEFAULT_KEY variable). If the flag is used, a new key is derived from the provided password (either from the command line or from the getpass function) with the PBKDF2HMAC function, with a length of 32 bytes (the key size for AES-256).
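For readers who want to try the key derivation step in isolation, here is a minimal sketch. It uses hashlib.pbkdf2_hmac from the standard library rather than the cryptography package's PBKDF2HMAC class that the script uses; derive_key and ITERATIONS are illustrative names, and the salt and iteration count are copied from the script posted later in this thread.

```python
import hashlib

# Values taken from the script in this thread; the names are illustrative.
SALT = b"63929291bca3c602de64352a4d4bfe69"
ITERATIONS = 100_000


def derive_key(password: str, salt: bytes = SALT) -> bytes:
    """Derive a 32-byte key (AES-256 size) with PBKDF2-HMAC-SHA512."""
    return hashlib.pbkdf2_hmac(
        "sha512", password.encode(), salt, ITERATIONS, dklen=32
    )


key = derive_key("my password")
print(len(key))  # key length in bytes
```

The same password and salt always give the same key, which is why the salt has to match between encoding and decoding.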

Then get_bits_from_file() reads the file into a BitArray instance, and the DELIMITER is appended to mark the end of the file and avoid corruption on retrieval. A new AES cipher is created with the key and SALT (which must be the same for encryption and decryption). AES encrypts the bytes from bitarray.tobytes(), and we get a ciphertext and a tag (with which we can confirm the integrity of the data and the validity of the password). After that, all of this, including the original filename, gets pickled and gzipped into another BitArray instance and returned as a binary string. Later it is digested by magic, courtesy of @AlfredoSequeida, and spilled out as video.

Quick scheme:

  1. file data reading into Bitarray
  2. adding delimiter
  3. AES encrypting
  4. pickling
  5. zipping
  6. zipped into Bitarray
  7. return binary
    ....
  8. PROFIT
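The scheme above can be sketched with the standard library alone. This is a simplified illustration, not the script itself: the AES step (3) is skipped, the delimiter is kept as plain bytes instead of a bit string, and pack and unpack are hypothetical helper names.

```python
import gzip
import pickle

# Same marker text as the script, kept as bytes here for simplicity.
DELIMITER = b"HELLO MY NAME IS ALFREDO"


def pack(data: bytes, filename: str) -> str:
    """Steps 2 and 4 to 7 of the scheme; step 3 (AES) is left out."""
    payload = data + DELIMITER  # step 2: add delimiter
    # The real script encrypts the payload here and stores the AES tag.
    pickled = pickle.dumps({"tag": b"", "data": payload, "filename": filename})
    zipped = gzip.compress(pickled)  # step 5: zipping
    # steps 6 and 7: turn the bytes into a string of "0" and "1"
    return "".join(f"{byte:08b}" for byte in zipped)


def unpack(bits: str) -> dict:
    """Reverse of pack(): bit string -> bytes -> gunzip -> unpickle."""
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return pickle.loads(gzip.decompress(raw))


bits = pack(b"example file contents", "example.bin")
restored = unpack(bits)
print(restored["filename"])
```

Because the gzip layer wraps everything else, a single flipped bit near the start of the stream is enough to make decompression fail before pickle or AES are even reached.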

So, say we already have video and want the original file back.

py fvid.py -i Lenna.png -d

The video is eaten by get_bits_from_video(), and we have a sequence of bits. Then, in save_bits_to_file(), we use these bits to create a Bits instance. We gunzip its bytes representation and get the pickled data from step (4). After that, we unpack the tag, data, and filename, and decrypt the data with the key and SALT. We also check with cipher.verify(tag) that everything is OK with the data and password.
With my modification, we cut the data at the delimiter in this function, not in get_bits_from_video() as in the original. After that, we save the data either to the original filename or to the one provided by the user.
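As an illustration of the delimiter cut, here is a small sketch that works on bytes. strip_delimiter is a hypothetical helper, and it differs from the script in one respect: it cuts at the last occurrence of the delimiter rather than the first, so a file that happens to contain the marker text itself is not truncated.

```python
DELIMITER = b"HELLO MY NAME IS ALFREDO"


def strip_delimiter(decrypted: bytes) -> bytes:
    """Return everything before the last delimiter, dropping trailing padding."""
    head, sep, _padding = decrypted.rpartition(DELIMITER)
    if not sep:
        raise ValueError("delimiter not found; data may be truncated or corrupted")
    return head


# The trailing zero bytes stand in for padding from the last, partly filled frame.
print(strip_delimiter(b"file contents" + DELIMITER + b"\x00\x00"))
```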

Thanks for explaining that! I just found that the problem was that I was directing my get_bits_from_image through my Cython module, and I hadn't taken out the delimiter thing for that. It works now, but it takes a huge amount of time because for Lenna's mp4, it has 300 frames for some reason and the mp4 itself is over 8MB. It took like 2-3 minutes to get just 1/6th of the way through decoding all the frames. Is it like that for you?

Nope, in my case, Lenna.mp4 is 1829 KB with 2 frames

Reading video...
Bits are in place
Getting bits from frame: 100%|████████████████████████████████████████████████████| 1080/1080 [00:01<00:00, 541.39it/s]
Getting bits from frame: 100%|████████████████████████████████████████████████████| 1080/1080 [00:01<00:00, 608.12it/s]
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00,  1.92s/it]
Unziping...
Checking integrity...
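A quick way to sanity-check a frame count is to compute it from the payload size, since each 1920x1080 frame holds one bit per pixel. The sketch below assumes the default resolution from the script; frames_needed is a hypothetical helper and the 500,000-byte payload is only an example value.

```python
import math


def frames_needed(payload_bytes: int, resolution=(1920, 1080)) -> int:
    """Number of 1-bit frames needed to hold payload_bytes."""
    width, height = resolution
    bits_per_frame = width * height  # one bit per pixel
    return max(1, math.ceil(payload_bytes * 8 / bits_per_frame))


print(frames_needed(500_000))  # hypothetical 500,000-byte payload
```

A count far above this estimate suggests that the encoder or ffmpeg duplicated frames, for example because of the framerate setting, rather than that the payload grew.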

That's really weird. Can you send the mp4?

Here are the zipped mp4 and frames:
Lenna.zip

commands:

py fvid.py -i Lenna.png -e -o Lenna.mp4
py fvid.py -i Lenna.mp4 -d -o Lenna2.png

Hm ok, when I decode that on Ubuntu, with both Cython and Python versions, I get this, it seems that the bits are None somehow:

  File "/root/fvid/fvid/fvid.py", line 185, in save_bits_to_file
    bitstring = Bits(bin=bits)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/bitstring.py", line 844, in __new__
    x._initialise(auto, length, offset, **kwargs)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/bitstring.py", line 865, in _initialise
    init_without_length_or_offset[k](self, v)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/bitstring.py", line 1891, in _setbin_safe
    binstring = tidy_input_string(binstring)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/site-packages/bitstring.py", line 509, in tidy_input_string
    s = ''.join(s.split()).lower()
AttributeError: 'NoneType' object has no attribute 'split'

Edit: That was because of a debugging decorator I used, never mind. The actual error is this (the command is python3 ./__main__.py -i ./Lenna.mp4 -d -o ./Lenna2.png):

  File "/root/fvid/fvid/fvid.py", line 346, in main
    save_bits_to_file(file_path, bits, key)
  File "/root/fvid/fvid/fvid.py", line 192, in save_bits_to_file
    bitstring = fo.read()
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'\x90L')

If you can interpret this SO answer about the problem, it might help fix it:
https://stackoverflow.com/a/11158569
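For anyone debugging the same error: a gzip stream always starts with the two magic bytes 0x1f 0x8b, and the traceback shows the decoded data starting with b'\x90L' instead, so the first bits were already wrong before gzip ran. A minimal check, with looks_like_gzip as a hypothetical helper name:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Cheap check to run before handing decoded bytes to gzip."""
    return data[:2] == GZIP_MAGIC


print(looks_like_gzip(gzip.compress(b"payload")))  # a real gzip stream
print(looks_like_gzip(b"\x90L"))                   # the bytes from the traceback
```

Running this on the bytes recovered from the first frame would show whether the problem is in frame extraction or pixel decoding rather than in gzip itself.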

If you can add a mode that gives the option to skip Gzip (de)compression, that could bypass the issue, as it works for me on certain images (though it's much slower because it has to process like 10x as many frames for the same mp4 compared to the current version), but not others.

Update: When I encode Lenna at 30fps instead of the default 1/5, gzip still tells me it's not a gzipped file, but the diagnostics show that the resulting mp4 has the same contents as your Lenna.mp4 (except that this one is 5523KB instead of yours, which is like 1829KB). That strikes me as interesting.

If you can add a mode that gives the option to skip Gzip (de)compression

I certainly cannot, because it would bring a compatibility issue upon us.

Try this modification, which is the same code but without gzip. I added sys.exit(main()) at the end, so you can run this code on its own, without __main__.py etc.

from bitstring import Bits, BitArray
# from magic import Magic
# import mimetypes
from PIL import Image
import glob

from operator import sub
import numpy as np
from tqdm import tqdm
import ffmpeg

import binascii

import argparse
import sys
import os

import getpass 

import io
import gzip
import pickle

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from Crypto.Cipher import AES

DELIMITER = bin(int.from_bytes("HELLO MY NAME IS ALFREDO".encode(), "big"))
FRAMES_DIR = "./fvid_frames/"
SALT = '63929291bca3c602de64352a4d4bfe69'.encode()  # It needs to be the same for encoding and decoding
DEFAULT_KEY = ' '*32
DEFAULT_KEY = DEFAULT_KEY.encode()
NOTDEBUG = True

class WrongPassword(Exception):
    pass

class MissingArgument(Exception):
    pass

def get_password(password_provided):
    if password_provided=='default':
        return DEFAULT_KEY
    else:
        password_provided = getpass.getpass("Enter password:")
        password = str(password_provided).encode()  
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA512(),
            length=32,
            salt=SALT,
            iterations=100000,
            backend=default_backend()
            )
        key = kdf.derive(password)
    return key


def get_bits_from_file(filepath, key):
    print('Reading file...')
    bitarray = BitArray(filename=filepath)
    # adding a delimiter to know when the file ends to avoid corrupted files
    # when retrieving
    bitarray.append(DELIMITER)

    cipher = AES.new(key, AES.MODE_EAX, nonce=SALT)
    ciphertext, tag = cipher.encrypt_and_digest(bitarray.tobytes())
    
    filename = os.path.basename(filepath)
    # store only the base name so decoding does not depend on the original path
    pickled = pickle.dumps({'tag':tag,
                            'data':ciphertext,
                            'filename':filename})
    # print('Ziping...')
    # #zip
    # out = io.BytesIO()
    # with gzip.GzipFile(fileobj=out, mode='w') as fo:
        # fo.write(pickled)
    # zip = out.getvalue()
    # #zip
    zip = pickled
    bitarray = BitArray(zip)
    return bitarray.bin

def less(val1, val2):
    return val1 < val2

def get_bits_from_image(image):
    width, height = image.size

    done = False

    px = image.load()
    bits = ""

    pbar = tqdm(range(height), desc="Getting bits from frame")

    white = (255, 255, 255)
    black = (0, 0, 0)
    
    for y in pbar:
        for x in range(width):

            pixel = px[x, y]

            pixel_bin_rep = "0"

            # for exact matches
            if pixel == white:
                pixel_bin_rep = "1"
            elif pixel == black:
                pixel_bin_rep = "0"
            else:
                white_diff = tuple(map(abs, map(sub, white, pixel)))
                # min_diff = white_diff
                black_diff = tuple(map(abs, map(sub, black, pixel)))


                # if the white difference is smaller, that means the pixel is closer
                # to white, otherwise, the pixel must be black
                if all(map(less, white_diff, black_diff)):
                    pixel_bin_rep = "1"
                else:
                    pixel_bin_rep = "0"

            # adding bits
            bits += pixel_bin_rep

    return (bits, done)


def get_bits_from_video(video_filepath):
    # get image sequence from video
    print('Reading video...')
    image_sequence = []

    ffmpeg.input(video_filepath).output(
        f"{FRAMES_DIR}decoded_frames%03d.png"
    ).run(quiet=NOTDEBUG)

    for filename in glob.glob(f"{FRAMES_DIR}decoded_frames*.png"):
        image_sequence.append(Image.open(filename))

    bits = ""
    sequence_length = len(image_sequence)
    print('Bits are in place')
    for index in tqdm(range(sequence_length)):
        b, done = get_bits_from_image(image_sequence[index])

        bits += b

        if done:
            break

    return bits


def save_bits_to_file(file_path, bits, key):
    # get file extension

    bitstring = Bits(bin=bits)

    # #zip
    # print('Unziping...')
    # in_ = io.BytesIO()
    # in_.write(bitstring.bytes)
    # in_.seek(0)
    # with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        # bitstring = fo.read()
    # #zip


    unpickled = pickle.loads(bitstring.tobytes())
    tag = unpickled['tag']
    ciphertext = unpickled['data']
    filename = unpickled['filename']
    
    cipher = AES.new(key, AES.MODE_EAX, nonce=SALT)
    bitstring = cipher.decrypt(ciphertext)
    print('Checking integrity...')
    try:
        cipher.verify(tag)
        # print("The message is authentic")
    except ValueError:
        raise WrongPassword("Key incorrect or message corrupted")

    bitstring = BitArray(bitstring)

    
    _tD = Bits(bin=DELIMITER) # New way to find a DELIMITER
    _tD = _tD.tobytes()
    _temp = list(bitstring.split(delimiter=_tD))
    bitstring = _temp[0]


    # If no file path was passed in, use the stored filename;
    # otherwise use the passed-in file path
    if file_path is None:
        filepath = filename
    else:
        filepath = file_path # No need for mime Magic

    with open(
        filepath, "wb"
    ) as f:
        bitstring.tofile(f)


def make_image(bit_set, resolution=(1920, 1080)):

    width, height = resolution

    image = Image.new("1", (width, height))
    image.putdata(bit_set)

    return image


def split_list_by_n(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i : i + n]


def make_image_sequence(bitstring, resolution=(1920, 1080)):
    width, height = resolution

    # split bits into sets of width*height to make (1) image
    set_size = width * height

    # bit_sequence = []
    print('Making image sequence')
    list_bit = list(map(int, tqdm(bitstring)))
    bit_sequence = split_list_by_n(list_bit, width * height)
    image_bits = []

    # using bit_sequence to make image sequence

    image_sequence = []

    for bit_set in bit_sequence:
        image_sequence.append(make_image(bit_set))

    return image_sequence


def make_video(output_filepath, image_sequence, framerate="1/5"):

    if output_filepath is None:
        outputfile = "file.mp4"
    else:
        outputfile = output_filepath


    frames = glob.glob(f"{FRAMES_DIR}encoded_frames*.png")

    # for one frame
    if len(frames) == 1:
        ffmpeg.input(frames[0], loop=1, t=1).output(
            outputfile, vcodec="libx264rgb"
        ).run(quiet=NOTDEBUG)

    else:
        if sys.platform != 'win32':
            ffmpeg.input(
                f"{FRAMES_DIR}encoded_frames*.png",
                pattern_type="glob",
                framerate=framerate,
            ).output(outputfile, vcodec="libx264rgb").run(quiet=NOTDEBUG)
        else:
            os.system('ffmpeg -i ./fvid_frames/encoded_frames_%d.png ' + outputfile)



def cleanup():
    # remove frames
    import shutil

    shutil.rmtree(FRAMES_DIR)


def setup():
    import os

    if not os.path.exists(FRAMES_DIR):
        os.makedirs(FRAMES_DIR)


def main():
    parser = argparse.ArgumentParser(description="save files as videos")
    parser.add_argument(
        "-e", "--encode", help="encode file as video", action="store_true"
    )
    parser.add_argument(
        "-d", "--decode", help="decode file from video", action="store_true"
    )

    parser.add_argument("-i", "--input", help="input file", required=True)
    parser.add_argument("-o", "--output", help="output path")
    parser.add_argument("-f", "--framerate", help="set framerate for encoding (as a fraction)", default="1/5", type=str)
    parser.add_argument("-p", "--password", help="set password", nargs="?", type=str, default='default')

    args = parser.parse_args()

    setup()
    # print(args)
    # print('PASSWORD', args.password, [len(args.password) if len(args.password) is not None else None for _ in range(0)])
    
    if not args.decode and not args.encode:
        raise MissingArgument('You should use either --encode or --decode!') #check for arguments
    
    key = get_password(args.password)
    
    if args.decode:
        bits = get_bits_from_video(args.input)

        file_path = None

        if args.output:
            file_path = args.output

        save_bits_to_file(file_path, bits, key)

    elif args.encode:
        # isdigit() returns False for strings with a minus sign, so negative integers are rejected here
        # all() checks whether both the minus sign and the forward slash are in the string, to reject negative fractions
        if (not args.framerate.isdigit() and "/" not in args.framerate) or all(x in args.framerate for x in ("-", "/")):
            raise NotImplementedError("The framerate must be a positive fraction or an integer for now, like 3, '1/3', or '1/5'!")
        # get bits from file
        bits = get_bits_from_file(args.input, key)

        # create image sequence
        image_sequence = make_image_sequence(bits)

        # save images
        for index in range(len(image_sequence)):
            image_sequence[index].save(
                f"{FRAMES_DIR}encoded_frames_{index}.png"
            )

        video_file_path = None

        if args.output:
            video_file_path = args.output

        make_video(video_file_path, image_sequence, args.framerate)
    
    cleanup()


sys.exit(main())
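A side note on the framerate check in main() above: it lets through strings such as "a/b" or "1/0", because it only looks for a slash. One possible stricter check, sketched here with the standard library's fractions module, is shown below. parse_framerate is a hypothetical helper, not part of fvid.

```python
from fractions import Fraction


def parse_framerate(text: str) -> Fraction:
    """Parse '3', '1/3' or '1/5' and reject anything that is not a positive rate."""
    try:
        rate = Fraction(text)
    except (ValueError, ZeroDivisionError) as exc:
        raise ValueError(f"invalid framerate: {text!r}") from exc
    if rate <= 0:
        raise ValueError(f"framerate must be positive, got {text!r}")
    return rate


print(parse_framerate("1/5"))
```

The original string can still be passed on to ffmpeg unchanged; the parsed value only serves as validation.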

@Theelgirl No, scratch that, I think I've found something. Try this video with this code.

another_Lenna.zip

Running the unzipped fvid code with the unzipped mp4 doesn't change anything for gzip.

  File "fvid.py", line 334, in <module>
    sys.exit(main())
  File "fvid.py", line 305, in main
    save_bits_to_file(file_path, bits, key)
  File "fvid.py", line 170, in save_bits_to_file
    bitstring = fo.read()
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/root/.pyenv/versions/3.8.3/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'\x90L')

When I encode then decode with the full code you pasted in two comments ago, I get this:

Traceback (most recent call last):
  File "fvid.py", line 349, in <module>
    sys.exit(main())
  File "fvid.py", line 320, in main
    save_bits_to_file(file_path, bits, key)
  File "fvid.py", line 172, in save_bits_to_file
    unpickled = pickle.loads(bitstring.tobytes())
_pickle.UnpicklingError: invalid load key, '\xd4'.

I'm thinking gzip and pickle just aren't going to work.

Well, they work all right on two other machines, so it should be possible.

I've made a bunch of modifications, including to the general logic of image processing (it helped with files over 200 MB). Could you please try this script variant?

fvid_11.zip

I am now able to successfully decode the Lenna.mp4 you sent me, however it is unable to decode the Lenna.png that it encoded itself, giving the same gzip error.

Update:
It seems to work now! I think I forgot to delete the old file.mp4

@dobrosketchkun I'm going to update my Cython Support PR with the new Cython code, then can you add all of that to your PR (it modifies the setup.py also)? Then, I can close both of my PRs and @AlfredoSequeida can merge yours, which will fix all the problems mentioned in the current 4 PRs.

It seems to work now!

Nice!

I'm going to update <...>

Yeah, seems logical, I'll try.

I cloned your patch-2 branch and tried to fvid a Lenna.png:

py __main__.py -i Lenna.png -e
Traceback (most recent call last):
  File "__main__.py", line 2, in <module>
    from fvid import main
  File "D:\recovery3\python\fvid\fvid\new\fvid-theelgirl\fvid\fvid.py", line 30, in <module>
    from fvid_cython import cy_get_bits_from_image as cy_gbfi
ModuleNotFoundError: No module named 'fvid_cython'

I'm sorry, I've never used Cython, what is the problem? I have the Cython module. Maybe I need to install fvid, and thus I cannot use it as just a script?

Oh it didn't properly compile. Try adding ModuleNotFoundError to the except for now.
You can also put this in a file named "setup.py":

from distutils.core import Extension, setup
from Cython.Build import cythonize

ext = Extension(name="fvid_cython", sources=["fvid_cython.pyx"])
setup(ext_modules=cythonize(ext, compiler_directives={'language_level': 3}))

Then run python3 setup.py build_ext --inplace (at least for Ubuntu, you can check the docs for compilation instructions on Windows and Arch).
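The fallback suggested above (catching the failed import) might look like the sketch below. The import line is the one from the traceback; py_get_bits_from_image is a hypothetical pure-Python stand-in that works on a list of grayscale values, not the real function from fvid.py.

```python
try:
    # Compiled extension; only importable after a successful build_ext run.
    from fvid_cython import cy_get_bits_from_image as cy_gbfi
    use_cython = True
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    cy_gbfi = None
    use_cython = False


def py_get_bits_from_image(pixels):
    """Hypothetical slow path: 1 for bright grayscale values, 0 for dark ones."""
    return "".join("1" if value > 127 else "0" for value in pixels)


# Pick the fast implementation when it is available.
get_bits_from_image = cy_gbfi if use_cython else py_get_bits_from_image
print("using Cython:", use_cython)
```

With this pattern the script keeps working as a plain file, and the compiled module only acts as an optional speed-up.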

$ python setup.py build_ext --inplace
Compiling fvid/fvid_cython.pyx because it changed.
[1/1] Cythonizing fvid/fvid_cython.pyx

Error compiling Cython file:
------------------------------------------------------------
...
from operator import sub
from tqdm import tqdm

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
^
------------------------------------------------------------

fvid/fvid_cython.pyx:7:0: Cdef functions/classes cannot take arbitrary decorators.
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    setup(ext_modules=cythonize(ext, compiler_directives={'language_level': 3}))
  File "/usr/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1102, in cythonize
    cythonize_one(*args)
  File "/usr/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1225, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: fvid/fvid_cython.pyx

...is what I get.

Are you on Windows? It works fine for me on Ubuntu 20.04. Maybe you need to cimport cython at the top.

I am on Windows 10, python 3.7.4

Yeah, I was talking to zavok, but does it fail for you too?

You can probably also remove the decorators, it shouldn't change performance that much.

You see, I don't really want to install it as a module. Is it really necessary?

I use Arch.

$ git diff
diff --git a/fvid/fvid_cython.pyx b/fvid/fvid_cython.pyx
index 5eabffb..b302f5a 100644
--- a/fvid/fvid_cython.pyx
+++ b/fvid/fvid_cython.pyx
@@ -1,7 +1,7 @@
 # distutils: language=c++
 from operator import sub
 from tqdm import tqdm
-
+import cython
 @cython.boundscheck(False)
 @cython.wraparound(False)
 @cython.cdivision(True)
diff --git a/setup.py b/setup.py
index d6ef718..83b94dd 100644
--- a/setup.py
+++ b/setup.py
@@ -6,12 +6,19 @@ from setuptools import setup
 from setuptools import Extension
 from setuptools.command.build_ext import build_ext as _build_ext

+
+from distutils.core import Extension, setup
+from Cython.Build import cythonize
+
+ext = Extension(name="fvid_cython", sources=["fvid/fvid_cython.pyx"])
+setup(ext_modules=cythonize(ext, compiler_directives={'language_level': 3}))
+
 try:
     from Cython.Build import cythonize
 except ImportError:
     pass
 else:
-    cythonize(Extension("fvid_cython", "fvid/fvid_cython.pyx"), compiler_directives={'language_level': "3", 'infer_types': True})
+    cythonize(Extension("fvid_cython", ["fvid/fvid_cython.pyx"]), compiler_directives={'language_level': "3", 'infer_types': True})

 with open("README.md", "r") as fh:
     long_description = fh.read()

After this diff (notice how the second argument to Extension() had to be converted into a list, or Python would complain), it built the fvid_cython.cpython-38-x86_64-linux-gnu.so file.
Do I now just run python fvid/__main__.py as usual, or are there some other steps necessary?

@dobrosketchkun If you add an exception for ModuleNotFoundError to the import, then you can let it not use Cython, however it's 4x faster at decoding when you do use it. Also, put zavok's edits in setup.py into your version, since yours is being merged:

+from distutils.core import Extension, setup
+from Cython.Build import cythonize
+
+ext = Extension(name="fvid_cython", sources=["fvid/fvid_cython.pyx"], include_dirs=["./fvid"])
+setup(ext_modules=cythonize(ext, compiler_directives={'language_level': 3}))
+
 try:
     from Cython.Build import cythonize
 except ImportError:
     pass
 else:
-    cythonize(Extension("fvid_cython", "fvid/fvid_cython.pyx"), compiler_directives={'language_level': "3", 'infer_types': True})
+    cythonize(Extension("fvid_cython", ["fvid/fvid_cython.pyx"], include_dirs=["./fvid"]), compiler_directives={'language_level': "3", 'infer_types': True})

@zavok If it successfully compiled, you can run fvid/__main__.py as usual.

Okay, the .so file had to be moved to fvid/ folder, but otherwise looks like it worked.

Oh yeah, you need to add an include_dirs argument if you want to use it outside of the folder it was compiled in.
Update: Edited the setup.py info in my comment from 14 minutes ago to have the include_dirs argument.

Sorry for the wait guys, I have had a busy past 2 days. I will definitely try my best to go through all of this as soon as I can.

Yeah, in a day or two I'll try to combine Theelgirl's Cython approach and my thingy and give you folks a report.

@dobrosketchkun I can combine them into a new PR if you want, I understand how the Cython works and you've done a great job of explaining your approach so I think I understand it

@Theelgirl first, please try this modification #15 (comment)

@dobrosketchkun Tried it, it works but it's actually only slightly faster and uses far more disk than before (and the password PR can be made faster than the PPM P3 version with little effort), left some feedback on why that is in comments there.

Closing because the password/cython PRs have been merged together into #21