rubenvillegas / iclr2017mcnet

TensorFlow implementation of the ICLR 2017 paper: Decomposing Motion and Content for Natural Video Sequence Prediction

Home Page: https://sites.google.com/a/umich.edu/rubenevillegas/iclr2017

Pretrained model which can be trained further

NagabhushanSN95 opened this issue

Hi,
I'm trying to load the pretrained model you've provided (trained on the S1M dataset) and continue training it on another dataset (PENN) instead of starting from scratch. But when I create the MCNET model with is_train=True, restoring fails with an error saying that the checkpoint doesn't have all the variables:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.

Can you kindly provide a pretrained model which can be loaded and trained further? Or can I make some changes to the code to achieve that?
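
A quick way to narrow down an error like this is to compare the variable names and shapes the training graph expects against what the checkpoint actually stores. The sketch below shows only the comparison logic on plain dictionaries. The variable names and shapes in it are hypothetical, not taken from MCNET. In TensorFlow 1.x the two maps could presumably be built from tf.global_variables() (stripping the ":0" suffix from each name) and tf.train.list_variables(checkpoint_path), which returns (name, shape) pairs.

```python
def diff_variables(graph_vars, ckpt_vars):
    """Compare two {variable_name: shape} maps.

    Returns three sorted lists of names:
      missing    - expected by the graph but absent from the checkpoint
      mismatched - present in both, but with different shapes
      unused     - stored in the checkpoint but not used by the graph
    """
    missing = sorted(name for name in graph_vars if name not in ckpt_vars)
    mismatched = sorted(
        name for name, shape in graph_vars.items()
        if name in ckpt_vars and tuple(ckpt_vars[name]) != tuple(shape)
    )
    unused = sorted(name for name in ckpt_vars if name not in graph_vars)
    return missing, mismatched, unused


# Hypothetical names and shapes, for illustration only.
graph_vars = {
    "enc/conv1/w": (3, 3, 3, 64),
    "enc/conv1/b": (64,),
    "enc/conv1/w/Adam": (3, 3, 3, 64),  # e.g. an optimizer slot that only a training graph creates
}
ckpt_vars = {
    "enc/conv1/w": (3, 3, 3, 64),
    "enc/conv1/b": (32,),
}

missing, mismatched, unused = diff_variables(graph_vars, ckpt_vars)
print("missing from checkpoint:", missing)
print("shape mismatch:", mismatched)
print("unused checkpoint entries:", unused)
```

If the "missing" list turns out to contain only variables that exist in training mode (optimizer slots, for instance), that would suggest the checkpoint was saved from a graph without them. This is a guess about the cause, not something confirmed in this thread.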

With the version of TensorFlow I'm using, I was able to load your model for testing, and that works perfectly fine. The problems only arise when I load it for training.

Anyway, I'll double-check the TensorFlow version.

Hi, I tried with tensorflow_gpu-1.1.0 and still get a similar error.
For some reason, TensorFlow installed with pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl raised an error on import tensorflow:

Python 2.7.16 |Anaconda, Inc.| (default, Aug 22 2019, 16:00:36) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet3/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
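
The last ImportError in the traceback above says the dynamic loader could not open libcublas.so.8.0. One way to check this independently of TensorFlow is to try loading the library directly with ctypes. This is a minimal sketch: the helper name can_load is made up for illustration, and the library name is simply copied from the traceback.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be found or loaded
        return False


# Library name taken from the ImportError message in the traceback.
lib = "libcublas.so.8.0"
if can_load(lib):
    print(lib, "can be loaded")
else:
    print(lib, "was NOT found by the dynamic loader")
```

If the library is reported as not found, the CUDA libraries this wheel was built against are probably not installed or not on the loader's search path (for example LD_LIBRARY_PATH on Linux).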

So I installed TensorFlow 1.1.0 with conda instead:
conda install tensorflow-gpu=1.1.0
This installed tensorflow-gpu=1.1.0=np111py27_0. Is this TensorFlow version fine?
With this, the import worked, but restoring the model still failed with the same error.

Here is the list of installed packages, for reference:

name: MCnet2
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - ca-certificates=2019.6.16=hecc5488_0
  - certifi=2019.6.16=py27_1
  - cudatoolkit=7.5=2
  - cudnn=5.1=0
  - funcsigs=1.0.2=py_3
  - libblas=3.8.0=12_openblas
  - libcblas=3.8.0=12_openblas
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - liblapack=3.8.0=12_openblas
  - libopenblas=0.3.7=h6e990d7_1
  - libprotobuf=3.9.1=h8b12597_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - mock=3.0.5=py27_0
  - ncurses=6.1=he6710b0_1
  - openssl=1.1.1c=h516909a_0
  - pip=19.2.2=py27_0
  - protobuf=3.9.1=py27he1b5a44_0
  - python=2.7.16=h8b3fad2_5
  - readline=7.0=h7b6447c_5
  - setuptools=41.0.1=py27_0
  - sqlite=3.29.0=h7b6447c_0
  - tensorflow-gpu=1.1.0=np111py27_0
  - tk=8.6.8=hbc83047_0
  - werkzeug=0.15.5=py_0
  - wheel=0.33.4=py27_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - backports-functools-lru-cache==1.5
    - cloudpickle==1.2.1
    - cycler==0.10.0
    - decorator==4.4.0
    - enum34==1.1.6
    - futures==3.3.0
    - imageio==2.5.0
    - joblib==0.13.2
    - kiwisolver==1.1.0
    - matplotlib==2.2.4
    - networkx==2.2
    - numpy==1.16.5
    - opencv-python==4.1.1.26
    - pillow==6.1.0
    - pyparsing==2.4.2
    - pyssim==0.4
    - python-dateutil==2.8.0
    - pytube==9.5.1
    - pytz==2019.2
    - pywavelets==1.0.3
    - scikit-image==0.14.4
    - scikit-video==1.1.11
    - scipy==1.2.2
    - six==1.12.0
    - subprocess32==3.5.4
prefix: /media/nagabhushan/Data02/SoftwareFiles/Anaconda/anaconda3/envs/MCnet2

After this, I tried upgrading the saved model with a script I found in a TensorFlow GitHub issue. The file is https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/tools/checkpoint_convert.py

Even after converting, restore_model didn't work. Do you know if this is the right conversion script, or am I using the wrong one?

Can you please provide a list of all package and system requirements needed to restore the model (for continuing training)? Or can you point me to a script or documentation on how to convert the models you've provided to the latest TensorFlow version?

Yeah. Thank you so much. I'll try that :)

Hi @NagabhushanSN95, were you able to train the provided S1M model further? I am facing the same issue.

Regards
Sharath

@sharathyadav1993 I tried a bit. Couldn't figure it out. Got busy with other work. Will update here if I'm able to solve it.

@sharathyadav1993 I tried the approach suggested in this StackOverflow answer. It worked like a charm. Posting the code here:

# To port paper models to new tensorflow version
# Based on https://stackoverflow.com/a/57818431/3337089
# Author: Nagabhushan S N
# Last Modified: 01/02/2020

from pathlib import Path

import tensorflow as tf

from mcnet import MCNET


def port_model(model_path: Path, out_dir: Path):
    out_dir.mkdir(parents=True)
    save_path = out_dir / model_path.name

    with tf.Session() as sess:
        # Build the training graph so that every variable it needs exists, then initialize all of them
        _ = MCNET(image_size=[240, 320], batch_size=8, K=4, T=7, c_dim=3, checkpoint_dir=None, is_train=True)
        tf.global_variables_initializer().run(session=sess)

        # (name, shape) pairs stored in the old checkpoint
        ckpt_vars = tf.train.list_variables(model_path.as_posix())
        assign_ops = []
        for dst_var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
            for (ckpt_var, ckpt_shape) in ckpt_vars:
                # Copy a value only when both name and shape match; all other variables keep their initial values
                if dst_var.name.split(":")[0] == ckpt_var and dst_var.shape == ckpt_shape:
                    value = tf.train.load_variable(model_path.as_posix(), ckpt_var)
                    assign_ops.append(tf.assign(dst_var, value))

        # Assign the variables
        sess.run(assign_ops)
        # Save a new checkpoint that contains every variable of the training graph
        saver = tf.train.Saver()
        saver.save(sess, save_path.as_posix())


def main():
    model_path = Path('../../PretrainedModels/PaperModels/S1M/MCNET.model-102502')
    out_dir = model_path.parent.parent / 'S1M_v1.13.1'
    port_model(model_path, out_dir)


if __name__ == '__main__':
    main()

My environment details are as follows:

name: MCnet
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.7.1=py37_0
  - astor=0.7.1=py_0
  - bzip2=1.0.8=h7b6447c_0
  - c-ares=1.15.0=h516909a_1001
  - ca-certificates=2019.11.28=hecc5488_0
  - cairo=1.14.12=h8948797_3
  - certifi=2019.11.28=py37_0
  - cloudpickle=1.2.1=py_0
  - cycler=0.10.0=py_1
  - cytoolz=0.10.0=py37h516909a_0
  - dask-core=2.2.0=py_0
  - decorator=4.4.0=py_0
  - fontconfig=2.13.0=h9420a91_0
  - freeglut=3.0.0=hf484d3e_5
  - freetype=2.9.1=h8a8886c_1
  - gast=0.2.2=py_0
  - glib=2.56.2=hd408876_0
  - graphite2=1.3.13=h23475e2_0
  - grpcio=1.16.1=py37hf8bcb03_1
  - h5py=2.8.0=py37h989c5e5_3
  - harfbuzz=1.8.8=hffaf4a1_0
  - hdf5=1.10.2=hba1933b_1
  - icu=58.2=h9c2bf20_1
  - imageio=2.5.0=py37_0
  - jasper=2.0.14=h07fcdf6_1
  - joblib=0.13.2=py_0
  - jpeg=9b=h024ee3a_2
  - keras-applications=1.0.7=py_1
  - keras-preprocessing=1.0.9=py_1
  - kiwisolver=1.1.0=py37hc9558a2_0
  - libblas=3.8.0=11_openblas
  - libcblas=3.8.0=11_openblas
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libglu=9.0.0=hf484d3e_1
  - liblapack=3.8.0=11_openblas
  - libopenblas=0.3.6=h6e990d7_6
  - libopus=1.3=h7b6447c_0
  - libpng=1.6.37=hbc83047_0
  - libprotobuf=3.9.1=h8b12597_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.0.10=h2733197_2
  - libuuid=1.0.3=h1bed415_2
  - libvpx=1.7.0=h439df22_0
  - libxcb=1.13=h1bed415_1
  - libxml2=2.9.9=hea5a465_1
  - markdown=3.1.1=py_0
  - matplotlib-base=3.1.1=py37hfd891ef_0
  - mock=3.0.5=py37_0
  - ncurses=6.1=he6710b0_1
  - networkx=2.3=py_0
  - numpy=1.17.0=py37h95a1406_0
  - olefile=0.46=py_0
  - openssl=1.1.1d=h516909a_0
  - pandas=0.25.3=py37hb3f55d8_0
  - pcre=8.43=he6710b0_0
  - pillow=6.1.0=py37h34e0f95_0
  - pip=19.1.1=py37_0
  - pixman=0.38.0=h7b6447c_0
  - protobuf=3.9.1=py37he1b5a44_0
  - pyparsing=2.4.2=py_0
  - python=3.7.3=h0371630_0
  - python-dateutil=2.8.0=py_0
  - pytz=2019.3=py_0
  - pywavelets=1.0.3=py37hd352d35_1
  - readline=7.0=h7b6447c_5
  - scikit-image=0.15.0=py37hb3f55d8_2
  - scikit-learn=0.21.3=py37hcdab131_0
  - scikit-video=1.1.11=pyh24bf2e0_0
  - scipy=1.3.0=py37h921218d_1
  - setuptools=41.0.1=py37_0
  - six=1.12.0=py37_1000
  - sqlite=3.29.0=h7b6447c_0
  - tensorboard=1.13.1=py37_0
  - tensorflow=1.13.1=py37_0
  - tensorflow-estimator=1.13.0=py_0
  - termcolor=1.1.0=py_2
  - tk=8.6.9=hed695b0_1002
  - toolz=0.10.0=py_0
  - tornado=6.0.3=py37h516909a_0
  - werkzeug=0.15.5=py_0
  - wheel=0.33.4=py37_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - imageio-ffmpeg==0.3.0
    - opencv-python==4.1.0.25
    - python-vlc==3.0.7110
    - ssim==0.3.0

@NagabhushanSN95 Thank you. I will check it.