wearepal / nifr

Null-sampling for Interpretable and Fair Representations


Problem about code running

scutfrank opened this issue · comments

Hello authors,
when I follow the "Running the code" step and run this command:
start_inn.py --dataset celeba --levels 3 --level-depth 32 --glow True --reshape-method squeeze --autoencode False --input-noise True --quant-level 5 --use-wandb True --factor-splits 0=0.5 1=0.5 --train-on-recon False --recon-detach False --batch-size 32 --nll-weight 1 --pred-s-weight 1e-2 --zs-frac 0.001 --coupling-channels 512 --super-val True --super-val-freq 10 --val-freq 1 --task-mixing 0.5 --gpu 0 --num-discs 10 --disc-channels 512 --data-split-seed 42 --epochs 30
I got:
bash: start_inn.py: command not found
I have no idea what's wrong with it. Please take a look when you have time, thanks! @thomkeh

You should put python in front of the command. I will update the README so that it shows the full command.

@thomkeh
I have added python in front of the command, but then another error happened:
Traceback (most recent call last):
File "/root/nifr-master/start_inn.py", line 5, in <module>
main_inn()
File "/root/nifr-master/nifr/optimisation/train_inn.py", line 222, in main_inn
repo = git.Repo(search_parent_directories=True)
File "/root/.local/lib/python3.9/site-packages/git/repo/base.py", line 282, in __init__
self.working_dir: Optional[PathLike] = self._working_tree_dir or self.common_dir
File "/root/.local/lib/python3.9/site-packages/git/repo/base.py", line 363, in common_dir
raise InvalidGitRepositoryError()
git.exc.InvalidGitRepositoryError

Sorry about that. The code assumed that it was run inside a git repository. I've changed it now such that this isn't required anymore: ca7044c
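For reference, the usual pattern for making such a lookup optional is to catch the failure and fall back to a sentinel value. A minimal sketch, assuming the commit SHA is only wanted for logging (the function name is illustrative, not the repository's actual code):

```python
# Sketch (not the repository's actual fix): report the current git commit
# if available, and return None when run outside a git checkout.
def get_git_sha():
    try:
        import git  # GitPython; treated as an optional dependency here
        repo = git.Repo(search_parent_directories=True)
        return repo.head.object.hexsha
    except Exception:  # ImportError or git.exc.InvalidGitRepositoryError
        return None
```

The caller can then log "unknown commit" instead of crashing when the SHA is None.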

@thomkeh
OK, the above problem is addressed, but two other problems remain:
1. When I run the command
python start_inn.py --dataset celeba --levels 3 --level-depth 32 --glow True --reshape-method squeeze --autoencode False --input-noise True --quant-level 5 --use-wandb True --factor-splits 0=0.5 1=0.5 --train-on-recon False --recon-detach False --batch-size 32 --nll-weight 1 --pred-s-weight 1e-2 --zs-frac 0.001 --coupling-channels 512 --super-val True --super-val-freq 10 --val-freq 1 --task-mixing 0.5 --gpu 0 --num-discs 10 --disc-channels 512 --data-split-seed 42 --epochs 30

I got start_inn.py: error: unrecognized arguments: --epochs 30
When I removed that flag, I then got OSError: Tunnel connection failed: 403 Forbidden

It would be nice if you could provide a download link for CelebA and explain how the dataset folder should be structured. By the way, do I need to modify the data path anywhere in the code? I noticed that it creates a celeba folder in ./data.

2. When I run the command
python start_inn.py --dataset cmnist --levels 3 --level-depth 24 --glow True --reshape-method squeeze --autoencode False --input-noise True --quant-level 5 --use-wandb True --factor-splits 0=0.5 1=0.5 --train-on-recon False --recon-detach False --batch-size 256 --test-batch-size 512 --nll-weight 1 --pred-s-weight 1e-2 --zs-frac 0.002 --coupling-channels 512 --super-val True --super-val-freq 5 --val-freq 1 --task-mixing 0 --gpu 0 --num-discs 1 --disc-channels 512 --level-depth 24 --num-discs 3

It asks me to log in to wandb, and after I do, I get an error:
wandb: Waiting for W&B process to finish, PID 8451
Traceback (most recent call last):
File "/root/nifr-master/start_inn.py", line 5, in <module>
main_inn()
File "/root/nifr-master/nifr/optimisation/train_inn.py", line 279, in main_inn
input_shape = get_data_dim(train_loader)
File "/root/nifr-master/nifr/optimisation/utils.py", line 16, in get_data_dim
x = next(iter(data_loader))[0]
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 31, in _pin_memory_loop
data = pin_memory(data)
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 55, in pin_memory
return [pin_memory(sample) for sample in data]
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 55, in <listcomp>
return [pin_memory(sample) for sample in data]
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 47, in pin_memory
return data.pin_memory()
RuntimeError: CUDA error: invalid device context

I replaced --epochs 30 with --iters 76000 in the command: 6a51334 That should work now.

I can't help you with the CUDA error. It depends on how you installed pytorch.

I'm guessing the "403 Forbidden" error is from trying to download the dataset. The dataset is downloaded here:

nifr/nifr/data/celeba.py

Lines 273 to 276 in 6a51334

for (file_id, md5, filename) in self.file_list:
    download_file_from_google_drive(
        file_id, os.path.join(self.root, self.base_folder), filename, md5
    )

By default it's downloaded to ./data/celeba but you can change the location with the --root flag. For example, with --root ../my_data, the data is expected to be in ../my_data/celeba/.
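A minimal sketch of how the final dataset directory is derived from the flag, assuming standard os.path.join behaviour (the variable names are illustrative):

```python
import os

# --root only sets the parent directory; the "celeba" subfolder name is fixed.
root = "../my_data"      # value passed via --root
base_folder = "celeba"
dataset_dir = os.path.join(root, base_folder)
print(dataset_dir)       # on POSIX: ../my_data/celeba
```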

@thomkeh
1. Regarding "I can't help you with the CUDA error. It depends on how you installed pytorch.":
For this one, changing pin_memory=True to False in ./nifr/optimisation/train_inn.py fixes it.
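For context, pin_memory=True is only a transfer-speed optimization (it pins host memory so host-to-GPU copies are faster), so disabling it trades a little copy speed for avoiding the pin-memory worker thread that raised the error. A minimal sketch with a dummy dataset, not the repository's actual loader setup:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real dataset.
dataset = TensorDataset(
    torch.zeros(8, 3, 32, 32), torch.zeros(8, dtype=torch.long)
)

# pin_memory=False bypasses the pinning thread that raised
# "CUDA error: invalid device context"; batches still load normally,
# host-to-GPU copies are just not accelerated.
loader = DataLoader(dataset, batch_size=4, pin_memory=False)
x, y = next(iter(loader))
```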

2. For the "403 Forbidden" error:
I found the URL in ./nifr/data/celeba.py: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
What needs to be downloaded and put into ./data/celeba is:
-img_align_celeba.zip (needs to be unzipped afterwards)
-list_attr_celeba.txt
-list_eval_partition.txt
Then it works.
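To double-check a manual download like this, a small stdlib-only sketch (the expected entry names come from the list above; the helper name is made up):

```python
import os

# Entries expected under <root>/celeba after the manual download;
# img_align_celeba is the folder left behind after unzipping the .zip.
EXPECTED = ["img_align_celeba", "list_attr_celeba.txt", "list_eval_partition.txt"]

def missing_celeba_files(root="./data"):
    """Return the expected entries that are missing under <root>/celeba."""
    base = os.path.join(root, "celeba")
    return [n for n in EXPECTED if not os.path.exists(os.path.join(base, n))]
```

An empty return value means the layout matches what the loader expects.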

3. What are the GPU memory requirements? (Is 16 GB of GPU memory enough?)

16 GB is enough.