NVIDIA / vid2vid

PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.

Train on my own datasets

birdflyto opened this issue

Hello, I have finished the demo in the examples. When I trained on my own datasets, which I prepared as described in the notes, I got the following errors. I have tried several methods to solve this, but they all failed.
I would greatly appreciate any guidance you can give. Thank you!
CUDA_VISIBLE_DEVICES=1 python train.py --name label2city_256 --label_nc 1 --loadSize 256 --use_instance --fg --n_downsample_G 2 --num_D 1 --max_frames_per_gpu 6 --n_frames_total 6
------------ Options -------------
TTUR: False
add_face_disc: False
basic_point_only: False
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: datasets/Cityscapes/
dataset_mode: temporal
debug: False
densepose_only: False
display_freq: 100
display_id: 0
display_winsize: 512
feat_num: 3
fg: True
fg_labels: [26]
fineSize: 512
fp16: False
gan_mode: ls
gpu_ids: [0]
input_nc: 3
isTrain: True
label_feat: False
label_nc: 1
lambda_F: 10.0
lambda_T: 10.0
lambda_feat: 10.0
loadSize: 256
load_features: False
load_pretrain:
local_rank: 0
lr: 0.0002
max_dataset_size: inf
max_frames_backpropagate: 1
max_frames_per_gpu: 6
max_t_step: 1
model: vid2vid
nThreads: 2
n_blocks: 9
n_blocks_local: 3
n_downsample_E: 3
n_downsample_G: 2
n_frames_D: 3
n_frames_G: 3
n_frames_total: 6
n_gpus_gen: 1
n_layers_D: 3
n_local_enhancers: 1
n_scales_spatial: 1
n_scales_temporal: 2
name: label2city_256
ndf: 64
nef: 32
netE: simple
netG: composite
ngf: 128
niter: 10
niter_decay: 10
niter_fix_global: 0
niter_step: 5
no_canny_edge: False
no_dist_map: False
no_first_img: False
no_flip: False
no_flow: False
no_ganFeat: False
no_html: False
no_vgg: False
norm: batch
num_D: 1
openpose_only: False
output_nc: 3
phase: train
pool_size: 1
print_freq: 100
random_drop_prob: 0.05
random_scale_points: False
remove_face_labels: False
resize_or_crop: scaleWidth
save_epoch_freq: 1
save_latest_freq: 1000
serial_batches: False
sparse_D: False
tf_log: False
use_instance: True
use_single_G: False
which_epoch: latest
-------------- End ----------------
CustomDatasetDataLoader
dataset [TemporalDataset] was created
#training videos = 1
vid2vid
---------- Networks initialized -------------

---------- Networks initialized -------------

create web directory ./checkpoints/label2city_256/web...

(4,)

!!! (1, 8, 1, 128, 256)
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [448,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [449,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [450,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [451,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [452,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [453,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [454,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [455,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [456,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [457,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [458,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [459,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [460,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [461,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [462,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [463,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [464,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [465,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [466,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [467,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [468,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [469,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [470,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [471,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [472,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [473,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [474,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [475,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [476,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [477,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [478,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [479,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered
label Traceback (most recent call last):
File "train.py", line 150, in <module>
train()
File "train.py", line 56, in train
fake_B, fake_B_raw, flow, weight, real_A, real_Bp, fake_B_last = modelG(input_A, input_B, inst_A, fake_B_prev_last)
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/public/home/lcc-dx07/perl5/vid2vid-master/models/models.py", line 38, in forward
outputs = self.model(*inputs, **kwargs, dummy_bs=self.pad_bs)
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/public/home/lcc-dx07/perl5/vid2vid-master/models/vid2vid_model_G.py", line 125, in forward
real_A_all, real_B_all, _ = self.encode_input(input_A, input_B, inst_A)
File "/public/home/lcc-dx07/perl5/vid2vid-master/models/vid2vid_model_G.py", line 97, in encode_input
print('label',input_label)
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/tensor.py", line 66, in __repr__
return torch._tensor_str._str(self)
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/_tensor_str.py", line 277, in _str
tensor_str = _tensor_str(self, indent)
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/_tensor_str.py", line 195, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/_tensor_str.py", line 84, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/generated/../THCReduceAll.cuh:317
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered (insert_events at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCCachingAllocator.cpp:470)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f2e869e6cc5 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0x135cb20 (0x7f2e8a4d9b20 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: at::TensorImpl::release_resources() + 0x50 (0x7f2e87041f90 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: + 0x2ad98b (0x7f2e8132998b in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: torch::autograd::Variable::Impl::release_resources() + 0x17 (0x7f2e815a0127 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: + 0x121b2b (0x7f2ec717cb2b in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x31b8df (0x7f2ec73768df in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: + 0x31b921 (0x7f2ec7376921 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #24: __libc_start_main + 0xf5 (0x7f2ee0cefc05 in /lib64/libc.so.6)

Aborted (core dumped)

Were you able to resolve this error? I am running into the same error while using a custom dataset.
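
A note for anyone debugging this: the assertion in the log (indexValue >= 0 && indexValue < tensor.sizes[dim], raised from a scatter kernel) means some index passed to a scatter call fell outside the size of the target dimension. The options dump also shows label_nc: 1 alongside fg_labels: [26]. One thing worth checking, though the thread does not confirm it as the cause, is whether the label maps contain IDs outside 0..label_nc-1. Below is a minimal sketch of that check. The helper name out_of_range_labels and the sample pixel values are hypothetical and not part of vid2vid; it also assumes you have already read a label map into a flat list of integers.

```python
def out_of_range_labels(label_values, label_nc):
    """Return the sorted distinct label IDs that fall outside [0, label_nc).

    label_values: iterable of integer label IDs (e.g. flattened pixels of a
    label map). label_nc: number of label classes given to training.
    """
    return sorted({v for v in label_values if v < 0 or v >= label_nc})


# Hypothetical pixel values from one label map, flattened to a list.
pixels = [0, 0, 255, 0, 26, 255]

print(out_of_range_labels(pixels, label_nc=1))    # [26, 255]
print(out_of_range_labels(pixels, label_nc=256))  # []
```

If the first call returns anything for your data, either the label maps need remapping to 0..label_nc-1 or --label_nc needs to be raised to cover the largest ID. Running on CPU, or with the CUDA_LAUNCH_BLOCKING=1 environment variable set, generally makes the Python traceback point at the call that actually failed rather than at a later one such as the print in the trace above.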