Issue with AssertionError

Question

jiangzhangze opened this issue 2 years ago · comments

jiangzhangze commented 2 years ago

I received the following messages when I run `python3 main.py`:

` starting trajectory : 0

sed: can't read ./env/sample_8/trajectory_0/processor0/4/U: No such file or directory

starting trajectory : 1

blockMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_0: remove log file 'log.blockMesh' to re-run
setExprBoundaryFields already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_0: remove log file 'log.setExprBoundaryFields' to re-run
decomposePar already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_0: remove log file 'log.decomposePar' to re-run
sed: can't read ./env/sample_8/trajectory_1/processor0/4/U: No such file or directory
blockMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_1: remove log file 'log.blockMesh' to re-run
setExprBoundaryFields already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_1: remove log file 'log.setExprBoundaryFields' to re-run
decomposePar already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_1: remove log file 'log.decomposePar' to re-run
renumberMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_0: remove log file 'log.renumberMesh' to re-run
renumberMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_1: remove log file 'log.renumberMesh' to re-run
Running pimpleFoam (2 processes) on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_0 with image ../../../../../of2106.sif
Running pimpleFoam (2 processes) on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_8/trajectory_1 with image ../../../../../of2106.sif
job : trajectory_0 finished with rc = 0
job : trajectory_1 finished with rc = 0
Traceback (most recent call last):
File "main.py", line 95, in
train_model(value_model,
File "/home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/ppo.py", line 77, in train_model
states, actions, rewards, returns, logpas = fill_buffer(env, sample, n_sensor, gamma, r_1, r_2, r_3, r_4, action_bounds)
File "/home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/reply_buffer.py", line 44, in fill_buffer
assert n_traj > 0
AssertionError`

Some information: WSL2; Ubuntu 20.04.
It worked at first. It looks like it cannot find the file "/env/sample_0/trajectory_0/processor0/4/U": `starting trajectory : 0

sed: can't read ./env/sample_0/trajectory_0/processor0/4/U: No such file or directory

starting trajectory : 1

blockMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_0: remove log file 'log.blockMesh' to re-run
setExprBoundaryFields already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_0: remove log file 'log.setExprBoundaryFields' to re-run
decomposePar already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_0: remove log file 'log.decomposePar' to re-run
blockMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_1: remove log file 'log.blockMesh' to re-run
setExprBoundaryFields already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_1: remove log file 'log.setExprBoundaryFields' to re-run
decomposePar already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_1: remove log file 'log.decomposePar' to re-run
renumberMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_1: remove log file 'log.renumberMesh' to re-run
renumberMesh already run on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_0: remove log file 'log.renumberMesh' to re-run
Running pimpleFoam (2 processes) on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_1 with image ../../../../../of2106.sif
Running pimpleFoam (2 processes) on /home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_0 with image ../../../../../of2106.sif
job : trajectory_0 finished with rc = 0
job : trajectory_1 finished with rc = 0
MSE of value network larger than tolerance: 11, 25.290144732337467
Iteration 0 completed`

Andre Weiner · Answer 1 · Fri Jul 29 2022 16:51:06 GMT+0800 (China Standard Time)

Hi @jiangzhangze,
it looks like you removed the initial state at 4s. I suggest starting from a clean state - simply re-do the steps to run the exercise as described in the corresponding notebook.
Best, Andre
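
The `sed: can't read ... processor0/4/U` lines in the logs above are consistent with this diagnosis. Below is a minimal sketch of a pre-flight check, assuming each trajectory folder is expected to contain a `processor<i>/4/U` file for every processor. The function name `missing_initial_state` and its defaults are hypothetical and not part of the exercise code; the two processors are taken from the "2 processes" line in the log.

```python
from pathlib import Path


def missing_initial_state(trajectory_dir, n_procs=2, start_time="4"):
    """Return the expected initial velocity files that do not exist.

    Assumption: the decomposed case keeps its start state in
    processor<i>/<start_time>/U, as the paths in the sed errors suggest.
    """
    base = Path(trajectory_dir)
    expected = [base / f"processor{i}" / start_time / "U" for i in range(n_procs)]
    return [p for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Example path taken from the log output; adjust to your setup.
    for path in missing_initial_state("./env/sample_0/trajectory_0"):
        print(f"missing initial state: {path}")
```

An empty return value means every expected file is present; any listed path points at a trajectory copy that lacks the start state.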

jiangzhangze · Answer 2 · Sat Jul 30 2022 13:01:22 GMT+0800 (China Standard Time)

Hi @AndreWeiner
I reran the exercise as you said, but the error persisted. When I checked log.pimpleFoam, I found some information that may be helpful:
`Updating Omega with policy.
terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/network.py", line 16, in forward
def forward(self: torch.network.FCCA,
x: Tensor) -> Tensor:
x0 = torch.torch.nn.functional.relu((self.linear_0).forward(x, ), False, )
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
x1 = torch.torch.nn.functional.relu((self.linear_1).forward(x0, ), False, )
_0 = torch.softplus((self.linear_2).forward(x1, ))
File "code/torch/torch/nn/modules/linear.py", line 13, in forward
input: Tensor) -> Tensor:
_0 = torch.torch.nn.functional.linear
return _0(input, self.weight, self.bias, )
~~ <--- HERE
File "code/torch/torch/nn/functional.py", line 11, in linear
weight: Tensor,
bias: Optional[Tensor]=None) -> Tensor:
return torch.linear(input, weight, bias)
~~~~~~~~~~~~ <--- HERE

Traceback of TorchScript, original code (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 96, in forward
def forward(self, input: Tensor) -> Tensor:
return F.linear(input, self.weight, self.bias)
~~~~~~~~ <--- HERE
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 1847, in linear
if has_torch_function_variadic(input, weight):
return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
return torch._C._nn.linear(input, weight, bias)
~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x101 and 100x128)`
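
The last line of the traceback says that the first linear layer of the policy expects 100 input features but received a state with 101 entries. The shape rule behind that message can be sketched in plain Python, without PyTorch; `matmul_shape` is a hypothetical helper written only for illustration.

```python
def matmul_shape(mat1_shape, mat2_shape):
    """Return the result shape of mat1 @ mat2 or raise if incompatible."""
    rows1, cols1 = mat1_shape
    rows2, cols2 = mat2_shape
    if cols1 != rows2:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows1}x{cols1} and {rows2}x{cols2})"
        )
    return (rows1, cols2)


if __name__ == "__main__":
    # One state vector with 100 features fits the 100x128 weight matrix.
    print(matmul_shape((1, 100), (100, 128)))  # (1, 128)
    # The state reported in the log has 101 features and is rejected.
    try:
        matmul_shape((1, 101), (100, 128))
    except ValueError as err:
        print(err)
```

In other words, the number of values handed to the policy did not match the number of inputs the network was built for.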

Andre Weiner · Answer 3 · Sat Jul 30 2022 18:56:13 GMT+0800 (China Standard Time)

Hi again,
I pushed a bugfix for this issue; see here. After a git pull and re-compiling the code, the exercise should be working fine.
Thanks for reporting this issue!
Best, Andre

jiangzhangze · Answer 4 · Sun Jul 31 2022 15:07:24 GMT+0800 (China Standard Time)

I received the following messages after git pull:
$ python3 main.py

starting trajectory : 0

sed: can't read ./env/sample_0/trajectory_0/processor0/4/U: No such file or directory

starting trajectory : 1

./Allrun.singularity: line 3: ../../../../../functions: No such file or directory
sed: can't read ./env/sample_0/trajectory_1/processor0/4/U: No such file or directory
./Allrun.singularity: line 3: ../../../../../functions: No such file or directory
./Allrun.singularity: line 6: setImage: command not found
./Allrun.singularity: line 6: setImage: command not found
./Allrun.singularity: line 12: singularityRun: command not found
./Allrun.singularity: line 12: singularityRun: command not found
./Allrun.singularity: line 16: singularityRun: command not found
./Allrun.singularity: line 16: singularityRun: command not found
./Allrun.singularity: line 19: singularityRun: command not found
./Allrun.singularity: line 19: singularityRun: command not found
./Allrun.singularity: line 20: singularityRunParallel: command not found
./Allrun.singularity: line 20: singularityRunParallel: command not found
./Allrun.singularity: line 21: singularityRunParallel: command not found
job : trajectory_0 finished with rc = 0
./Allrun.singularity: line 21: singularityRunParallel: command not found
job : trajectory_1 finished with rc = 0
Traceback (most recent call last):
File "/home/jzz/ml-cfd-lecture/exercises/main.py", line 95, in
train_model(value_model,
File "/home/jzz/ml-cfd-lecture/exercises/ppo.py", line 77, in train_model
states, actions, rewards, returns, logpas = fill_buffer(env, sample, n_sensor, gamma, r_1, r_2, r_3, r_4, action_bounds)
File "/home/jzz/ml-cfd-lecture/exercises/reply_buffer.py", line 44, in fill_buffer
assert n_traj > 0
AssertionError
This also happens in the new environment.

Andre Weiner · Answer 5 · Sun Jul 31 2022 17:23:05 GMT+0800 (China Standard Time)

Hi @jiangzhangze,

your system is not properly set up. I suggest starting from a clean state:

- remove your old copy of the ml-cfd-lecture repository or rename it
- follow the instructions provided in the first exercise to set up your system
- create an exercise folder at the top level of your clone: `mkdir exercises`
- follow exercise 11 step by step

Best, Andre
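
One possible reading of the `../../../../../functions: No such file or directory` messages, based only on the paths printed in the logs: in the earlier runs the case lived under `exercises/drl_control_cylinder/`, whereas the later traceback shows `main.py` directly under `exercises/`, so the same relative path would resolve one level too high. The sketch below resolves the relative path for both layouts. The helper `resolve_from` is hypothetical, and it is an assumption that `Allrun.singularity` runs from inside the trajectory folder and that `functions` sits at the top level of the repository.

```python
import posixpath


def resolve_from(case_dir, relative_path):
    """Resolve relative_path as seen from case_dir (pure string logic)."""
    return posixpath.normpath(posixpath.join(case_dir, relative_path))


if __name__ == "__main__":
    rel = "../../../../../functions"
    # Layout from the first logs: exercise copied into its own subfolder.
    ok = "/home/jzz/ml-cfd-lecture/exercises/drl_control_cylinder/env/sample_0/trajectory_0"
    # Layout implied by the later traceback: files directly in exercises/.
    bad = "/home/jzz/ml-cfd-lecture/exercises/env/sample_0/trajectory_0"
    print(resolve_from(ok, rel))   # /home/jzz/ml-cfd-lecture/functions
    print(resolve_from(bad, rel))  # /home/jzz/functions
```

If this reading is right, following the exercise layout exactly, as suggested above, restores the expected folder depth.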

jiangzhangze · Answer 6 · Wed Aug 03 2022 09:23:09 GMT+0800 (China Standard Time)

Hi @AndreWeiner
I solved this problem, thank you very much.

Andre Weiner · Answer 7 · Wed Aug 03 2022 14:44:17 GMT+0800 (China Standard Time)

Perfect, thanks for reporting back!