CUDNN_STATUS_INTERNAL_ERROR with batch size 40 on 8x2080RTX
vadimkantorov opened this issue
@ajabri Have you encountered this stack trace? pytorch/pytorch#51382
With batch size 35 everything works; with batch size 40 even the first iteration breaks. It uses 7 GB of memory out of 11 GB just before the exception.
This is surprising: I cannot fit 2x the batch size on 4x as many GPUs... What was your 2080RTX memory size? Also 11 GB, or a larger one?
Okay, it seems that my CUDA device 7 is faulty :( pytorch/pytorch#51382 (comment)
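
One possible reason batch size 35 worked while 40 failed, given a faulty device 7: if the batch is scattered across the 8 GPUs with `torch.chunk`-style splitting (as `nn.DataParallel` does), a batch of 35 yields only 7 chunks, so the last device gets no data. The thread does not say how the batch was distributed, so this is only a sketch under that assumption; `chunk_sizes` is a hypothetical helper that mimics the chunking arithmetic in plain Python.

```python
import math


def chunk_sizes(batch_size, num_devices):
    """Per-device sample counts under torch.chunk-style splitting.

    Assumption: the batch is split into chunks of ceil(batch / devices)
    samples, and devices beyond the last chunk receive nothing.
    """
    per_chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    # Pad with zeros for devices that get no chunk at all.
    sizes += [0] * (num_devices - len(sizes))
    return sizes


for batch_size in (35, 40):
    print(batch_size, chunk_sizes(batch_size, 8))
```

Under this assumption, device 7 (the last of 8) would sit idle at batch size 35 and only start receiving samples at batch size 40, which would match the symptoms above without any memory limit being involved.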