microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Home Page: https://aka.ms/GeneralAI


LayoutLMv3 document layout train error

yuumiin opened this issue

Describe
Model I am using: LayoutLMv3

After running object_detection/train_net.py, training fails with a CUDA error:
RuntimeError: CUDA out of memory

After adjusting IMS_PER_BATCH in cascade_layoutlmv3.yaml, the error changed to:
TypeError: save_for_backward can only save variables, but argument 6 is of type bool

How can I solve this?

It is hard to locate the cause of these errors and debug without the full stack traces.
Have you set IMS_PER_BATCH to a multiple of the number of GPUs you are using? For example, if you are using 8 GPUs, you could set IMS_PER_BATCH to 8, 16, etc.
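
A hedged sketch of overriding that value in code rather than editing the YAML (the keys follow detectron2 conventions; the config path is a placeholder, and LayoutLMv3's YAML adds custom keys beyond the detectron2 defaults, so extra setup may be needed before the merge):

from detectron2.config import get_cfg

cfg = get_cfg()
# Illustrative only: LayoutLMv3's config defines custom keys that a bare
# default cfg does not know about, so this merge may need those keys
# registered first.
cfg.merge_from_file("cascade_layoutlmv3.yaml")
cfg.SOLVER.IMS_PER_BATCH = 16  # total batch across GPUs, e.g. 2 images/GPU on 8 GPUs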

@HYPJUDY
[image: screenshot of the error]
I also encountered this problem. How can I solve it?

Same here, interesting.
I am using PyTorch 1.8.0 and I suspect there are issues with this version.
This problem happens inside PyTorch's gradient checkpointing function.
In PyTorch 1.8.0, the checkpointing function saves every positional input for backward, and save_for_backward only accepts torch.Tensors. This is problematic when the checkpointed function (layer_module.forward in this case) takes non-tensor inputs such as bool flags. Below is the source code of CheckpointFunction in PyTorch 1.8.0:

class CheckpointFunction(torch.autograd.Function):

    @staticmethod
    def forward(ctx, run_function, preserve_rng_state, *args):
        check_backward_validity(args)
        ctx.run_function = run_function
        ctx.preserve_rng_state = preserve_rng_state
        ctx.had_autocast_in_fwd = torch.is_autocast_enabled()
        if preserve_rng_state:
            ctx.fwd_cpu_state = torch.get_rng_state()
            # Don't eagerly initialize the cuda context by accident.
            # (If the user intends that the context is initialized later, within their
            # run_function, we SHOULD actually stash the cuda state here.  Unfortunately,
            # we have no way to anticipate this will happen before we run the function.)
            ctx.had_cuda_in_fwd = False
            if torch.cuda._initialized:
                ctx.had_cuda_in_fwd = True
                ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args)
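        # The reported TypeError originates at the next line: in 1.8.0 *all*
        # positional args are stashed via save_for_backward, which rejects
        # non-tensor values such as bools.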
        ctx.save_for_backward(*args)
        with torch.no_grad():
            outputs = run_function(*args)
        return outputs

However, this issue is fixed in PyTorch 1.10.0, as also explained in this blog: the newer CheckpointFunction only passes tensor arguments to save_for_backward and stashes non-tensor arguments on ctx directly. The difference lies in L27-L41 of that source.
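
A paraphrased sketch of the >= 1.10 forward pass (not the exact upstream code; the backward method is omitted for brevity):

import torch

class PatchedCheckpointFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, run_function, *args):
        ctx.run_function = run_function
        ctx.inputs = []
        ctx.tensor_indices = []
        tensor_inputs = []
        for i, arg in enumerate(args):
            if torch.is_tensor(arg):
                tensor_inputs.append(arg)
                ctx.tensor_indices.append(i)
                ctx.inputs.append(None)  # placeholder, restored during backward
            else:
                ctx.inputs.append(arg)   # bool flags etc. bypass save_for_backward
        ctx.save_for_backward(*tensor_inputs)
        with torch.no_grad():
            outputs = run_function(*args)
        return outputs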

So updating your PyTorch version (to 1.10.0 or later) should fix the issue.
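
If you must stay on PyTorch 1.8.0, a hedged workaround is to capture the non-tensor arguments in a closure so that checkpoint() only ever sees tensors (layer_module, hidden_states, attention_mask, and output_attentions below are illustrative names, not the project's actual API):

import torch
from torch.utils.checkpoint import checkpoint

def run_layer_checkpointed(layer_module, hidden_states, attention_mask, output_attentions):
    def custom_forward(hidden, mask):
        # the bool flag is baked into the closure instead of being passed
        # through checkpoint(), so save_for_backward only receives tensors
        return layer_module(hidden, mask, output_attentions=output_attentions)
    return checkpoint(custom_forward, hidden_states, attention_mask)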

For the original CUDA out-of-memory error, capping the allocator's split size can reduce memory fragmentation:

Windows: set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

Linux: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
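
The same setting can also be applied from Python, provided it is set before the first CUDA allocation in the process:

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # import torch (and make any CUDA calls) only after the variable is set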