davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch

Help with lr-finder working with transformers?

afogarty85 opened this issue · comments

I am in need of a tool like this for a particular problem that is very sensitive to the LR. Unfortunately, however, I have been unable to get this package to work with any transformer model.

My error is as below and I am wondering if you have any insight!

from torch_lr_finder import LRFinder
import torch.nn as nn
import torch.optim as optim
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)
lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-decc9b6c423b> in <module>
----> 1 lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    284                 train_iter,
    285                 accumulation_steps,
--> 286                 non_blocking_transfer=non_blocking_transfer,
    287             )
    288             if val_loader:

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    342             # Forward pass
    343             outputs = self.model(inputs)
--> 344             loss = self.criterion(outputs, labels)
    345 
    346             # Loss should be averaged in each step

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    724             result = self._slow_forward(*input, **kwargs)
    725         else:
--> 726             result = self.forward(*input, **kwargs)
    727         for hook in itertools.chain(
    728                 _global_forward_hooks.values(),

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    946     def forward(self, input: Tensor, target: Tensor) -> Tensor:
    947         return F.cross_entropy(input, target, weight=self.weight,
--> 948                                ignore_index=self.ignore_index, reduction=self.reduction)
    949 
    950 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2420     if size_average is not None or reduce is not None:
   2421         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2423 
   2424 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in log_softmax(input, dim, _stacklevel, dtype)
   1589         dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
   1590     if dtype is None:
-> 1591         ret = input.log_softmax(dim)
   1592     else:
   1593         ret = input.log_softmax(dim, dtype=dtype)

AttributeError: 'tuple' object has no attribute 'log_softmax'

Hi @afogarty85.

It seems that XLMRobertaForSequenceClassification.forward() returns a tuple object (in v3.0.2), and that makes the loss computation in LRFinder._train_batch() fail.

Currently, LRFinder._train_batch() expects model.forward() to return only a single item. Therefore, there are 2 solutions for this case:

  1. Make a wrapper for the loss function that takes the 2nd item returned from XLMRobertaForSequenceClassification.forward().

    class LossWrapper(nn.module):
        def __init__(self, loss_fn):
            self.loss_fn = loss_fn
    
        def forward(self, outputs, labels):
            # In current case, `labels` will be passed into
            # `XLMRobertaForSequenceClassification.forward()`, and that makes
            # it return at least 2 items: `(loss,), logits`. But what we need
            # is `logits` only.
            # See also the docstring of `XLMRobertaForSequenceClassification.forward()`
            logits = outputs[1]
            return self.loss_fn(logits, labels)
    
    # Then, just replace the original criterion with this wrapper
    criterion = LossWrapper(nn.CrossEntropyLoss())
    lr_finder = LRFinder(model, optimizer, criterion, device="cuda")
  2. Since the loss is computed automatically when labels is passed into XLMRobertaForSequenceClassification.forward(), the previous solution leads to wasted computation (the loss is computed twice). If you care about this, you can subclass LRFinder and rewrite _train_batch() so that it does not compute the loss again. A fuller sketch follows the snippet below.

    class MyLRFinder(LRFinder):
        def _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer=True):
            # L344-L345, original code:
            # outputs = self.model(inputs)
            # loss = self.criterion(outputs, labels)
    
            # Just change those lines above to the following one:
            loss, logits = self.model(inputs)
    
    # Then, use the new LRFinder
    criterion = nn.CrossEntropyLoss()
    lr_finder = MyLRFinder(model, optimizer, criterion, device="cuda")
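
For reference, a fuller sketch of such a subclass might look like the code below. It only mirrors the structure visible in the traceback above; helper names such as self._move_to_device are assumptions based on the installed lr_finder.py, so please check them against your local copy before relying on it.

class MyLRFinder(LRFinder):
    def _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer=True):
        self.model.train()
        total_loss = None
        self.optimizer.zero_grad()
        for i in range(accumulation_steps):
            inputs, labels = next(train_iter)
            # NOTE: assumed device-transfer helper; verify against your lr_finder.py
            inputs, labels = self._move_to_device(
                inputs, labels, non_blocking=non_blocking_transfer
            )

            # Let the model compute the loss itself by passing `labels` in,
            # instead of calling `self.criterion` on the outputs again.
            outputs = self.model(inputs, labels=labels)
            loss = outputs[0]  # first element is the loss when `labels` is given (transformers v3.0.2)

            loss /= accumulation_steps
            loss.backward()
            total_loss = loss if total_loss is None else total_loss + loss

        self.optimizer.step()
        return total_loss.item()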

Besides, note that the output of XLMRobertaForSequenceClassification.forward() has been changed to SequenceClassifierOutput in the latest version of the transformers library, so you might need to make further modifications after updating.

Feel free to let me know if there is any further problem!

Thank you so much for your quick response and attention!

I cannot get (1) to work but I have not tried editing the source code for (2).

In terms of (1), the central problem I have now is that the logits have shape [56, 3] while the labels have shape [56, 1], so computing the loss fails with the error: multi-target not supported.

It's unclear why this is not working: nn.CrossEntropyLoss(logits, labels.flatten()), nor this: predicted = torch.argmax(logits, dim=1).

I also made some slight tweaks to get it a bit closer to working:

class LossWrapper(nn.Module):
    def __init__(self, loss_fn):
        super().__init__()
        self.loss_fn = loss_fn
    def forward(self, outputs, labels):
        # In current case, `labels` will be passed into
        # `XLMRobertaForSequenceClassification.forward()`, and that makes
        # it return at least 2 items: `(loss,), logits`. But what we need
        # is `logits` only.
        # See also the docstring of `XLMRobertaForSequenceClassification.forward()`
        logits = outputs[0]  # logits only
        return self.loss_fn(logits, labels)

Yeah, you are right. I missed the call to super() needed to initialize the parent class.

In terms of (1), the central problem I have now is that the logits have shape [56, 3] while the labels have shape [56, 1], so computing the loss fails with the error: multi-target not supported.

The labels passed to the loss function should be 1-dimensional here, that is, logits.shape: [56, 3] and labels.shape: [56], where 56 is the batch size. Otherwise, you will get that RuntimeError: multi-target not supported.... You can pass labels.view(-1) into the loss function to see whether it works. That is:

class LossWrapper(nn.Module):
    def __init__(self, loss_fn):
        super().__init__()
        self.loss_fn = loss_fn

    def forward(self, outputs, labels):
        logits = outputs[1]  # logits only
        return self.loss_fn(logits, labels.view(-1))

See also here.

However, it's probably incorrect to select the first value of outputs (logits = outputs[0]) to compute the loss. The reason is described in the comment of LossWrapper.forward(): the first value should be the loss instead of the logits. So the code you modified might behave incorrectly without raising any error.

And the reason why nn.CrossEntropyLoss(logits, labels.flatten()) does not work is that nn.CrossEntropyLoss is a loss function class; you should instantiate it first and then call it, like this:

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, labels.flatten())    # `labels.flatten()` is equivalent to `labels.view(-1)` in this case

UPDATE: you can check whether outputs[0] is a loss value by inspecting outputs[0].grad_fn right after outputs = model(...). If it's something like <NllLossBackward object at ...>, it means outputs[0] was produced by CrossEntropyLoss() inside model.forward().
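
For example (a rough sketch; the tensor names here are just placeholders for one batch of your own data):

outputs = model(input_ids=input_ids.cuda(),
                attention_mask=attention_mask.cuda(),
                labels=labels.cuda())

print(outputs[0].shape)    # a loss value is a scalar: torch.Size([])
print(outputs[0].grad_fn)  # e.g. <NllLossBackward object at ...> if it is the loss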

Good Evening,

I am still having some struggles. I am not entirely sure what is not working properly.

With a toy example, the model outputs loss, logits correctly:

import torch
import torch.nn as nn
from torch_lr_finder import LRFinder
import torch.optim as optim
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
from torch.utils.data import TensorDataset, DataLoader

model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=3).cuda()
tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')

input_ids = tokenizer("Hello, my dog is cute", return_tensors="pt")
input_label = torch.tensor([1])

with torch.no_grad():
    loss, logits = model(input_ids=input_ids['input_ids'].cuda(),
                     attention_mask=input_ids['attention_mask'].cuda(),
                     labels=input_label.cuda())

When I run the supplied class, outputs[1] cannot be found.

class LossWrapper(nn.Module):
    def __init__(self, loss_fn):
        super().__init__()
        self.loss_fn = loss_fn

    def forward(self, outputs, labels):
        logits = outputs[1]  # logits only
        return self.loss_fn(logits, labels.view(-1))

If I change the class to use outputs[0], then I believe outputs[0] is actually the logits now, given the error below, as its size reflects the new batch size of 24.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-decc9b6c423b> in <module>
----> 1 lr_finder.range_test(train_dataloader, val_loader=valid_dataloader, end_lr=1, num_iter=100, step_mode="linear")

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    284                 train_iter,
    285                 accumulation_steps,
--> 286                 non_blocking_transfer=non_blocking_transfer,
    287             )
    288             if val_loader:

~\Anaconda3\envs\my_ml\lib\site-packages\torch_lr_finder\lr_finder.py in _train_batch(self, train_iter, accumulation_steps, non_blocking_transfer)
    342             # Forward pass
    343             outputs = self.model(inputs)
--> 344             loss = self.criterion(outputs, labels)
    345 
    346             # Loss should be averaged in each step

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    724             result = self._slow_forward(*input, **kwargs)
    725         else:
--> 726             result = self.forward(*input, **kwargs)
    727         for hook in itertools.chain(
    728                 _global_forward_hooks.values(),

<ipython-input-9-ea4677930fd0> in forward(self, outputs, labels)
      6     def forward(self, outputs, labels):
      7         logits = outputs[0]
----> 8         return self.loss_fn(logits, labels.view(-1))

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    724             result = self._slow_forward(*input, **kwargs)
    725         else:
--> 726             result = self.forward(*input, **kwargs)
    727         for hook in itertools.chain(
    728                 _global_forward_hooks.values(),

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\modules\loss.py in forward(self, input, target)
    946     def forward(self, input: Tensor, target: Tensor) -> Tensor:
    947         return F.cross_entropy(input, target, weight=self.weight,
--> 948                                ignore_index=self.ignore_index, reduction=self.reduction)
    949 
    950 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2420     if size_average is not None or reduce is not None:
   2421         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2423 
   2424 

~\Anaconda3\envs\my_ml\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2214     if input.size(0) != target.size(0):
   2215         raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
-> 2216                          .format(input.size(0), target.size(0)))
   2217     if dim == 2:
   2218         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

ValueError: Expected input batch_size (24) to match target batch_size (2280).

I am sorry for all the time you have spent, but I appreciate it and it will help me tackle this process for other transformer models in the future.

Thanks for that example. And no need to be sorry, your feedback helps us improve this package.

If I change the class to use outputs[0], then I believe outputs[0] is actually the logits now, given the error below, as its size reflects the new batch size of 24.

Oh, that's my bad. I forgot that every batch of data is unpacked into inputs and labels before being passed into the model. Hence labels is not passed into the BERT model, and that's why there is only one item in the return value of model.forward(). So you are correct, the first item is indeed logits.

For this reason, the loss won't be computed during model.forward(), so there is no issue of wasted computation as mentioned in my previous comment.

However, with the current information, I cannot figure out the cause of the error in your latest comment (ValueError: Expected input batch_size (24) to match target batch_size (2280)). Maybe you can use pdb.set_trace() to trace the code before lr_finder.py@L344 executes, and figure out what the arguments outputs and labels actually are.
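
For example, a temporary debugging wrapper like the following could help (just a sketch; remove the breakpoint once you have seen the shapes):

import pdb
import torch.nn as nn

class DebugLossWrapper(nn.Module):
    def __init__(self, loss_fn):
        super().__init__()
        self.loss_fn = loss_fn

    def forward(self, outputs, labels):
        # Pause here to inspect what LRFinder actually passes in,
        # e.g. outputs[0].shape and labels.shape
        pdb.set_trace()
        logits = outputs[0]
        return self.loss_fn(logits, labels.view(-1))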

I wrote an example for utilizing LRFinder with transformer model, maybe you can find something helpful from it: https://colab.research.google.com/drive/1OZ7EWnGopPT8RNLxXVJ_K2EmgHMmTpN4?usp=sharing

Thanks for the continued help. I have been playing with your colab example, which of course works. I am still trying to figure out why, exactly, it does not work with my data. Unfortunately, I won't have time to play around with it until next week. I would like to get this working so I can generalize it across multiple models in the way that I am processing my data.

That's totally fine. Take your time.

Hello :)
@NaleRaphael

I have been testing your colab and I find it a little bit strange that you say logits should be outputs[0] instead of outputs[1].
When you run:

for batch in train_loader:
    outputs = bert_model_wrapper(*batch)
    print(outputs) #
    break

The output is something like this: SequenceClassifierOutput(loss=tensor(2.3758, device='cuda:0', grad_fn=<NllLossBackward0>), logits=tensor([[-0.0108, 0.2426, -0.1928, -0.1228, -0.0440, 0.1848, -0.2018, 0.0837, 0.0681, -0.0016], ...). From here we can see that outputs[1] is our logits.

When I try it with [0] I get a batch_text_or_text_pairs has to be a list (got <class 'tuple'>)! error.
When I run it with [1] I get a completely different error, tuple index out of range, with lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear').
Can you explain it to me please :) ?
thanks!

Hi @ma-batita !

In that colab notebook, I was using huggingface transformers v3.0.2. You can find that the return value of XLMRobertaForSequenceClassification.forward() would be:

  1. if labels is not given, the first value would be logits
  2. if labels is given, the first value would be loss
# source: https://github.com/huggingface/transformers/blob/v3.0.2/src/transformers/modeling_roberta.py#L349-L360
        # ...
        outputs = (logits,) + outputs[2:]    # case 1: `labels` is not given, the first element of tuple `outputs` is `logits`.
        if labels is not None:
            if self.num_labels == 1:
                #  We are doing regression
                loss_fct = MSELoss()
                loss = loss_fct(logits.view(-1), labels.view(-1))
            else:
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs    # case 2: `labels` is given, now the tuple `outputs` is prepended with `loss`.

        return outputs  # (loss), logits, (hidden_states), (attentions)
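
Concretely (a small sketch; ids, mask and labels are placeholders for your own tensors, and model is the XLMRobertaForSequenceClassification instance):

outputs = model(input_ids=ids, attention_mask=mask)
logits = outputs[0]                     # case 1: no labels -> (logits, ...)

outputs = model(input_ids=ids, attention_mask=mask, labels=labels)
loss, logits = outputs[0], outputs[1]   # case 2: labels given -> (loss, logits, ...)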

That's why I mentioned "... the first item is logits indeed" in my previous comment.

From the message you provided, the output has been wrapped with a data class SequenceClassifierOutput, so it seems you are running a version later than v3.0.2. Though huggingface transformers has already bumped to v4.21.3, it still retains similar behavior:

# source: https://github.com/huggingface/transformers/blob/v4.21.3/src/transformers/models/roberta/modeling_roberta.py#L1243-L1252
# ...
        # NOTE: code above are written to calculate `loss` if `labels` is given, but we omitted them here.
        # If `labels` is not given, `loss` will retain its initial value: None.
        if not return_dict:
            output = (logits,) + outputs[2:]
            # Below is the condition to determine whether the first element of returned value is `loss` or `logits`
            return ((loss,) + output) if loss is not None else output

        # This should be the result you get (`return_dict` is True)
        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

So, the key is still the same: "is labels passed to model.forward()?"
As mentioned in my previous comment:

... I forgot that every batch of data is unpacked into inputs and labels before being passed into the model. Hence labels is not passed into the BERT model, and that's why there is only one item in the return value of model.forward()...

This is the reason why I directly select the first element in LossWrapper written in the colab notebook:

class LossWrapper(nn.Module):
    def __init__(self, loss_fn):
        super().__init__()
        self.loss_fn = loss_fn

    def forward(self, outputs, labels):
        # Output of BERT model is a tuple even if the argument `labels` is not
        # passed in, so that we should unpack it here.
        # When `labels` is not passed, outputs: (logits,)
        # When `labels` is passed, outputs: (loss, logits,)
        logits = outputs[0]
        return self.loss_fn(logits, labels)

Regarding other questions you mentioned:

When I try it with [0] I get a batch_text_or_text_pairs has to be a list (got <class 'tuple'>)! error.
When I run it with [1] I get a completely different error, tuple index out of range, with lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear').

Could you elaborate on what you were trying to do? Or maybe you can try inserting a breakpoint (import pdb; pdb.set_trace()) before the line of code where the exception was raised, re-run, and take a look at the variables in that scope.
Hopefully that helps you figure out the problem quickly.

Anyway, please feel free to let me know if you still have questions.

Hello again :) @NaleRaphael
Thanks for your quick reply, I really appreciate it!!

From the message you provided, the output has been wrapped with a data class SequenceClassifierOutput, so it seems you are running a version later than v3.0.2. Though huggingface transformers has already bumped to v4.21.3, it still retains similar behavior:

Yes, indeed I am using the later version of huggingface, and since the output has had a different form ever since, I am asking: why not adapt LRFinder to this major change? One could make the criterion optional since it is already wrapped into the model (for classification problems). Could this be an option, or would it affect the overall function of LRFinder?


Now let's say that I am using the right version of huggingface (v3.0.2).
I managed to understand where this error came from:

batch_text_or_text_pairs has to be a list (got <class 'tuple'>)!

I get it because I am doing from transformers import AutoTokenizer, AutoModelForSequenceClassification, and if I do from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification I don't get it.
Do you have any idea why, please?

You asked what I am trying to do:
I have a classification problem with a Spanish model and I want to find a good LR to start my training.
When I tried:

import pdb
for batch in train_loader:
  pdb.set_trace()
  outputs = bert_model_wrapper(*batch)
  print(outputs) # [1]
  break

I get :

> <ipython-input-20-217a24934843>(4)<module>()
-> outputs = bert_model_wrapper(*batch)
(Pdb) outputs
SequenceClassifierOutput(loss=tensor(2.1979, device='cuda:0', grad_fn=<NllLossBackward0>), logits=tensor([[ 0.3399, -0.0038,  0.0829,  0.0488,  0.3425,  0.2920, -0.0111,  0.0885,
          0.0715, -0.1070],
        [ 0.3414, -0.0055,  0.0908,  0.0377,  0.3433,  0.2716, -0.0141,  0.0839,
          0.0717, -0.1170],
        [ 0.3354, -0.0030,  0.0834,  0.0354,  0.3348,  0.2739, -0.0159,  0.0951,
          0.0848, -0.1094],
        [ 0.3385,  0.0042,  0.0952,  0.0335,  0.3372,  0.2679, -0.0228,  0.0871,
          0.0837, -0.1097]], device='cuda:0', grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

Hi @ma-batita

...why not adapt LRFinder to this major change? One could make the criterion optional since it is already wrapped into the model (for classification problems). Could this be an option, or would it affect the overall function of LRFinder?

Thanks for bringing up this question. It's indeed a question of library design. To my understanding, PyTorch is flexible, and there is no hard rule restricting how users implement model.forward(). Therefore, the inputs and outputs of model.forward() can be far more varied than plain tensors. In this situation, it's hard to change the design of LRFinder just to make it meet the specifications of some libraries (otherwise, it could easily break compatibility with others).

We actually had some questions and PRs proposed regarding model input/output handling before, and we found that the approach proposed in PR #37 seems a proper way to go, as it provides better flexibility for users to get their models working with LRFinder without affecting the existing codebase (both the users' and LRFinder's). Since LRFinder is a tool for finding a learning rate, the ideal situation is that you should be able to discard all code related to LRFinder after finding a good learning rate, then continue working on your original codebase without changing a single character in it.

Therefore, I think the idea of using wrapper classes proposed by David is also a nice solution for dealing with these various situations. Though it might add a little difficulty for unconventional models and training pipelines, it almost guarantees that your existing model and pipeline will still work after removing the code related to LRFinder.

So, if you want to deal with the dataclass object returned by transformers, a similar approach to the one written in that colab notebook should still work.
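
For instance, a variant of the wrapper that handles both forms might look like this (a sketch; it assumes labels is still not passed to model.forward() inside LRFinder, so for the tuple form the first element is the logits):

import torch.nn as nn

class LossWrapper(nn.Module):
    def __init__(self, loss_fn):
        super().__init__()
        self.loss_fn = loss_fn

    def forward(self, outputs, labels):
        # Newer transformers return a dataclass (e.g. SequenceClassifierOutput)
        # with a `logits` attribute; older versions return a plain tuple.
        logits = outputs.logits if hasattr(outputs, "logits") else outputs[0]
        return self.loss_fn(logits, labels.view(-1))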


The error message

batch_text_or_text_pairs has to be a list (got <class 'tuple'>)

should be raised by the tokenizer. As it describes, the input should be a list of text. Maybe you should check what inputs are passed into the tokenizer and the model. If you can provide a further code snippet showing how you use the model, along with the error traceback, maybe I can help you figure out the problem. Otherwise, I guess it's not a problem directly related to LRFinder.

Regarding using AutoTokenizer and AutoModelForSequenceClassification, this should not be a problem, because you can get the same model and tokenizer with either this configuration:

from transformers import XLMRobertaConfig, XLMRobertaForSequenceClassification, XLMRobertaTokenizer

xlm_roberta_config = XLMRobertaConfig.get_config_dict('xlm-roberta-base')[0]
xlm_roberta_model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base', num_labels=2).cuda()
xlm_roberta_tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')

or this one:

from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

config = AutoConfig.from_pretrained("xlm-roberta-base")
xlm_roberta_config = config.get_config_dict('xlm-roberta-base')[0]
xlm_roberta_model = AutoModelForSequenceClassification.from_config(config).cuda()
xlm_roberta_tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')

I have run the same notebook mentioned above using these 2 configurations and they both work.

Hello @NaleRaphael
Thanks for your detailed explanation about modifying LRFinder. I understood from the beginning that this specific modification would cause problems in other situations. It was just a matter of confirmation :)
Thanks anyway ;)


About the error, it is the later version of transformers that caused it :)
When I use transformers==3.0.2 I don't get it!
But I still have a problem and I think it is related to my data! It is really weird, as I tried to make my data look like yours in the notebook, but I still get:

CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Even when I tried max_length=128 I still got it!

PS: I have a problem with 10 different classes and I am not sure how I am going to fit them into AutoModelForSequenceClassification!

Hi @ma-batita

Sure, no worries. :)

Regarding the issue you ran into, it's recommended to run the model on the CPU first; some error messages might not be shown explicitly while running on the GPU. It's also recommended to make sure your code works well on its own, without anything related to LRFinder, first. Once you've done that, you can start adding LRFinder. This should help you figure out the actual cause.
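
For instance, a minimal LRFinder-free check might look like this (a sketch; the names follow earlier snippets in this thread, LossWrapper is the class defined above, and your actual batch format may differ):

import torch.nn as nn

model_cpu = model.cpu()
criterion = LossWrapper(nn.CrossEntropyLoss())

inputs, labels = next(iter(train_loader))   # assumes batches of (inputs, labels)
outputs = model_cpu(inputs)                 # first positional argument is input_ids
loss = criterion(outputs, labels)
loss.backward()
print(loss.item())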

As for the last question, you need to set num_labels in the config to make AutoModelForSequenceClassification work with a different number of classes, e.g.,

# see also: https://github.com/huggingface/transformers/blob/v4.21.3/src/transformers/models/roberta/modeling_roberta.py#L1199-L1202
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained("xlm-roberta-base")
config.num_labels = 10
model = AutoModelForSequenceClassification.from_config(config)
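
To double-check the change (a sketch; attribute names vary by model family, but for (XLM-)RoBERTa the classification head is classifier.out_proj):

print(model.config.num_labels)     # 10
print(model.classifier.out_proj)   # Linear(in_features=..., out_features=10, bias=True)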

Hi @NaleRaphael,

Regarding the issue you ran into, it's recommended to run the model on the CPU first; some error messages might not be shown explicitly while running on the GPU. It's also recommended to make sure your code works well on its own, without anything related to LRFinder, first. Once you've done that, you can start adding LRFinder. This should help you figure out the actual cause.

I did exactly what you said here. It is running smoothly now, but when I change the model it doesn't. The example below:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

name = 'CenIA/albert-base-spanish'

config = AutoConfig.from_pretrained(name)
model_config = config.get_config_dict(name)[0]
model = AutoModelForSequenceClassification.from_config(config).cuda()
tokenizer = AutoTokenizer.from_pretrained(name)

gives me the error: Can't load config for 'CenIA/albert-base-spanish'. Make sure that:

  • 'CenIA/albert-base-spanish' is a correct model identifier listed on 'https://huggingface.co/models'

  • or 'CenIA/albert-base-spanish' is the correct path to a directory containing a config.json file

I think it is something related to transformers==3.0.2. I couldn't fix it even when I downloaded the model and all its files from HuggingFace. Can you please tell me how we can fix this?

As for the last question, you need to set num_labels in the config to make AutoModelForSequenceClassification work with a different number of classes,

Thanks for the clarification. Sorry, I hadn't worked with the old version of transformers, that is why! 😅

Closing due to inactivity