LinWeizheDragon / Retrieval-Augmented-Visual-Question-Answering

This is the official repository for Retrieval Augmented Visual Question Answering

[ERROR] - __main__ : Uncaught exception: <class 'lightning_lite.utilities.exceptions.MisconfigurationException'> --> The provided lr scheduler `LambdaLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.

1286169349 opened this issue · comments

[ERROR] - main : Uncaught exception: <class 'lightning_lite.utilities.exceptions.MisconfigurationException'> --> The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.
Traceback (most recent call last):
  File "/home/zzu_zxw/zjl_data/Retrieval-Augmented-Visual-Question-Answering/src/main.py", line 341, in <module>
    main(config)
  File "/home/zzu_zxw/zjl_data/Retrieval-Augmented-Visual-Question-Answering/src/main.py", line 162, in main
    trainer.fit(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 582, in fit
    call._call_and_handle_interrupt(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 624, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1042, in _run
    self.strategy.setup(self)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 182, in setup
    self.setup_optimizers(trainer)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 142, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
    _validate_scheduler_api(lr_scheduler_configs, model)
  File "/home/zzu_zxw/miniconda3/envs/RAVQA/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 352, in _validate_scheduler_api
    raise MisconfigurationException(
lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler `LambdaLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.
wandb: Network error (TransientError), entering retry loop.
wandb: 🚀 View run OKVQA_DPR_FullCorpus at: https://wandb.ai/ravqa/VQA_publication/runs/7bnwi9tl
wandb: Synced 3 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230912_100635-7bnwi9tl/logs

Which version of pytorch_lightning are you using? Mine is 1.9.0.

I didn't anticipate that newer Lightning releases would break backward compatibility. At the time I was probably using a 1.7.x version. Try downgrading step by step until you find one that works.
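If downgrading is not an option, the error message itself suggests another workaround: override `lr_scheduler_step` in the `LightningModule` subclass (in this repository that would be the `DPRExecutor`). A minimal sketch, assuming the PL 1.8/1.9 hook signature (`scheduler, optimizer_idx, metric`; PL 2.x drops `optimizer_idx`). The class below is a stand-in for illustration, not the repository's actual executor:

```python
# Hedged sketch: overriding Lightning's lr_scheduler_step hook so schedulers
# that fail the strict LRScheduler API check (such as the LambdaLR objects
# returned by transformers' get_*_schedule_with_warmup helpers) still step.
class DPRExecutorSketch:  # stand-in for the real LightningModule subclass
    def lr_scheduler_step(self, scheduler, optimizer_idx, metric=None):
        if metric is None:
            scheduler.step()        # plain per-step schedulers
        else:
            scheduler.step(metric)  # metric-driven schedulers, e.g. ReduceLROnPlateau
```

This simply re-implements Lightning's default stepping behaviour, which is enough to bypass the API validation for `LambdaLR`-style schedulers.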

Also, I suggest using scheduler=linear or scheduler=none.

lightning==1.7.2 doesn't work either. Same error.

I suspect the problem is in how you are running the code. Try scheduler=none; it looks like you are using LambdaLR.

The DPR.jsonnet config file already seems to set scheduler=none:
"train": {
    "type": "DPRExecutor",
    "epochs": train_epochs,
    "batch_size": train_batch_size,
    "lr": lr,
    "adam_epsilon": adam_epsilon,
    "load_epoch": -1,
    "load_model_path": "",
    "load_best_model": 0,
    "save_interval": save_interval,
    "scheduler": "none",
    "additional": {
        "gradient_accumulation_steps": gradient_accumulation_steps,
        "warmup_steps": warmup_steps,
        "gradient_clipping": gradient_clipping,
        "save_top_k_metric": "test/recall_at_5",
        "plugins": [],
    }
},

lightning==1.6.0 and 1.7.5 don't work either.

It should be a Lightning problem. With scheduler set to none, nothing related to LambdaLR should be triggered.


Check carefully: print the contents of configure_optimizers() and see whether anything LR-scheduler-related is passed into pytorch lightning when training starts, because the error says "The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API."
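One way to do that inspection without wading through Lightning internals is a small helper that summarises whatever `configure_optimizers()` returned. This is a hypothetical debugging helper, not part of the repository; it handles the two common return shapes (a dict, or a `([optimizers], [schedulers])` tuple):

```python
# Hypothetical helper: summarise the return value of configure_optimizers()
# so you can see which scheduler class is actually handed to Lightning.
def describe_optim_config(result):
    if isinstance(result, dict):
        opt = result.get("optimizer")
        sched = result.get("lr_scheduler")
        return (type(opt).__name__ if opt is not None else "none",
                type(sched).__name__ if sched is not None else "none")
    if isinstance(result, tuple) and len(result) == 2:
        # tuple form: ([optimizers], [schedulers])
        opts, scheds = result
        return ([type(o).__name__ for o in opts],
                [type(s).__name__ for s in scheds])
    return type(result).__name__
```

Calling `print(describe_optim_config(self.configure_optimizers()))` before `trainer.fit` would have shown a `LambdaLR` in the scheduler slot here, which is exactly what the validation error complains about.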

I checked before uploading; all the code ran correctly.

In trainers/dpr_executor.py, because self.config.train.scheduler == 'none', execution falls into the final else branch:
if self.config.train.scheduler == 'linear':
    from transformers import get_linear_schedule_with_warmup
    # Using Linear scheduler
    self.scheduler = get_linear_schedule_with_warmup(
        self.optimizer,
        num_warmup_steps=self.config.train.additional.warmup_steps,
        num_training_steps=self.trainer.estimated_stepping_batches,
        last_epoch=self.global_step,
    )
elif self.config.train.scheduler == 'cosine':
    t_total = self.config.train.epochs
    self.scheduler = optim.lr_scheduler.CosineAnnealingLR(
        self.optimizer,
        t_total, eta_min=1e-5, last_epoch=-1, verbose=False)
else:
    from transformers import get_constant_schedule_with_warmup
    # Using constant scheduler
    self.scheduler = get_constant_schedule_with_warmup(
        self.optimizer,
        num_warmup_steps=self.config.train.additional.warmup_steps,
        last_epoch=self.global_step,
    )
After this runs, self.scheduler is <torch.optim.lr_scheduler.LambdaLR object at 0x7f898f1daaf0>.
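That observation explains why scheduler=none still trips the check: transformers' `get_constant_schedule_with_warmup` builds its schedule by wrapping a multiplier function in `torch.optim.lr_scheduler.LambdaLR`, so a `LambdaLR` instance reaches Lightning's validation either way. A pure-Python sketch of that multiplier (the real implementation lives inside transformers; this is only an illustration):

```python
def constant_warmup_factor(step, warmup_steps):
    # Mirrors the kind of lambda that get_constant_schedule_with_warmup
    # hands to LambdaLR: ramp the LR multiplier linearly from 0 during
    # warmup, then hold it constant at 1.0.
    if step < warmup_steps:
        return step / max(1.0, warmup_steps)
    return 1.0
```

So the "none" branch is not scheduler-free; it is a constant schedule implemented as a `LambdaLR`, and that class is what the newer Lightning API check rejects.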

Do I need to upgrade both transformers and lightning to the latest versions?

You should first try downgrading PyTorch to around version 1.12 and retry; this should be a compatibility issue.
You could also upgrade everything to the latest versions, but PyTorch Lightning has a series of breaking changes, so you would have to adapt the code to the new version as prompted, which is more troublesome.

Thank you so much. I downgraded torch from 2.0.1 to 1.12.1 and the problem is solved.