huggingface / trl

Train transformer language models with reinforcement learning.

http://hf.co/docs/trl

huggingface/trl Issues

Error with SFT of LLaVA-Next
Updated a day ago · 3
DPO Evalution with WandB triggers a `cannot pickle '_thread.lock' object` failure
Updated a day ago · 8
Drop `use_cache=False if training_args.gradient_checkpointing`
Closed 2 days ago · 2
Regarding `setup_chat_format` overwriting existing special tokens
Updated 2 days ago · 4
TR-DPO : Why is the loss not changing at all, and reward/accuracies and reward/margins always = 0?
Closed 2 days ago · 1
Low loss but can't get the expected output during inference
Updated 2 days ago · 2
Will long text be truncated and split into different examples when using packing?
Updated 2 days ago
what's the difference between PPO Trainer and PPOv2 Trainer?
Updated 2 days ago · 4
Support packing for pretokenized datasets
Updated 2 days ago
scripts/dpo.py : Unable to train custom gpt2 model
Updated 2 days ago
Overflow with padding left warning.
Updated 3 days ago
Possible bug in SFTTrainer
Updated 3 days ago · 2
evaluation_loop() Bug in KTOTrainer
Closed 3 days ago · 1
No way to drop samples which are longer than `max_length` in ORPOTrainer?
Updated 4 days ago · 1
AttributeError: 'NoneType' object has no attribute 'device'
Closed 5 days ago · 1
OOM Error using PPO Trainer to LoRa-tune 4-bit Llama-3-8B Model
Updated 5 days ago
Support more than one rejected responses
Updated 6 days ago · 1
BUG: UserWarning: Could not find response key in the following instance:
Updated 8 days ago · 1
Support for Iterative DPO
Updated 8 days ago · 1
How to use `predict` function in `DPOTrainer`
Updated 8 days ago
Not find sft.py
Updated 8 days ago
The meaning of "--dataset_name"?
Updated 8 days ago
Phi-3 SFT training and padding tokens
Updated 8 days ago · 2
running rloo.py got. RuntimeError: CUDA error: device-side assert triggered
Updated 10 days ago · 1
Supports of SFTTrainer / PPOTrainer / DPOTrainer for LLaVA-alike model
Closed 11 days ago · 7
How to use DoRA with ORPO
Closed 12 days ago
Disparity in the generate function
Updated 12 days ago
Update `TRL_USE_RICH` flag for CLI
Closed 12 days ago · 1
Support for more trainers in CLI
Updated 13 days ago
[Possible Bug] `RewardTrainer`'s `seed` makes a difference but `data_seed` has no impact?
Updated 14 days ago
The metrics in wandb is abnormal
Closed 14 days ago · 5
about `mixed_precision` in acclerate config
Updated 15 days ago
In PPOv2Config and RLOOConfig, the base_model parameter doesn't seem to be used, why does it exist?
Closed 15 days ago · 1
Vision Language Models collator and batch size
Updated 16 days ago
Lora seems to be invalid when using vsft_llava.py
Updated 16 days ago · 4
Can bert be used for dpo training?
Updated 16 days ago · 1
[Feature] Add DiscoPOP algorithm
Updated 17 days ago
Optimizing an LLM Using DPO: nan Loss Values During Evaluation
Updated 20 days ago
Clarification on reward/value heads in PPOV2
Updated 22 days ago · 3
Conflict in start index under `batched_forward_pass`
Updated 23 days ago
DDPO trained model error when used to generate images
Updated 23 days ago · 1
The DPO 'grad_norm': 0.0,
Updated 23 days ago
Getting "KeyError: None" when passing conversational dataset
Closed 23 days ago · 1
PPOv2 trainer, the wandb log is unnormal
Closed 25 days ago · 8
Want to use zero3 to train KTO and met error
Updated 25 days ago
Neftune is applied twice; in trl and transformers BOTH!
Updated 25 days ago · 1
What is the difference between PPOv2Trainer and PPOTrainer?
Closed a month ago · 1
SFTTrainer device error even though it doesn't take device as an argument
Updated a month ago
Using IterableDataset crashed the SFTTrainer
Updated a month ago
DataCollatorForCompletionOnlyLM does not work with FSDP
Updated a month ago