timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"

Home Page: https://arxiv.org/abs/2001.07676


Training Time Issue

imethanlee opened this issue · comments

Hi,

What is the expected time to train a PET model on the yelp_full dataset (with default arguments)? I started training the day before yesterday on an RTX 3090 GPU and it is still running.

Thanks.

I don't know how efficient RTX 3090s are, but on a single Nvidia GeForce 1080 Ti, training PET (not iPET) with the default parameters takes a few hours. In case you haven't fixed the issue yourself yet, could you provide me with the exact command that you used to train the model? Also, did you check (e.g., with nvidia-smi) whether the GPU is actually being used?
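As a quick sketch (not part of the PET codebase), one way to do the nvidia-smi check from Python while training runs in another process; `gpu_utilization` is a hypothetical helper name:

```python
import shutil
import subprocess
from typing import Optional


def gpu_utilization() -> Optional[str]:
    """Return nvidia-smi's utilization/memory report, or None if no NVIDIA driver is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi not on PATH: no visible NVIDIA GPU
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used", "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


print(gpu_utilization())
```

If utilization stays near 0% while memory usage is flat, the training process is most likely running on CPU.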

Hi @timoschick,

I am having the same issue here. I started the training on an RTX 3090 yesterday and it is still running. The command I am using is as follows:

python pet/cli.py \
    --method pet \
    --pattern_ids 0 3 5 \
    --data_dir ${DATA_DIR} \
    --model_type albert \
    --model_name_or_path albert-xxlarge-v2 \
    --task_name boolq \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --pet_per_gpu_eval_batch_size 8 \
    --pet_per_gpu_train_batch_size 2 \
    --pet_gradient_accumulation_steps 8 \
    --pet_max_steps 250 \
    --pet_max_seq_length 256 \
    --pet_repetitions 3 \
    --sc_per_gpu_train_batch_size 2 \
    --sc_per_gpu_unlabeled_batch_size 2 \
    --sc_gradient_accumulation_steps 8 \
    --sc_max_steps 5000 \
    --sc_max_seq_length 256 \
    --sc_repetitions 1
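For what it's worth, a back-of-the-envelope calculation from the flags above (my own arithmetic, not from the repo) suggests the per-run workload is modest, so a multi-day run points at a hardware/driver problem rather than the training volume:

```python
# Effective batch size = per-GPU batch size * gradient accumulation steps
pet_effective_batch = 2 * 8  # --pet_per_gpu_train_batch_size * --pet_gradient_accumulation_steps
sc_effective_batch = 2 * 8   # --sc_per_gpu_train_batch_size * --sc_gradient_accumulation_steps

# Examples processed per run = effective batch * max steps
pet_examples = pet_effective_batch * 250   # --pet_max_steps
sc_examples = sc_effective_batch * 5000    # --sc_max_steps

print(pet_effective_batch, pet_examples)  # 16 4000
print(sc_effective_batch, sc_examples)    # 16 80000
```

Note that with `--pattern_ids 0 3 5` and `--pet_repetitions 3`, the PET stage repeats nine times, but even so the totals are small for an RTX-class GPU.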

Just a heads up -- I bumped the PyTorch version to 1.8.0 and CUDA to 11.3, and that solved the performance issue. I can now run through the first 126 epochs in about 12 minutes, compared to 1.5 hours before. I am still waiting to see whether this affects the results, but performance is much better.

@jmcrey So, were the results okay?

I'm now using a 1080 Ti, training with CUDA 11.5 and TensorRT for 3 epochs.
My pre-trained model is RoBERTa-large and the dataset is AG News; the other arguments are set to their defaults.
It looks like training will take about half a day.