mosaicml / examples

Fast and flexible reference benchmarks

Training Time estimation on single GPU A100 80G

tiru1930 opened this issue · comments

commented

Hi

I am pretraining the BERT model in FP32 and BF16. My estimated training time for FP32, with a 128 sequence length and a batch size of 256, is 160 hrs with Mosaic BERT. Is this expected? I don't see much of a reduction compared with Hugging Face BERT.

Hi @tiru1930, can you please indicate what algorithms you are using when training BERT? We have our BERT example with recipes in our examples repo. CC: @jacobfulano and @alextrott16

Hi @tiru1930, can you provide more details on the hardware you are running and the throughput measurements? Also, is there a specific reason you are using FP32 instead of BF16? Feel free to reach out over the community Slack.

commented

@tiru1930 Unfortunately, we will need more detail to proceed:

  1. I'm not sure what you mean by a quantized model, given we're discussing training.
  2. Can you please provide concrete throughput measurements that we can compare against our numbers?
commented

@mvpatel2000

  1. We are building a new GPU, so for comparison purposes we want to train the BERT model on an NVIDIA GPU with different configurations, as I mentioned: FP32, FP16, BF16, INT8, etc.
  2. I don't understand what throughput means here. My end goal is to train the full BERT model with the Mosaic settings on a single GPU. Since it was mentioned that it took 4.5 hrs on 8 GPUs, I am expecting it to complete in about 36 hrs on a single GPU (see the linear-scaling sketch below).
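
As a side note, the 36-hour figure is just naive linear scaling of the reported 8-GPU time. A minimal sketch of that arithmetic (my illustration, not a number from the Mosaic team); it ignores multi-GPU communication overhead and any change in per-GPU batch size or utilization, so the real single-GPU time can differ:

```python
# Naive linear-scaling estimate: assumes single-GPU throughput is exactly
# 1/8 of the 8-GPU throughput, ignoring communication overhead and any
# change in per-GPU batch size or utilization.
reported_hours_on_8_gpus = 4.5
num_gpus = 8

single_gpu_estimate_hours = reported_hours_on_8_gpus * num_gpus
print(f"Naive single-GPU estimate: {single_gpu_estimate_hours:.0f} hrs")  # ~36 hrs
```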

@tbaggu

Throughput is defined as how many samples a model processes per second. If you are using a custom accelerator (I'm not sure what you mean by building a new GPU), it is hard for us to help you, as it may simply be that the accelerator is slower.
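
To make that concrete, here is a hedged sketch of timing a training loop and reporting samples per second. The model, dataloader, and loss handling are generic placeholders, not the instrumentation used in the MosaicML examples (if you train with Composer, its SpeedMonitor callback logs throughput for you):

```python
import time

import torch


def measure_throughput(model, dataloader, device="cuda", num_batches=50):
    """Time a few training steps and return samples processed per second.

    Generic sketch: assumes a Hugging Face-style model that returns an object
    with a .loss attribute, and batches that are dicts of tensors containing
    an "input_ids" key.
    """
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    n_samples = 0
    torch.cuda.synchronize()
    start = time.perf_counter()

    for step, batch in enumerate(dataloader):
        if step >= num_batches:
            break
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        n_samples += batch["input_ids"].size(0)

    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return n_samples / elapsed  # samples per second
```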

commented

It should be a reasonable approximation and is exact in the limit of zero sequence length. As the sequence length grows, the approximation becomes worse.

We have the exact formula here, assuming you only have linear layers and attention.
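
For reference, here is a hedged sketch of the kind of FLOP accounting that comment is alluding to (the linked formula itself is not reproduced in this thread). The common approximation is about 6 FLOPs per parameter per token for a forward+backward pass, which ignores attention; the attention matmuls add a term that grows with sequence length, which is why the approximation degrades at longer sequences. The coefficients and all concrete numbers below (token budget, utilization) are illustrative assumptions, not values from this issue:

```python
def training_flops(n_params, n_layers, d_model, seq_len, n_tokens):
    """Rough total training FLOPs for a transformer.

    param_flops is the sequence-length-independent approximation (~6 FLOPs
    per parameter per token, forward + backward). attn_flops is the extra
    cost of the attention score and value matmuls, which scales with
    sequence length; the coefficient follows a commonly used accounting
    and is only approximate.
    """
    param_flops = 6 * n_params * n_tokens
    attn_flops = 12 * n_layers * d_model * seq_len * n_tokens
    return param_flops + attn_flops


def estimated_hours(total_flops, peak_flops_per_sec, utilization=0.3):
    """Convert a FLOP budget to wall-clock hours at an assumed utilization."""
    return total_flops / (peak_flops_per_sec * utilization) / 3600


# Illustrative BERT-base-like numbers (NOT from this issue): 110M params,
# 12 layers, d_model=768, sequence length 128, and an assumed 20B-token
# budget. peak_flops_per_sec depends on precision: an A100 is roughly
# 312 TFLOP/s in BF16, 156 TFLOP/s in TF32, and 19.5 TFLOP/s in plain FP32.
flops = training_flops(n_params=110e6, n_layers=12, d_model=768,
                       seq_len=128, n_tokens=20e9)
print(f"Single-GPU BF16 estimate: ~{estimated_hours(flops, 312e12):.0f} hrs")
```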

commented