justinpinkney / stable-diffusion

Too slow on 2xA100 SXM4

cihankaradogan opened this issue

Hello,
I started training on 2xA100 SXM4 following your tutorial, using the pokemon.yaml file. My dataset contains 1743 images, and I am loading it via Hugging Face. Training has been running for 13 hours and the first epoch still hasn't finished. No images have been produced from the validation texts, and no checkpoint has been saved in the log folder. The tutorial says your training takes 6 hours on 2xA6000; wouldn't you expect similar performance from the A100?
[Screenshot 2022-10-20 14 36 45]
[Screenshot 2022-10-20 14 36 36]