pharmapsychotic / clip-interrogator

Image to prompt with BLIP and CLIP

Loading flavors.txt is very slow

pmeems opened this issue

I run on

  • Windows 10 64Bit,
  • Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz with 16 GB RAM,
  • NVIDIA GeForce GTX 1070 8GB

I use

  • Python v3.11.5,
  • torch v2.2.0.dev20231003+cu121,
  • CUDA 12.1
  • clip-interrogator v0.6.0
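
A quick plain-PyTorch check of the environment (standard torch calls only, nothing specific to clip-interrogator):

```python
import torch

# Confirm the torch build, the CUDA toolkit it was built against,
# and whether it actually sees the GPU.
print("torch:", torch.__version__)                # 2.2.0.dev20231003+cu121
print("CUDA available:", torch.cuda.is_available())
print("CUDA (torch build):", torch.version.cuda)  # 12.1
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))  # GeForce GTX 1070
```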

I run run_cli.py as: python run_cli.py -c "ViT-H-14/laion2b_s32b_b79k" -f "d:\MyFolder"
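
For reference, that CLI call should be roughly equivalent to the following through the Python API (just a sketch; I'm assuming the usual Config/Interrogator interface, and example.jpg is only a placeholder file name):

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# Load the BLIP caption model plus the requested CLIP model once,
# then interrogate an image from the folder.
ci = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))

image = Image.open(r"d:\MyFolder\example.jpg").convert("RGB")  # placeholder file name
print(ci.interrogate(image))
```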

Loading flavors.txt is taking very long. It has now been running for almost 2 hours and is only at 16%:

python run_cli.py -c "ViT-H-14/laion2b_s32b_b79k" -f "D:\MyFolder"
CUDA is available and will be used.
CUDA version: 12.1
Loading caption model blip-large...
Loading CLIP model ViT-H-14/laion2b_s32b_b79k...
ViT-H-14_laion2b_s32b_b79k_artists.safetensors: 100%|█████████████████████████████| 21.6M/21.6M [00:01<00:00, 16.0MB/s]
Preprocessing artists:   0%|                                                                     | 0/1 [00:00<?, ?it/s]
  attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
Preprocessing artists: 100%|█████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.64s/it]
ViT-H-14_laion2b_s32b_b79k_flavors.safetensors: 100%|███████████████████████████████| 207M/207M [00:10<00:00, 20.1MB/s]
ViT-H-14_laion2b_s32b_b79k_mediums.safetensors: 100%|███████████████████████████████| 195k/195k [00:00<00:00, 2.16MB/s]
ViT-H-14_laion2b_s32b_b79k_movements.safetensors: 100%|█████████████████████████████| 410k/410k [00:00<00:00, 4.89MB/s]
ViT-H-14_laion2b_s32b_b79k_trendings.safetensors: 100%|█████████████████████████████| 148k/148k [00:00<00:00, 2.91MB/s]
Preprocessing trendings: 100%|███████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.46it/s]
ViT-H-14_laion2b_s32b_b79k_negative.safetensors: 100%|████████████████████████████| 84.2k/84.2k [00:00<00:00, 3.47MB/s]
Loaded CLIP model and data in 43.98 seconds.
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 73.53it/s]
Flavor chain:  16%|█████████▋                                                    | 5/32 [1:59:06<10:15:18, 1367.37s/it]

Months ago I used the same command and it was very fast. In the meantime I have, of course, installed Windows updates, pulled the latest version of this repo, and updated CUDA to v12.1 (I also tried 12.2, but then torch didn't recognize the GPU).

While running this script my CPU is at 27%, my memory is at 86%, and my GPU is at 3%.
What can I do to speed it up?

Edit
I removed all lines in flavors.txt except for the first 5.
Now the flavor chain is much faster, but it still takes 45-60 minutes per image (928x1312 px, 468 kB).
And it looks like my GPU isn't being used (see the attached screenshot).

Earlier versions took 3-5 minutes per image.
Which versions of which packages should I use to get the speed back?

I had luck changing the LabelTable object's flavor_intermediate_count (ln 53) to a lower number.
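
Something like this, if you drive it from Python instead of run_cli.py (a sketch assuming flavor_intermediate_count is a field on the 0.6.x Config object; its default there is 2048, I believe):

```python
from clip_interrogator import Config, Interrogator

config = Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k")
# Fewer intermediate flavor candidates -> a much shorter flavor chain step,
# at the cost of slightly less varied prompts.
config.flavor_intermediate_count = 512
ci = Interrogator(config)
```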

Thanks @genevera. I changed the value in the low_vram section from 1024 to 512, used the --low_vram parameter, and now it works again. It takes about 2 minutes per image, which is fine for me.
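
Roughly what my setup amounts to now, for anyone who wants to replicate it from Python rather than the CLI (a sketch; I'm assuming chunk_size and flavor_intermediate_count are the relevant Config fields in 0.6.x, and 512 is the value I lowered from 1024):

```python
from clip_interrogator import Config, Interrogator

config = Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k")
config.chunk_size = 1024                 # smaller batches through CLIP (low-VRAM style)
config.flavor_intermediate_count = 512   # the value I lowered from 1024
ci = Interrogator(config)
```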

I did notice the GPU is not doing much. Is that expected? I thought this script would use the GPU as well.
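
If it helps with debugging, a quick way to check where the models actually ended up (generic PyTorch; caption_model and clip_model are my guess at the Interrogator's internal attribute names in 0.6.x, so adjust if they differ):

```python
import torch

# ci is the Interrogator instance from above.
# caption_model / clip_model are assumed attribute names, not confirmed.
print("BLIP on:", next(ci.caption_model.parameters()).device)   # expect cuda:0
print("CLIP on:", next(ci.clip_model.parameters()).device)      # expect cuda:0
print("VRAM allocated (MB):", torch.cuda.memory_allocated() // 2**20)
```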