Atinoda / text-generation-webui-docker

Docker variants of oobabooga's text-generation-webui, including pre-built images.

Running without a GPU

Sharpz7 opened this issue

Hey,

I wanted to check: is it possible to run this container without a GPU?

Thanks,

You sure can, and there are some instructions in #9 that should help you set it up - basically, just comment out all the GPU parts in the docker-compose.yml (or don't include `--gpus all` if you're running without compose).

You'll need to be a patient man though - it's slow as molasses without a GPU!
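
For reference, "commenting out the GPU parts" means disabling the `deploy` reservation block in docker-compose.yml. A minimal sketch, trimmed to the relevant keys:

```
services:
  text-generation-webui-docker:
    image: atinoda/text-generation-webui:default
    ports:
      - 7860:7860
    # GPU reservation disabled for CPU-only operation:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           device_ids: ['0']
    #           capabilities: [gpu]
```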

This didn't seem to work in my environment - it errors saying that it can't find a GPU when a model is loaded. I will try some things and get back to you.

I managed to get it working by using this guide: https://github.com/oobabooga/text-generation-webui/blob/main/docs/Low-VRAM-guide.md

And making this change:

```
command: ["python", "/app/server.py", "--auto-devices"]
```

version: "3"
services:
  text-generation-webui-docker:
    image: atinoda/text-generation-webui:default # Specify variant as the :tag
    container_name: text-generation-webui
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose" # Custom launch args (e.g., --model MODEL_NAME)
#      - BUILD_EXTENSIONS_LIVE="silero_tts whisper_stt" # Install named extensions during every container launch. THIS WILL SIGNIFICANTLY SLOW LAUNCH TIME.
    ports:
      - 7860:7860  # Default web port
#      - 5000:5000  # Default API port
#      - 5005:5005  # Default streaming port
#      - 5001:5001  # Default OpenAI API extension port
    volumes:
      - ./config/loras:/app/loras
      - ./config/models:/app/models
      - ./config/presets:/app/presets
      - ./config/prompts:/app/prompts
      - ./config/softprompts:/app/softprompts
      - ./config/training:/app/training
#      - ./config/extensions:/app/extensions  # Persist all extensions
#      - ./config/extensions/silero_tts:/app/extensions/silero_tts  # Persist a single extension
    logging:
      driver:  json-file
      options:
        max-file: "3"   # Maximum number of log files to keep
        max-size: '10m'
    command: ["python", "/app/server.py", "--auto-devices"]
    # deploy:
    #     resources:
    #       reservations:
    #         devices:
    #           - driver: nvidia
    #             device_ids: ['0']
    #             capabilities: [gpu]
```
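
A usage note: with this file in place, `docker compose up -d` (or `docker-compose up -d` on older Docker versions) should bring the service up in CPU-only mode.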

Thanks for sharing your fix and confirming that it works with CPU only on your system. Enjoy your LLM-ing, and make sure your CPU cooler is tuned up!

PS. You can append `--auto-devices` to the `EXTRA_LAUNCH_ARGS` environment variable, instead of editing the CMD.
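
For example, the environment line from the compose file above would become something like:

```
environment:
  - EXTRA_LAUNCH_ARGS="--listen --verbose --auto-devices" # --auto-devices appended here; no CMD override needed
```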

I also realised I was being silly - you can configure it from the settings:

https://drive.google.com/uc?id=1UEjDNVtbBh4oAdb4k_WJHPYpdpSXI2Kj

Thanks for the quick response. Looking forward to doing my LLM testing with this UI :))

If you would be interested in having a Helm chart in this repo as well, I'd be happy to contribute one.

Hi @Atinoda, does "running without a GPU" assume that the provided Dockerfile is also used? IMHO the CUDA base image used there cannot be scheduled on a machine without a GPU.

Hi @globavi - since this discussion there is a llama-cpu image available (see #16 ). It still uses the CUDA base image but it should work fine (I was able to run it on an Intel laptop that has only an iGPU). Can you please try it out and let me know if you run into any problems?
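
Switching should just be a matter of changing the image tag in your docker-compose.yml - a sketch, assuming the variant tag is named as in #16:

```
services:
  text-generation-webui-docker:
    image: atinoda/text-generation-webui:llama-cpu # swap the :default tag for the CPU-oriented variant
```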

Hi @Atinoda,

I could start the app with the new image (I adapted a few things, as I don't use Docker Compose but Azure infrastructure), but after downloading a GGML model, the load_model process fails with:

```
2023-08-22 08:19:23 INFO:Loading TheBloke_Llama-2-7B-Chat-GGML...
CUDA error 35 at ggml-cuda.cu:4883: CUDA driver version is insufficient for CUDA runtime version
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
Stream closed EOF for customer-dev/claims-sle-textgen-ui-bash-684c9488c6-g4rxk (textgen-webui)
```