ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

Home Page: http://ludwig.ai


Inference on CPU

rlleshi opened this issue

From the docs, Ludwig spawns a REST API for inference. By default, this runs on a GPU.

However, is there any option to run inference using the CPU only?

Updates

It was pointed out to me in the Slack that it would be good to provide the config file as well:

```yaml
model_type: llm
# base_model: meta-llama/Llama-2-7b-hf
base_model: meta-llama/Llama-2-13b-hf

model_parameters:
  trust_remote_code: true

backend:
  type: local
  cache_dir: ./ludwig_cache

input_features:
  - name: input
    type: text
    preprocessing:
      max_sequence_length: 326

output_features:
  - name: output
    type: text
    preprocessing:
      max_sequence_length: 64

prompt:
  template: >-
    ### User: {input}

    ### Assistant:

generation:
  temperature: 0.1
  max_new_tokens: 32
  repetition_penalty: 1.0
  # remove_invalid_values: true

adapter:
  type: lora
  dropout: 0.05
  r: 8

quantization:
  bits: 4

preprocessing:
  global_max_sequence_length: 326
  split:
    type: fixed

trainer:
  type: finetune
  epochs: 9
  batch_size: 1
  eval_batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 0.0004
  learning_rate_scheduler:
    warmup_fraction: 0.03
```

The serving command, as shown in the docs: `ludwig serve --model_path ./results/experiment_run/model`

By default, this serves the model on my GPU. My question is simply how to serve it exclusively on the CPU.
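One common way to do this, not specific to Ludwig but applicable to any PyTorch-based tool, is to hide the GPUs from the process via the `CUDA_VISIBLE_DEVICES` environment variable before launching the server. A sketch, assuming standard PyTorch device selection and the model path from above:

```shell
# Hide all GPUs from PyTorch (and hence from Ludwig) for this process.
# With CUDA_VISIBLE_DEVICES set to an empty string, torch.cuda.is_available()
# returns False, so the model is loaded and served on the CPU.
CUDA_VISIBLE_DEVICES="" ludwig serve --model_path ./results/experiment_run/model
```

Setting the variable inline scopes it to this one command, so other processes on the machine can still use the GPUs.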

@rlleshi What output/error messages are you getting? Thank you.

@alexsherstinsky thanks for getting back to me.

I'm not getting any errors. I just want to know how to run it on a CPU device, so it's just a question. I didn't find any relevant documentation from Ludwig pertaining to this.