Inference on CPU
rlleshi opened this issue
From the docs, Ludwig spawns a REST API for inference. By default, this happens on a GPU.
However, is there any option to run inference on the CPU only?
Updates
It was pointed out to me on Slack that it would be good to provide the config file as well:
```yaml
model_type: llm
# base_model: meta-llama/Llama-2-7b-hf
base_model: meta-llama/Llama-2-13b-hf

model_parameters:
  trust_remote_code: true

backend:
  type: local
  cache_dir: ./ludwig_cache

input_features:
  - name: input
    type: text
    preprocessing:
      max_sequence_length: 326

output_features:
  - name: output
    type: text
    preprocessing:
      max_sequence_length: 64

prompt:
  template: >-
    ### User: {input}
    ### Assistant:

generation:
  temperature: 0.1
  max_new_tokens: 32
  repetition_penalty: 1.0
  # remove_invalid_values: true

adapter:
  type: lora
  dropout: 0.05
  r: 8

quantization:
  bits: 4

preprocessing:
  global_max_sequence_length: 326
  split:
    type: fixed

trainer:
  type: finetune
  epochs: 9
  batch_size: 1
  eval_batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 0.0004
  learning_rate_scheduler:
    warmup_fraction: 0.03
```
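For completeness, the model directory served below was produced by fine-tuning with this config. Assuming the config is saved as config.yaml (the dataset file name here is just a placeholder), the training invocation was along these lines:

```sh
# Fine-tune with the config above; by default, ludwig train writes its
# outputs (including the model/ directory) under ./results/experiment_run/
ludwig train --config config.yaml --dataset train.csv
```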
The serving command as shown in the docs: `ludwig serve --model_path ./results/experiment_run/model`

By default, this serves the model on my GPU device. My question is simply how to serve it exclusively on the CPU.
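The only generic workaround I am aware of is to hide the GPU from PyTorch altogether by clearing CUDA_VISIBLE_DEVICES before serving; note that this is a CUDA/PyTorch environment-variable mechanism rather than a documented Ludwig option:

```sh
# With no visible CUDA devices, torch.cuda.is_available() returns False,
# so the model should be loaded and served on the CPU.
CUDA_VISIBLE_DEVICES="" ludwig serve --model_path ./results/experiment_run/model
```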
@rlleshi What output/error messages are you getting? Thank you.
@alexsherstinsky thanks for getting back to me.
I'm not getting any errors; this is just a question. I want to know how to run inference on a CPU device, and I couldn't find any relevant Ludwig documentation pertaining to this.