Inference on CPU
rlleshi opened this issue
From the docs, Ludwig spawns a REST API for inference. By default, this happens on a GPU.
However, is there any option to run inference on the CPU only?
Updates
It was pointed out to me on Slack that it would be good to provide the config file as well:
```yaml
model_type: llm
# base_model: meta-llama/Llama-2-7b-hf
base_model: meta-llama/Llama-2-13b-hf

model_parameters:
  trust_remote_code: true

backend:
  type: local
  cache_dir: ./ludwig_cache

input_features:
  - name: input
    type: text
    preprocessing:
      max_sequence_length: 326

output_features:
  - name: output
    type: text
    preprocessing:
      max_sequence_length: 64

prompt:
  template: >-
    ### User: {input}
    ### Assistant:

generation:
  temperature: 0.1
  max_new_tokens: 32
  repetition_penalty: 1.0
  # remove_invalid_values: true

adapter:
  type: lora
  dropout: 0.05
  r: 8

quantization:
  bits: 4

preprocessing:
  global_max_sequence_length: 326
  split:
    type: fixed

trainer:
  type: finetune
  epochs: 9
  batch_size: 1
  eval_batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 0.0004
  learning_rate_scheduler:
    warmup_fraction: 0.03
```
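For completeness, the model directory served below was produced by fine-tuning with this config. Assuming the config is saved as config.yaml (the dataset file name here is just a placeholder), the training invocation was along these lines:

```sh
# Fine-tune with the config above; by default, ludwig train writes its
# outputs (including the model/ directory) under ./results/experiment_run/
ludwig train --config config.yaml --dataset train.csv
```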
The serving command as shown in the docs: `ludwig serve --model_path ./results/experiment_run/model`

By default, this serves the model on my GPU device. My question is simply how to serve it exclusively on the CPU.
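The only generic workaround I am aware of is to hide the GPU from PyTorch altogether by clearing CUDA_VISIBLE_DEVICES before serving; note that this is a CUDA/PyTorch environment-variable mechanism rather than a documented Ludwig option:

```sh
# With no visible CUDA devices, torch.cuda.is_available() returns False,
# so the model should be loaded and served on the CPU.
CUDA_VISIBLE_DEVICES="" ludwig serve --model_path ./results/experiment_run/model
```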
@rlleshi What output/error messages are you getting? Thank you.
@alexsherstinsky thanks for getting back to me.
I'm not getting any errors; this is just a question. I want to know how to run inference on a CPU device, and I couldn't find any relevant Ludwig documentation pertaining to this.