Ollama, get up and running with large language models, locally.
This Community Chart is for deploying Ollama.
## Prerequisites

- Kubernetes: `>= 1.16.0-0` for CPU only
- Kubernetes: `>= 1.26.0-0` for GPU stable support (NVIDIA and AMD)

> Not all GPUs are currently supported with Ollama (especially with AMD).
## Installing the Chart

To install the `ollama` chart in the `ollama` namespace:
```console
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama --namespace ollama
```
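To confirm the release came up, you can check its status and wait for the pod to become Ready (standard Helm and kubectl usage, not specific to this chart):

```console
# Show the release status, then list the pods in the ollama namespace
helm status ollama --namespace ollama
kubectl get pods --namespace ollama
```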
## Upgrading

First, please read the release notes of Ollama to make sure there are no backwards-incompatible changes.

Make adjustments to your values as needed, then run `helm upgrade`:
```console
# This pulls the latest version of the ollama chart from the repo
helm repo update
helm upgrade ollama ollama-helm/ollama --namespace ollama --values values.yaml
```
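If you need to verify which chart revision is live after an upgrade, `helm history` lists the release revisions (plain Helm, nothing chart-specific):

```console
helm history ollama --namespace ollama
```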
## Uninstalling the Chart

To uninstall/delete the `ollama` deployment in the `ollama` namespace:
```console
helm delete ollama --namespace ollama
```
Substitute your values if they differ from the examples. See `helm delete --help` for a full reference on `delete` parameters and flags.
## Interact with Ollama

- Ollama documentation can be found HERE
- Interact with the RESTful API: Ollama API (see the `curl` sketch below)
- Interact with the official client libraries: ollama-js and ollama-python
- Interact with LangChain: langchain-js and langchain-python
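As a quick smoke test, you can port-forward the chart's Service and call Ollama's generate endpoint. A minimal sketch, assuming the release is named `ollama` (so the Service is `ollama` on the default port `11434`) and that the `llama2` model has already been pulled:

```console
# Forward the Ollama service to localhost (service name and port assume chart defaults)
kubectl port-forward svc/ollama 11434:11434 --namespace ollama &

# Request a completion from the Ollama REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```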
## Examples

### Basic values.yaml example with GPU and two models pulled at startup

> It's highly recommended to run an updated version of Kubernetes when deploying Ollama with GPU support.
```yaml
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'
    # -- Specify the number of GPUs
    number: 1
  # -- List of models to pull at container startup
  models:
    - mistral
    - llama2
```
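If the pod stays Pending after enabling GPU support, it's worth confirming that your nodes actually advertise the GPU resource the chart requests. A quick check, assuming NVIDIA GPUs with the NVIDIA device plugin installed:

```console
# Each GPU node should report a non-zero nvidia.com/gpu capacity/allocatable value
kubectl describe nodes | grep -i 'nvidia.com/gpu'
```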
### Basic values.yaml example with Ingress

```yaml
ollama:
  models:
    - llama2

ingress:
  enabled: true
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
```
- The API is now reachable at `ollama.domain.lan` (see the `curl` example below)
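For instance, you can list the models the server has pulled through the ingress host. A minimal sketch, assuming DNS (or a local hosts entry) resolves `ollama.domain.lan` to your ingress controller:

```console
# /api/tags returns the models available on the server
curl http://ollama.domain.lan/api/tags
```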
## Helm Values

- See values.yaml for the Chart's default values.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | Affinity for pod assignment |
| autoscaling.enabled | bool | `false` | Enable autoscaling |
| autoscaling.maxReplicas | int | `100` | Maximum number of replicas |
| autoscaling.minReplicas | int | `1` | Minimum number of replicas |
| autoscaling.targetCPUUtilizationPercentage | int | `80` | Target CPU utilization percentage for scaling |
| extraArgs | list | `[]` | Additional arguments on the output Deployment definition |
| extraEnv | list | `[]` | Additional environment variables on the output Deployment definition |
| extraEnvFrom | list | `[]` | Additional environment variables from external sources (like ConfigMap) |
| fullnameOverride | string | `""` | String to fully override the template |
| hostIPC | bool | `false` | Use the host's IPC namespace |
| hostNetwork | bool | `false` | Use the host's network namespace |
| hostPID | bool | `false` | Use the host's PID namespace |
| image.pullPolicy | string | `"IfNotPresent"` | Docker pull policy |
| image.repository | string | `"ollama/ollama"` | Docker image repository |
| image.tag | string | `""` | Docker image tag; overrides the image tag whose default is the chart appVersion |
| imagePullSecrets | list | `[]` | Docker registry secret names as an array |
| ingress.annotations | object | `{}` | Additional annotations for the Ingress resource |
| ingress.className | string | `""` | IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) |
| ingress.enabled | bool | `false` | Enable ingress controller resource |
| ingress.hosts[0].host | string | `"ollama.local"` | |
| ingress.hosts[0].paths[0].path | string | `"/"` | |
| ingress.hosts[0].paths[0].pathType | string | `"Prefix"` | |
| ingress.tls | list | `[]` | The TLS configuration for hostnames to be covered by this Ingress record |
| initContainers | list | `[]` | Init containers to add to the pod |
| knative.containerConcurrency | int | `0` | Knative service container concurrency |
| knative.enabled | bool | `false` | Enable Knative integration |
| knative.idleTimeoutSeconds | int | `300` | Knative service idle timeout in seconds |
| knative.responseStartTimeoutSeconds | int | `300` | Knative service response start timeout in seconds |
| knative.timeoutSeconds | int | `300` | Knative service timeout in seconds |
| lifecycle | object | `{}` | Lifecycle for pod assignment (overrides ollama.models startup pulling) |
| livenessProbe.enabled | bool | `true` | Enable livenessProbe |
| livenessProbe.failureThreshold | int | `6` | Failure threshold for livenessProbe |
| livenessProbe.initialDelaySeconds | int | `60` | Initial delay seconds for livenessProbe |
| livenessProbe.path | string | `"/"` | Request path for livenessProbe |
| livenessProbe.periodSeconds | int | `10` | Period seconds for livenessProbe |
| livenessProbe.successThreshold | int | `1` | Success threshold for livenessProbe |
| livenessProbe.timeoutSeconds | int | `5` | Timeout seconds for livenessProbe |
| nameOverride | string | `""` | String to partially override the template (keeps the release name) |
| nodeSelector | object | `{}` | Node labels for pod assignment |
| ollama.gpu.enabled | bool | `false` | Enable GPU integration |
| ollama.gpu.mig.devices | object | `{}` | Specify the MIG devices and the corresponding number |
| ollama.gpu.mig.enabled | bool | `false` | Enable multiple MIG devices. If enabled, you must specify the MIG devices; if set to false, this section is ignored |
| ollama.gpu.number | int | `1` | Specify the number of GPUs. Ignored if the MIG section is used |
| ollama.gpu.nvidiaResource | string | `"nvidia.com/gpu"` | Only for NVIDIA cards; change to e.g. 'nvidia.com/mig-1g.10gb' to use a MIG slice |
| ollama.gpu.type | string | `"nvidia"` | GPU type: 'nvidia' or 'amd'. If 'ollama.gpu.enabled' is set, defaults to 'nvidia'. If set to 'amd', a 'rocm' suffix is added to the image tag unless 'image.tag' is overridden, because AMD and CPU/CUDA use different images |
| ollama.insecure | bool | `false` | Add the insecure flag for pulling at container startup |
| ollama.models | list | `[]` | List of models to pull at container startup. The more you add, the longer the container takes to start if the models are not already present (e.g. models: [llama2, mistral]) |
| ollama.mountPath | string | `""` | Override the ollama-data volume mount path (default: "/root/.ollama") |
| persistentVolume.accessModes | list | `["ReadWriteOnce"]` | Ollama server data Persistent Volume access modes. Must match those of the existing PV or dynamic provisioner. Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ |
| persistentVolume.annotations | object | `{}` | Ollama server data Persistent Volume annotations |
| persistentVolume.enabled | bool | `false` | Enable persistence using a PVC |
| persistentVolume.existingClaim | string | `""` | To bring your own PVC for persisting Ollama state, pass the name of the created and ready PVC here. If set, this chart will not create the default PVC. Requires persistentVolume.enabled: true |
| persistentVolume.size | string | `"30Gi"` | Ollama server data Persistent Volume size |
| persistentVolume.storageClass | string | `""` | Ollama server data Persistent Volume Storage Class. If defined, sets storageClassName. If set to "-", sets storageClassName: "", which disables dynamic provisioning. If undefined (the default) or set to null, no storageClassName is set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS & OpenStack) |
| persistentVolume.subPath | string | `""` | Subdirectory of the Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty |
| persistentVolume.volumeMode | string | `""` | Ollama server data Persistent Volume mode. If defined, sets volumeMode; if empty (the default) or set to null, no volumeMode spec is set, choosing the default mode |
| persistentVolume.volumeName | string | `""` | Pre-existing PV to attach this claim to. Useful if a CSI auto-provisions a PV for you and you want to always reference that PV going forward |
| podAnnotations | object | `{}` | Map of annotations to add to the pods |
| podLabels | object | `{}` | Map of labels to add to the pods |
| podSecurityContext | object | `{}` | Pod Security Context |
| readinessProbe.enabled | bool | `true` | Enable readinessProbe |
| readinessProbe.failureThreshold | int | `6` | Failure threshold for readinessProbe |
| readinessProbe.initialDelaySeconds | int | `30` | Initial delay seconds for readinessProbe |
| readinessProbe.path | string | `"/"` | Request path for readinessProbe |
| readinessProbe.periodSeconds | int | `5` | Period seconds for readinessProbe |
| readinessProbe.successThreshold | int | `1` | Success threshold for readinessProbe |
| readinessProbe.timeoutSeconds | int | `3` | Timeout seconds for readinessProbe |
| replicaCount | int | `1` | Number of replicas |
| resources.limits | object | `{}` | Pod resource limits |
| resources.requests | object | `{}` | Pod resource requests |
| runtimeClassName | string | `""` | Specify the runtime class |
| securityContext | object | `{}` | Container Security Context |
| service.annotations | object | `{}` | Annotations to add to the service |
| service.loadBalancerIP | string | `nil` | Load Balancer IP address |
| service.nodePort | int | `31434` | Service node port when the service type is 'NodePort' |
| service.port | int | `11434` | Service port |
| service.type | string | `"ClusterIP"` | Service type |
| serviceAccount.annotations | object | `{}` | Annotations to add to the service account |
| serviceAccount.automount | bool | `true` | Automatically mount the ServiceAccount's API credentials |
| serviceAccount.create | bool | `true` | Specifies whether a service account should be created |
| serviceAccount.name | string | `""` | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
| tolerations | list | `[]` | Tolerations for pod assignment |
| topologySpreadConstraints | object | `{}` | Topology Spread Constraints for pod assignment |
| updateStrategy.type | string | `"Recreate"` | Deployment strategy; can be "Recreate" or "RollingUpdate". Default is "Recreate" |
| volumeMounts | list | `[]` | Additional volumeMounts on the output Deployment definition |
| volumes | list | `[]` | Additional volumes on the output Deployment definition |
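To tie the persistence options above together, here is a minimal values.yaml that keeps pulled models on a PVC so they survive pod restarts. A sketch only: the `fast-ssd` storage class is a hypothetical placeholder and should be replaced with a class available in your cluster:

```yaml
ollama:
  models:
    - llama2

persistentVolume:
  # Create a PVC so model data persists across pod restarts
  enabled: true
  size: 50Gi
  # Hypothetical storage class; replace with one that exists in your cluster
  storageClass: "fast-ssd"
```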
## Maintainers

- Jean Baptiste Detroyes
- Nathan Tréhout
## Support

- For questions, suggestions, and discussion about Ollama, please refer to the Ollama issue page
- For questions, suggestions, and discussion about this chart, please visit the Ollama-Helm issue page or join our OTWLD Discord