Wasn't able to get the docker image running (Illegal instruction (core dumped)) (no AVX support?)

Question


voarsh2 opened this issue a year ago · comments

main: seed = 1681324505

Illegal instruction (core dumped)

Wasn't able to get the docker image running.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
    field.cattle.io/publicEndpoints: '[{"addresses":["192.168.100.103"],"port":32435,"protocol":"TCP","serviceName":"serge:turbopilot-nodeport","allNodes":true}]'
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-serge-turbopilot
  name: turbopilot
  namespace: serge
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-serge-turbopilot
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-serge-turbopilot
    spec:
      containers:
      - env:
        - name: MODEL
          value: /models/codegen-2B-multi-ggml-4bit-quant.bin
        image: ghcr.io/ravenscroftj/turbopilot/turbopilot:latest
        imagePullPolicy: IfNotPresent
        name: container-0
        ports:
        - containerPort: 18080
          name: turbopilot
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /models
          name: turbopilot
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      volumes:
      - name: turbopilot
        persistentVolumeClaim:
          claimName: turbopilot

James Ravenscroft · Answer 1 · Thu Apr 13 2023 03:42:23 GMT+0800 (China Standard Time)

Hi there thanks for your ticket. Can I ask what operating system and processor architecture you are running your k8s cluster on?

voarsh2 · Answer 2 · Thu Apr 13 2023 08:34:21 GMT+0800 (China Standard Time)

> Hi there thanks for your ticket. Can I ask what operating system and processor architecture you are running your k8s cluster on?

I am running Ubuntu 22.04 LTS.
Kubernetes v1.23
The CPUs support AVX (not AVX2). Some examples of my node CPUs:
(48 x Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (2 Sockets), 32 x Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (2 Sockets))

I haven't tried the container on my Ryzen system, which has AVX2 support. Will try that now.

Update: no complaints on the K8s node that has (what I assume is) AVX2 support. You might want to add AVX support (plenty of powerful enough CPUs don't have AVX2 support) - gotta love bulky RAM. :D
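
For anyone hitting the same crash, a quick way to see which SIMD extensions a Linux node advertises is to read the flags line of /proc/cpuinfo. The sketch below is a minimal check and not part of turbopilot; the helper name has_flag and the set of flags checked (avx, avx2, fma, f16c) are assumptions chosen for illustration.

```shell
#!/bin/sh
# has_flag FLAG [FILE]: succeed if FLAG appears as a whole word on the
# first "flags" line of FILE (default: /proc/cpuinfo, x86 Linux only).
has_flag() {
  grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

# Report each extension of interest; the list is illustrative only.
for f in avx avx2 fma f16c; do
  if has_flag "$f"; then
    echo "$f: yes"
  else
    echo "$f: no"
  fi
done
```

On a node that reports avx as present but avx2 as missing, a binary compiled with AVX2 enabled would be expected to fail with an illegal instruction like the one reported here.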

D32vd · Answer 3 · Thu Apr 13 2023 09:21:12 GMT+0800 (China Standard Time)

It happened to me, too.
When I run:
docker run --rm -it -v /home/yang/models:/models -e MODEL="/models/codegen-2B-multi-ggml-4bit-quant.bin" -p 80:18080 ghcr.io/ravenscroftj/turbopilot/turbopilot:latest
the terminal output is:
main: seed = 1681348410
gptj_model_load: loading model from '/models/codegen-2B-multi-ggml-4bit-quant.bin' - please wait ...
gptj_model_load: n_vocab = 51200
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 2560
gptj_model_load: n_head = 32
gptj_model_load: n_layer = 32
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 3
Illegal instruction (core dumped)

My operating environment:
docker 20.10.18 Community
Ubuntu 22.04.1 LTS
Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
I pulled the docker image directly.

James Ravenscroft · Answer 4 · Thu Apr 13 2023 14:26:22 GMT+0800 (China Standard Time)

Great, thanks for looking into the AVX thing - in hindsight that was the obvious problem - I should have known 🤦

So I think the short-term solution would be to build the image on the system you're targeting, and the C preprocessor should pick up which instruction sets are supported. The slightly longer-term solution is for me to add CI builds that build with different sets of CPU instructions and make them available as part of the release.
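
To see what the preprocessor would pick up on a given machine, the compiler can be asked to dump its predefined macros. This is a sketch assuming an x86-64 host with GCC or Clang available as cc; the helper name filter_simd is hypothetical, and whether the turbopilot build passes -march=native is not stated in this thread.

```shell
#!/bin/sh
# Keep only the SIMD feature macros of interest from a "-dM -E" macro dump.
filter_simd() {
  grep -E '__(AVX|AVX2|FMA|F16C)__'
}

# Dump the macros the compiler predefines when told to target the build host.
# Without -march=native (or explicit -mavx/-mavx2 flags) a default x86-64
# build typically defines none of these macros.
${CC:-cc} -march=native -dM -E -x c /dev/null 2>/dev/null | filter_simd \
  || echo "none of the AVX/AVX2/FMA/F16C macros are defined for this host"
```

Code guarded by checks such as `#ifdef __AVX2__` is only compiled in when the corresponding macro shows up in this list, which is why a binary built on an AVX2 machine may not run on an AVX-only one.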

Oshan Wisumperuma · Answer 5 · Fri Apr 14 2023 13:50:11 GMT+0800 (China Standard Time)

docker run --rm -it -v ./models:/models -e THREADS=6 -e MODEL="/models/codegen-6B-multi-ggml-4bit-quant.bin" -p 18080:18080 ghcr.io/ravenscroftj/turbopilot/turbopilot:latest

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

main: seed = 1681194918
gptj_model_load: loading model from '/models/codegen-6B-multi-ggml-4bit-quant.bin' - please wait ...
gptj_model_load: n_vocab = 51200
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 33
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 6325.92 MB
Illegal instruction (core dumped)

Apple M1 Pro
Mac 13.2
colima 0.5.4

James Ravenscroft · Answer 6 · Fri Apr 14 2023 14:27:21 GMT+0800 (China Standard Time)

Hi @oshanz - please try changing the image URI to ghcr.io/ravenscroftj/turbopilot:latest - the path from your log above is the old build.

Ajay Kumar Saini · Answer 7 · Mon Apr 17 2023 11:05:11 GMT+0800 (China Standard Time)

@ravenscroftj Same issue with the above image on a Mac M1.

James Ravenscroft · Answer 8 · Mon Apr 17 2023 14:02:00 GMT+0800 (China Standard Time)

Hi @4j4y - just to clarify: when you say "image above", are you referring to ghcr.io/ravenscroftj/turbopilot:latest (as opposed to the old image with /turbopilot/turbopilot in it)?

I don't have an Apple silicon device, so I'm struggling to work out what is working and what isn't. Are you running Docker Desktop on your M1 in order to run the image? I don't suppose you could share the output from your docker pull or docker run command? (I'm after the image hash that you are using.)

Finally - have you tried downloading the binary zip file from the releases page and running that? Does it give the same error?
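
One way to collect the image hash asked for above is docker image inspect with a format template. This is a sketch only; the helper name image_info is hypothetical, and it assumes the image has already been pulled locally.

```shell
#!/bin/sh
# image_info IMAGE: print the local image ID, its registry digests, and the
# OS/architecture the image was built for.
image_info() {
  docker image inspect \
    --format '{{.Id}} {{.RepoDigests}} {{.Os}}/{{.Architecture}}' "$1"
}
```

Usage would be `image_info ghcr.io/ravenscroftj/turbopilot:latest`. A reported platform of linux/amd64 on an Apple silicon host would be consistent with the platform mismatch warning in the log earlier in this thread.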

James Ravenscroft · Answer 9 · Fri Jun 16 2023 14:26:14 GMT+0800 (China Standard Time)

Support for older instruction sets is provided in release 0.0.5.