seaweedfs / seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

git clone into fuse mount fails with `inflate: data stream error`

satwell opened this issue

Describe the bug
A git clone of large remote repositories into a SeaweedFS FUSE mount reliably fails with this error:

error: inflate: data stream error (unknown compression method)
fatal: serious inflate inconsistency

Here are a few repos I've found that fail:

System Setup

  • List the command line to start "weed master", "weed volume", "weed filer", "weed s3", "weed mount".

    • weed server -dir=/srv/data/weedvol -master.port=9333 -volume.port=8080 -master.volumeSizeLimitMB=4096 -s3 -filer=true -volume.minFreeSpace=10 -volume.max=0
    • weed mount -filer=weedserver:8888 -dir=/home/satwell/mnt/tmp -filer.path=/
  • OS version: Debian 12 on both server and client

  • output of weed version: version 30GB 3.64 b74e8082bac408138be99e128b8c28fd19eca7a6 linux amd64

  • if using filer, show the content of filer.toml

[filer.options]
recursive_delete = false

[leveldb2]
enabled = true
dir = "/srv/data/filerldb2"

Expected behavior
Expected git clone to complete successfully. This works fine for smaller git repos that I've tried cloning.

Screenshots
Full git command and output:

halo:~/mnt/tmp% git clone --bare https://gitlab.gnome.org/GNOME/glib.git
Cloning into bare repository 'glib.git'...
remote: Enumerating objects: 211140, done.
remote: Counting objects: 100% (2144/2144), done.
remote: Compressing objects: 100% (271/271), done.
remote: Total 211140 (delta 1941), reused 2064 (delta 1873), pack-reused 208996
Receiving objects: 100% (211140/211140), 92.15 MiB | 6.06 MiB/s, done.
error: inflate: data stream error (unknown compression method)
fatal: serious inflate inconsistency
error: inflate: data stream error (unknown compression method)
error: inflate: data stream error (unknown compression method)
fatal: fetch-pack: invalid index-pack output
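To check whether the mount corrupts data independently of git, a simple write and read-back comparison may help. The following is a minimal sketch, not part of SeaweedFS or git: `check_roundtrip` is a hypothetical helper name, and the size argument is an arbitrary choice. It only exercises a sequential write followed by a sequential read, so a pass does not rule out problems with the access pattern git uses while indexing a pack.

```shell
#!/bin/sh
# Hypothetical helper (not a SeaweedFS or git command).
# Usage: check_roundtrip DIR [SIZE_MB]
# Writes random data into DIR, reads it back, and compares SHA-256 checksums.
check_roundtrip() {
    dir=$1
    size_mb=${2:-64}

    # Reference data lives on local temporary storage, outside DIR.
    src=$(mktemp) || return 2
    dst="$dir/roundtrip.$$"

    # Random bytes are incompressible, similar to packed git objects.
    dd if=/dev/urandom of="$src" bs=1048576 count="$size_mb" 2>/dev/null || return 2

    # Copy onto the directory under test, then checksum both copies.
    cp "$src" "$dst" || { rm -f "$src"; return 2; }
    want=$(sha256sum < "$src" | cut -d' ' -f1)
    got=$(sha256sum < "$dst" | cut -d' ' -f1)
    rm -f "$src" "$dst"

    if [ "$want" = "$got" ]; then
        echo "OK: checksums match"
        return 0
    fi
    echo "MISMATCH: wrote $want, read back $got"
    return 1
}
```

For the setup described above, this could be run as `check_roundtrip /home/satwell/mnt/tmp 256`.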

Please help to verify the fix.

Hello! I have the same issue.
@chrislusf it seems it still reproduces. I'm using version 3.67 in Kubernetes.

Please share reproduction steps and logs.

@chrislusf

  • Create kind cluster:
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF
  • Install seaweedfs-operator:
git clone https://github.com/seaweedfs/seaweedfs-operator.git
helm install seaweedfs-operator ./seaweedfs-operator/deploy/helm -f - <<EOF
image:
  registry: ghcr.io
  repository: seaweedfs/seaweedfs-operator
  tag: latest
serviceMonitor:
  enabled: false
webhook:
  enabled: false
EOF
  • Create the Seaweed resource and wait until all components are up:
kubectl apply -f - <<EOF
apiVersion: seaweed.seaweedfs.com/v1
kind: Seaweed
metadata:
  name: seaweedfs-storage
  namespace: default
spec:
  image: chrislusf/seaweedfs:3.67
  volumeServerDiskCount: 1
  master:
    replicas: 1
    volumeSizeLimitMB: 1024
  volume:
    replicas: 3
    requests:
      storage: 5Gi
  filer:
    replicas: 2
    s3: true

    config: |
      [leveldb2]
      enabled = true
      dir = "/data/filerldb2"
EOF
  • Install seaweed-csi-driver:
git clone https://github.com/seaweedfs/seaweedfs-csi-driver.git
helm install seaweedfs-csi-driver ./seaweedfs-csi-driver/deploy/helm/seaweedfs-csi-driver -f - <<EOF
seaweedfsFiler: seaweedfs-storage-filer:8888
EOF
  • Create ReadWriteMany PVC:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwm-pvc
  namespace: default
spec:
  storageClassName: seaweedfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
EOF
  • Create Deployment and mount the PVC:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-pvc
  labels:
    app: test-pvc
spec:
  selector:
    matchLabels:
      app: test-pvc
  replicas: 1
  template:
    metadata:
      labels:
        app: test-pvc
    spec:
      containers:
        - name: test-pvc
          image: alpine/git
          command:
            - sleep
            - "99999"
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: rwm-pvc
              mountPath: /mnt/test-pvc
      volumes:
        - name: rwm-pvc
          persistentVolumeClaim:
            claimName: rwm-pvc
EOF
  • Try to clone a fairly large git repo to the PVC:
kubectl exec -it deploy/test-pvc -- /bin/sh -c 'git clone https://github.com/u-boot/u-boot.git /mnt/test-pvc/u-boot'

You will get:

Cloning into '/mnt/test-pvc/u-boot'...
remote: Enumerating objects: 999011, done.
remote: Counting objects: 100% (8605/8605), done.
remote: Compressing objects: 100% (5412/5412), done.
remote: Total 999011 (delta 3195), reused 8229 (delta 3103), pack-reused 990406
Receiving objects: 100% (999011/999011), 294.84 MiB | 9.19 MiB/s, done.
error: inflate: data stream error (unknown compression method)
fatal: serious inflate inconsistency
fatal: fetch-pack: invalid index-pack output
command terminated with exit code 128

By contrast, cloning the same repo to ephemeral storage succeeds:

kubectl exec -it deploy/test-pvc -- /bin/sh -c 'git clone https://github.com/u-boot/u-boot.git /tmp/u-boot'

Cloning into '/tmp/u-boot'...
remote: Enumerating objects: 999011, done.
remote: Counting objects: 100% (8605/8605), done.
remote: Compressing objects: 100% (5411/5411), done.
remote: Total 999011 (delta 3195), reused 8230 (delta 3104), pack-reused 990406
Receiving objects: 100% (999011/999011), 294.83 MiB | 10.08 MiB/s, done.
Resolving deltas: 100% (790494/790494), done.
Updating files: 100% (31982/31982), done.
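One way to narrow this down further might be to copy the known-good clone from ephemeral storage onto the PVC and verify it there with `git fsck`. If the copy passes, plain sequential writes are probably fine and the failure is more likely tied to how git writes and re-reads the pack during the clone. This is only a sketch under that assumption: `copy_and_fsck` is a hypothetical helper name, and the `.copy` suffix is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper (not a git or SeaweedFS command).
# Usage: copy_and_fsck GOOD_REPO TARGET_DIR
# Copies a repository that cloned successfully elsewhere into TARGET_DIR,
# then checks every object as read back from the copy.
copy_and_fsck() {
    good_repo=$1
    target_dir=$2
    dest="$target_dir/$(basename "$good_repo").copy"

    # Plain recursive copy: no pack indexing is involved in this step.
    cp -R "$good_repo" "$dest" || return 2

    # git fsck re-reads and validates the objects stored in the copy.
    if git -C "$dest" fsck --full >/dev/null 2>&1; then
        echo "fsck passed: $dest"
        return 0
    fi
    echo "fsck FAILED: $dest"
    return 1
}
```

Inside the test pod from the steps above, this could be run as `copy_and_fsck /tmp/u-boot /mnt/test-pvc`.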