goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.

Home Page: https://goharbor.io

Issue during push "client disconnected during blob PATCH"

gwagner681105 opened this issue · comments

Harbor Version: 2.9
Kubernetes Installation

We have a repo where the upload of a particular image is no longer possible. It could be that a previous push ran into the quota and possibly left some leftovers in the repo's storage folder.

{"stream":"stderr","logtag":"F","message":"time=\"2024-05-22T04:25:34.394398505Z\" level=error msg=\"client disconnected during blob PATCH\" auth.user.name=\"harbor_registry_user\" contentLength=-1 copied=94383255 error=\"unexpected EOF\" go.version=go1.20.7 http.request.host=harbor.abraxas-tools.ch http.request.id=e2f33e77-52b4-40c3-b7d0-bc6c014bd2c0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/data/products/airflow/cache/blobs/uploads/01084cd7-ab53-418a-8d52-7e30a529c755?_state=ZOpaG69hMQ-FKDFlASFea5KG2PWDMsFLq6DyXqS2vGJ7Ik5hbWUiOiJkYXRhL3Byb2R1Y3RzL2FpcmZsb3cvY2FjaGUiLCJVVUlEIjoiMDEwODRjZDctYWI1My00MThhLThkNTItN2UzMGE1MjljNzU1IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA1LTIyVDA0OjI1OjI0LjMzMjQ2MDYyNVoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.3.0\" vars.name=\"data/products/airflow/cache\" vars.uuid=01084cd7-ab53-418a-8d52-7e30a529c755 "}

I only see the cache folder in the GUI --> [data/products/airflow/cache]
We tried to delete this path from the GUI, but the upload failed again.

I have seen that, even after deleting the path data/products/airflow/cache in the GUI, we still have the data on the storage under
/docker/registry/v2/repositories/data/products/airflow

Is there a way to force the upload or to clean up the leftovers in order to get the upload working again?

I forgot to mention that the client behavior was as follows:

All layers except one were uploaded successfully; the single failing layer made the Docker client end with the error message
"reset by peer" or "use of closed network connection".

In the log of the core pod (Kubernetes pod), the error "http: proxy error: context canceled" appears.
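
For anyone seeing the same pair of messages on a Kubernetes installation: a reverse proxy or ingress in front of Harbor that enforces a short read timeout (or a small body-size limit) can cut off a large blob PATCH mid-transfer and produce exactly this "context canceled" / "client disconnected" combination. A minimal sketch of the relevant ingress-nginx annotations, assuming Harbor is exposed through ingress-nginx (which is not confirmed for this setup) and using placeholder names:

    # Hypothetical Ingress for Harbor; only the timeout-related pieces are shown.
    # proxy-read-timeout / proxy-send-timeout are in seconds; proxy-body-size "0" removes the size limit.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: harbor-ingress                 # placeholder name
      annotations:
        nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
        nginx.ingress.kubernetes.io/proxy-body-size: "0"
    spec:
      ingressClassName: nginx
      rules:
        - host: harbor.example.com         # placeholder host
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: harbor-core      # assumed service name from the Harbor chart
                    port:
                      number: 80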

Hi @gwagner681105,
I am curious how you got into this issue. Could you share the exact command you are running and more of the registry log?

Hi @MinerYang
I have gotten a bit further. I found out that our WAF, which sits in between, possibly causes the issue.
At least one of the failing use cases works when bypassing the WAF system.
I am currently waiting for confirmation regarding the second failing use case.

Maybe your WAF system blocks the PATCH method?

I am currently analysing the issue on our WAF. I will post the results here.

Can you please post the full log of the registry? Please grep registry.log for the PATCH method's return codes.

I had a similar issue once. May I ask what kind of storage you are using, exactly? If it's S3, which kind exactly? For me it was Ceph S3.

We are using NFS.

File with PATCH operations:
Explore-logs-2024-06-04 09_04_19.json

The failure rate has decreased over time. We only had one occurrence in the last 48 hours:
Explore-logs-2024-06-04 09_08_32.json

I will have a look at the issue this week (WAF Team is always very busy) ;-)

Finally, we had a look at the issue together with our WAF engineer. They don't see any anomalies on the WAF.
We only see a PATCH request answered with HTTP return code 202, followed by one answered with 499.

10.73.64.85 - - [05/Jun/2024:13:13:49 +0200] "PATCH /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D HTTP/1.0" 202 0 "-" "kaniko/v1.15.0" harbor.abraxas-tools.ch 242 - TLSv1.3;TLS13-AES256-GCM-SHA384;256 /abraxas/vs-harbor.abraxas-tools.ch_internal

10.73.64.85 - - [05/Jun/2024:13:13:52 +0200] "PATCH /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/a7d87790-b482-4278-9c67-8a7510ffd69c?_state=iJ-rRrik9UGoxjeiKbpYKXgfVRJqbgScTJhbyDHehaR7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiYTdkODc3OTAtYjQ4Mi00Mjc4LTljNjctOGE3NTEwZmZkNjljIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ4Ljg0MDA4MTEyNFoifQ%3D%3D HTTP/1.0" 499 21 "-" "kaniko/v1.15.0" harbor.abraxas-tools.ch 3904 - TLSv1.3;TLS13-AES256-GCM-SHA384;256 /abraxas/vs-harbor.abraxas-tools.ch_internal

Corresponding Harbor log:

2024-06-05 13:13:52.891 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:52.890874824Z\" level=error msg=\"client disconnected during blob PATCH\" auth.user.name=\"harbor_registry_user\" contentLength=-1 copied=140676763 error=\"unexpected EOF\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=17a4c5be-02ac-4410-b0c4-e720da983ba0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/a7d87790-b482-4278-9c67-8a7510ffd69c?_state=iJ-rRrik9UGoxjeiKbpYKXgfVRJqbgScTJhbyDHehaR7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiYTdkODc3OTAtYjQ4Mi00Mjc4LTljNjctOGE3NTEwZmZkNjljIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ4Ljg0MDA4MTEyNFoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.15.0\" vars.name=\"ste_publi/deklaration/webapps/declaration-webapp/cache\" vars.uuid=a7d87790-b482-4278-9c67-8a7510ffd69c "}
2024-06-05 13:13:50.210 | {"stream":"stdout","logtag":"F","message":"100.125.140.175 - - [05/Jun/2024:11:13:49 +0000] \"PUT /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=Stt57n4r7u8lEWOkWOH5EgGOMPe4E8ycGOIhW-8rcpJ7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjozMjcsIlN0YXJ0ZWRBdCI6IjIwMjQtMDYtMDVUMTE6MTM6NDlaIn0%3D&digest=sha256%3A32b2336bada1e1f24b9f7c45a989b10e5882886fd4aac43a357f5027ff5d7290 HTTP/1.1\" 201 0 \"\" \"kaniko/v1.15.0\""}
2024-06-05 13:13:50.210 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:50.210557719Z\" level=info msg=\"response completed\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=2e77e46b-16a2-4153-9cc5-974a9d0782be http.request.method=PUT http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=Stt57n4r7u8lEWOkWOH5EgGOMPe4E8ycGOIhW-8rcpJ7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjozMjcsIlN0YXJ0ZWRBdCI6IjIwMjQtMDYtMDVUMTE6MTM6NDlaIn0%3D&digest=sha256%3A32b2336bada1e1f24b9f7c45a989b10e5882886fd4aac43a357f5027ff5d7290\" http.request.useragent=\"kaniko/v1.15.0\" http.response.duration=297.081707ms http.response.status=201 http.response.written=0 "}
2024-06-05 13:13:50.015 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:50.014896031Z\" level=info msg=\"authorized request\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=2e77e46b-16a2-4153-9cc5-974a9d0782be http.request.method=PUT http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=Stt57n4r7u8lEWOkWOH5EgGOMPe4E8ycGOIhW-8rcpJ7Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjozMjcsIlN0YXJ0ZWRBdCI6IjIwMjQtMDYtMDVUMTE6MTM6NDlaIn0%3D&digest=sha256%3A32b2336bada1e1f24b9f7c45a989b10e5882886fd4aac43a357f5027ff5d7290\" http.request.useragent=\"kaniko/v1.15.0\" vars.name=\"ste_publi/deklaration/webapps/declaration-webapp/cache\" vars.uuid=8ca49121-5b56-46d5-834f-2d2b112d30b9 "}
2024-06-05 13:13:49.495 | {"stream":"stdout","logtag":"F","message":"100.125.140.175 - - [05/Jun/2024:11:13:49 +0000] \"PATCH /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D HTTP/1.1\" 202 0 \"\" \"kaniko/v1.15.0\""}
2024-06-05 13:13:49.495 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:49.49528143Z\" level=info msg=\"response completed\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=066bfd53-4f53-46c3-9419-c0919a268ea0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.15.0\" http.response.duration=186.10932ms http.response.status=202 http.response.written=0 "}
2024-06-05 13:13:49.443 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:49.443329388Z\" level=info msg=\"authorized request\" go.version=go1.20.7 http.request.contenttype=\"application/octet-stream\" http.request.host=harbor.abraxas-tools.ch http.request.id=066bfd53-4f53-46c3-9419-c0919a268ea0 http.request.method=PATCH http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/8ca49121-5b56-46d5-834f-2d2b112d30b9?_state=w9HBOfUDJ_FIQ0BoQfh61UU8YL84mBU9NT-x9lpvLp57Ik5hbWUiOiJzdGVfcHVibGkvZGVrbGFyYXRpb24vd2ViYXBwcy9kZWNsYXJhdGlvbi13ZWJhcHAvY2FjaGUiLCJVVUlEIjoiOGNhNDkxMjEtNWI1Ni00NmQ1LTgzNGYtMmQyYjExMmQzMGI5IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDI0LTA2LTA1VDExOjEzOjQ5LjEwMzQyMDk5NFoifQ%3D%3D\" http.request.useragent=\"kaniko/v1.15.0\" vars.name=\"ste_publi/deklaration/webapps/declaration-webapp/cache\" vars.uuid=8ca49121-5b56-46d5-834f-2d2b112d30b9 "}
2024-06-05 13:13:49.234 | {"stream":"stdout","logtag":"F","message":"100.125.140.175 - - [05/Jun/2024:11:13:48 +0000] \"POST /v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/ HTTP/1.1\" 202 0 \"\" \"kaniko/v1.15.0\""}
2024-06-05 13:13:49.234 | {"stream":"stderr","logtag":"F","message":"time=\"2024-06-05T11:13:49.234746598Z\" level=info msg=\"response completed\" go.version=go1.20.7 http.request.contenttype=\"application/json\" http.request.host=harbor.abraxas-tools.ch http.request.id=a9f6da61-4533-4613-a6c3-870ed8bde2b7 http.request.method=POST http.request.remoteaddr=10.73.35.126 http.request.uri=\"/v2/ste_publi/deklaration/webapps/declaration-webapp/cache/blobs/uploads/\" http.request.useragent=\"kaniko/v1.15.0\" http.response.duration=290.573139ms http.response.status=202 http.response.written=0 "}

I have the same problem. Did you manage to find the reason?

I used Traefik/DockerSwarm/Registry:2

Our WAF team did not find any anomalies on their side. We have an F5 WAF system. Funnily enough, the issue has disappeared magically.
The last occurrence happened 5 days ago.
In any case, we will update to 2.11 and keep an eye on it.

I am using Traefik as the reverse proxy in front of an HTTP Harbor stack (connected over the local network). I always ran into this issue after 60 seconds. It turns out I needed to increase the readTimeout:

entrypoints:
  websecure:
    address: ":443"
    transport:
      respondingTimeouts:
        readTimeout: 1800

Also important, in harbor.yml:

    redirect:
      disable: true
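
For reference, in an installer-based setup that redirect fragment sits under the storage_service section of harbor.yml; a minimal sketch, assuming the default filesystem backend (check it against your own harbor.yml template):

    storage_service:
      # ca_bundle:                # optional CA bundle for the storage backend
      filesystem:
        maxthreads: 100
      redirect:
        disable: true             # do not redirect clients to the storage backend

On a Helm-based deployment the same switch is usually exposed through the chart values rather than harbor.yml (the disableredirect flag under the image/chart storage settings, if I remember the chart correctly).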

AFAIK, redirect only applies to S3 while pulling the image, so I don't think that's related.

Just want to leave a comment in case anyone runs into this like us.
In our case, this was due to a firewall. It flagged python:3.10-bookworm as Virus/Linux.WGeneric.eizzgy.

We terminate SSL outside of the cluster. The FW intercepted HTTP traffic from LB to cluster, and sent a RST to harbor-core.

Hope this helps someone.

Thank you for leaving this comment. I was facing issues with pushing some images for a week but the networking team insisted there had been no changes and no traffic was being blocked. Only specific layers were failing due to blob PATCH and I could not get it to work no matter what.

Apparently, python:3.12-slim-bookworm was also being flagged as Virus/Linux.WGeneric.eizzgy. I provided your comment to the networking team and they were able to fix my issue.

@patzm
I LOVE YOU! YOU ARE THE BEST!

I spent 2 days trying to fix the same issue. My infrastructure: Docker Swarm, Traefik (GitLab), GitLab (Docker), and several other containers.

I think the problem was in the GitLab nginx settings...

THANK YOU :)))))


In the official Traefik docs (https://doc.traefik.io/traefik/v2.11/routing/entrypoints/#transport) no limits are mentioned. Is that a mistake in the docs?

I added the following lines to my Traefik config:

  [entryPoints.websecure]
    address = ":443"
    # New lines from here
    [entryPoints.websecure.transport]
      [entryPoints.websecure.transport.respondingTimeouts]
        idleTimeout = 600
        writeTimeout = 600
        readTimeout = 600