facebook / buck2

Build system, successor to Buck

Home Page: https://buck2.build/

RE: upload cancelled with "stream error: stream no longer needed"

avdv opened this issue

For the following BUCK file:

# frontend/BUCK

filegroup(
    name = "assets",
    srcs = glob([
        "public/**/*",
        "src/**/*.css",
        "src/**/*.svg",
        "src/**/*.mp4",
        "src/**/*.wav",
        "src/**/*.png",
        "src/**/*.jpg",
        "src/**/*.gif",
        "src/webfonts/**/*",
    ]),
)

genrule(
    name = "assets_pack",
    srcs = [":assets"],
    out = "assets.tar",
    bash = """tar -cf $OUT -C $(location :assets) .""",
)

when using remote execution (we are currently using the bazel-remote-worker locally) we see the following error:

Action failed: root//frontend:assets_pack (genrule)
Internal error (stage: remote_upload_error): Remote Execution Error (GRPC-SESSION-ID): RE: upload: status: Cancelled, message: "h2 protocol error: http2 error: stream error received: stream no longer needed", details: [], metadata: MetadataMap { headers: {} }: transport error: http2 error: stream error received: stream no longer needed: stream error received: stream no longer needed
stdout:
stderr:
Build ID: ab46755d-0762-43ef-bf04-0c44d8a0d44a
Network: (GRPC-SESSION-ID)
Jobs completed: 49. Time elapsed: 0.7s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 1, local: 1)
BUILD FAILED
Failed to build 'root//frontend:assets_pack (prelude//platforms:default#524f8da68ea2a374)'

This seems to be related to the structure of the srcs of the filegroup since the public folder contains symlinks into the src folder:

$ ls -lh public/icons public/webfonts 
lrwxrwxrwx 1 claudio users 17 Feb  8 09:54 public/icons -> ../src/img/icons/
lrwxrwxrwx 1 claudio users 16 Jan  9 09:14 public/webfonts -> ../src/webfonts/
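
To check whether a source tree contains symlinks like the two listed above, a small Python sketch along these lines could be used. The helper name `find_symlinks` is made up for illustration and is not part of buck2.

```python
import os

def find_symlinks(root):
    """Return sorted (relative path, link target) pairs for symlinks under root."""
    found = []
    # followlinks=False (the default) lists symlinked directories in dirnames
    # without descending into them.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((os.path.relpath(path, root), os.readlink(path)))
    return sorted(found)
```

Calling `find_symlinks("frontend")` on a tree like the one in this issue would be expected to report the `public/icons` and `public/webfonts` links.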

After removing either public/icons or public/webfonts, the upload succeeds. And once it succeeded, it also succeeds when the symlinks are restored:

λ buck2 build frontend:assets_pack
Action failed: root//frontend:assets_pack (genrule)
Internal error (stage: remote_upload_error): Remote Execution Error (GRPC-SESSION-ID): RE: upload: status: Cancelled, message: "h2 protocol error: http2 error: stream error received: stream no longer needed", details: [], metadata: MetadataMap { headers: {} }: transport error: http2 error: stream error received: stream no longer needed: stream error received: stream no longer needed
stdout:
stderr:
Build ID: beedf529-fa52-4f0b-b8e4-537c41bff186
Network: (GRPC-SESSION-ID)
Jobs completed: 3. Time elapsed: 0.0s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD FAILED
Failed to build 'root//frontend:assets_pack (prelude//platforms:default#524f8da68ea2a374)'

λ rm frontend/public/icons

λ buck2 build frontend:assets_pack
File changed: root//tmp/work/upload/9328a678-7e97-4f98-b572-dfee848bb396
File changed: root//tmp/work/upload/3b446ffe-23ae-4c2f-8801-517b05c8079e
File changed: root//tmp/cas/7c7d23fb-0fed-41da-990f-08cc16686099
42 additional file change events
Build ID: a8233c76-c3f7-408f-8265-908114ccec49
Network: (GRPC-SESSION-ID)
Jobs completed: 12. Time elapsed: 3.8s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD SUCCEEDED

λ git restore frontend/public/icons 

λ buck2 build frontend:assets_pack
File changed: root//tmp/cas/1d2c4833-5d3a-4c32-b2e6-db13d3bce4e4
File changed: root//tmp/cas/cas/0a/0a735d55159999d4db9f3460b43d2e24e6116af22998f6f7aada76d7cfb36416
File changed: root//tmp/cas/e3f9ae6a-89be-40e1-8733-c48114f22217
1094 additional file change events
Build ID: 5b6ddadf-aa0a-41d5-a06a-d0044f8ff168
Network: (GRPC-SESSION-ID)
Jobs completed: 12. Time elapsed: 3.6s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 1, local: 0)
BUILD SUCCEEDED

Also, I noticed that the symlinks are not preserved in the buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__ directory:

ls -lhd buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__/assets/public/{webfonts,icons}
drwxr-xr-x 1 claudio users 1.2K Feb  8 10:46 buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__/assets/public/icons
drwxr-xr-x 1 claudio users  718 Feb  8 10:46 buck-out/v2/gen/root/524f8da68ea2a374/frontend/__assets__/assets/public/webfonts

Is this to be expected?

BTW, I am using buck2 aa5cc9e36218b3afcad06608d91f9e8baa1d5c88e0b2a2f561b1b695a320afc7, the 2024-01-02 pre-release.

cc: @aherrmann

So I don't know if this is the cause of this specific error, but symlinks in sources are basically completely unsupported. They sometimes kind of work, but you're on basically untested ground here, so I'm not surprised that something has broken.

We do support symlinks in outputs. Depending on what exactly you're doing, that may offer a way to work around this limitation.

Thanks for your quick response!

> So I don't know if this is the cause of this specific error, but symlinks in sources are basically completely unsupported. They sometimes kind of work, but you're on basically untested ground here, so I'm not surprised that something has broken.

OK, fair enough. In this case, though, it seems to indicate a problem with handling the HTTP/2 responses gracefully. My guess is that the upstream server "sees" that some files are already uploaded and responds by cancelling the stream, which perhaps should just be ignored?

> We do support symlinks in outputs. Depending on what exactly you're doing, that may offer a way to work around this limitation.

The same problem turns up when we use the output of yarn install as input to another action running remotely. My current workaround is to create a tarball inside a local action, and then explicitly unpack the tarball before doing the real work of the action.
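
The tarball workaround can be sketched in Python as follows. This is only an illustration of the idea, not the actual rule used in the project. The function names `pack_tree` and `unpack_tree` are hypothetical, and the sketch assumes the archive is trusted because the same build produced it.

```python
import os
import tarfile

def pack_tree(src_dir, tar_path):
    """Pack src_dir into an uncompressed tarball, storing symlinks as links."""
    # tarfile's default (dereference=False) records symlinks instead of
    # following them, so the tree travels as one regular file.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=".")

def unpack_tree(tar_path, dest_dir):
    """Unpack the tarball into dest_dir, recreating the symlinks."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(tar_path, "r") as tar:
        tar.extractall(dest_dir)
```

The local action would call something like `pack_tree`, and the remote action would call `unpack_tree` before doing its real work, so that only a single regular file has to be uploaded.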

Oof, yeah this sounds very likely to be a bug. This probably requires figuring out exactly what the buck2-RE communication looks like and which of the two is out of spec (guessing that it's us is a good default). I think we probably have logging that can help with that?

I turned up logging for grpc calls in the bazel-remote-worker (it's explicitly disabled in code in order to avoid "Received DATA frame for an unknown stream 1521" error messages) and got this:

240229 08:43:39.770:WT 17 [io.grpc.netty.NettyServerStream$TransportState.deframeFailed] Exception processing message
io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 4594008
        at io.grpc.Status.asRuntimeException(Status.java:526)
        at io.grpc.internal.MessageDeframer.processHeader(MessageDeframer.java:391)
        at io.grpc.internal.MessageDeframer.deliver(MessageDeframer.java:271)
        at io.grpc.internal.MessageDeframer.request(MessageDeframer.java:161)
        at io.grpc.internal.AbstractStream$TransportState$1RequestRunnable.run(AbstractStream.java:236)
        at io.grpc.netty.NettyServerStream$TransportState$1.run(NettyServerStream.java:202)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)

The maximum batch size is set to 4 * 1000 * 1000. I don't know why the message is so much larger than that limit. Maybe it's because symlinks are involved?

Oh, it's probably just because of the overhead. Each entry needs at least 72 extra bytes to transmit the hash and the compressor enum value. One example request that I looked at had 7410 entries in a single batch, which already adds up to 533520 bytes of overhead, plus a few bytes for encoding each element of the requests field.
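
The overhead estimate can be checked with a few lines of Python. The numbers are taken from this thread and the variable names are made up for illustration. The sketch assumes, hypothetically, that the batch payload was filled right up to the 4 * 1000 * 1000 limit.

```python
MAX_BATCH_PAYLOAD = 4 * 1000 * 1000  # client-side batch limit quoted above
SERVER_MAX_MESSAGE = 4194304         # gRPC limit from the server log (4 MiB)
PER_ENTRY_OVERHEAD = 72              # lower bound per entry (hash + compressor)
ENTRIES = 7410                       # entries in the example batch

overhead = ENTRIES * PER_ENTRY_OVERHEAD
worst_case = MAX_BATCH_PAYLOAD + overhead  # assumes a completely full payload

# Headroom between the two limits, and how many entries fit into it
# when the payload is full.
headroom = SERVER_MAX_MESSAGE - MAX_BATCH_PAYLOAD
max_entries_at_full_payload = headroom // PER_ENTRY_OVERHEAD

print(overhead)                         # 533520
print(worst_case - SERVER_MAX_MESSAGE)  # 339216
print(max_entries_at_full_payload)      # 2698
```

Under that assumption, a full batch with more than 2698 entries would already exceed the server's 4194304-byte limit, so a batch of many small files can overflow even though its payload stays under the client's limit.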

We have increased the max inbound message size for the bazel-remote-worker to an arbitrarily high number to work around that issue and have not seen this error again.
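
An alternative to raising the server limit would be for the client to budget for the per-entry overhead when forming batches. The following Python sketch shows the idea only. It is not how buck2 batches uploads, `make_batches` is a hypothetical helper, and the 80-byte overhead is an assumed estimate rather than a measured value.

```python
def make_batches(blob_sizes, max_message_bytes=4194304, per_entry_overhead=80):
    """Group blob sizes into batches whose estimated wire size fits the limit."""
    batches, current, current_bytes = [], [], 0
    for size in blob_sizes:
        # Estimated cost of one entry: payload plus assumed encoding overhead.
        cost = size + per_entry_overhead
        if cost > max_message_bytes:
            # Such a blob cannot go into any batch and needs another upload path.
            raise ValueError(f"blob of {size} bytes does not fit in one message")
        if current and current_bytes + cost > max_message_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += cost
    if current:
        batches.append(current)
    return batches
```

With this kind of accounting, a request containing thousands of small files is split by estimated message size rather than by payload size alone.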

Closing, since I think there is nothing to be done here.