cloudflare / workerd

The JavaScript / Wasm runtime that powers Cloudflare Workers

Home Page: https://blog.cloudflare.com/workerd-open-source-workers-runtime/


Feature Request — Runtime APIs — Subrequest Counter

jaswrks opened this issue

It would be very helpful if a subrequest counter value were exposed at runtime for long-running scheduled tasks performed by workers. A real-world example use case is an hourly CRON task running for up to 15 minutes, which tends to hit the stated subrequest limit across various fetch(), KV, D1, etc. subrequests. If the runtime API exposed a counter, this scheduled task could detect when it is approaching the limit and stop.
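For context, a stripped-down sketch of the kind of scheduled handler I mean (the binding names and the getWorkItems helper are made up for illustration):

export default {
  async scheduled(controller, env, ctx) {
    const deadline = Date.now() + 15 * 60 * 1000; // the task may run for up to 15 minutes
    for (const item of await getWorkItems(env)) { // hypothetical helper
      if (Date.now() > deadline) break;
      // Each iteration issues several subrequests: fetch(), KV, D1, ...
      const res = await fetch(item.url);
      await env.MY_KV.put(item.key, await res.text());
      await env.MY_DB.prepare("INSERT INTO log (key) VALUES (?)").bind(item.key).run();
      // There is currently no way to know how many subrequests have been used so far.
    }
  },
};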

Better controls and visibility around concurrent requests would be helpful, but I'm curious about what the better behavior would be here. Is it really that you want to stop queuing subrequests, or do you really want better handling of queued subrequests once the limit has been reached?

Thanks. In the context I shared above, I am more concerned about the overall limit of 1000 total subrequests under any single top-level request, which may fan out across service bindings, etc. I'd like my worker scripts to be capable of determining when they are approaching the overall subrequest limit and stopping themselves before they reach it; access to a counter would be incredibly helpful in that regard.

Since subrequests can be triggered by runtime APIs (e.g., KV, D1, etc.), I think it makes sense for the runtime to also provide access to a subrequest counter, because maintaining a custom counter is very difficult; it would require setting up a Proxy instance for every conceivable runtime API within my worker script just to count subrequests (a rough sketch of that workaround follows the snippet below).

for ( ... ) {
  // ... something, something ...
  if (env.SUBREQUEST_COUNTER >= 975) break;
}
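For comparison, the Proxy workaround I'd otherwise need might look roughly like this for fetch() alone (all names are made up), and every other subrequest-issuing API would need similar wrapping:

let subrequestCount = 0; // best-effort manual counter

const countingFetch = new Proxy(fetch, {
  apply(target, thisArg, args) {
    subrequestCount++; // assumes one subrequest per fetch() call
    return Reflect.apply(target, globalThis, args);
  },
});

// const res = await countingFetch("https://example.com/api");
// if (subrequestCount >= 975) { /* stop issuing more subrequests */ }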

I suspect a counter would actually need to live within the incoming Request.cf properties somewhere, such that it can be request-specific, and not part of the environment variables.

ctx would be the right place to expose this, if anywhere. request.cf is a plain JS object so can't really contain a live counter. env can theoretically be per-request but strictly contains names defined by the developer, not by the platform.

Nice, ok. I can see this being a fairly quick API addition. Before working on it, though, I think we should give some (quick) thought to whether this is the only metric we may expose this way. If we think there might be more, then exposing it as a property on a container object might make sense, e.g. ctx.metrics.subrequests vs. ctx.subrequests, etc. /cc @irvinebroque for visibility.
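To make that concrete, usage might look something like this (ctx.metrics and its property names are purely hypothetical at this point, and the helpers are made up):

export default {
  async scheduled(controller, env, ctx) {
    for (const task of await getTasks(env)) { // hypothetical helper
      // Hypothetical runtime-provided counter, per the proposal above.
      if (ctx.metrics.subrequests >= 975) break;
      await processTask(task, env); // hypothetical helper that issues subrequests
    }
  },
};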

I think it makes sense to expose both the "regular" subrequest count and the internal subrequest count, as they are different counters with different limits (exposing the limit for each would probably also be helpful, though that would be a static number rather than a counter). The simultaneous open connection count could also be useful, as would the cache API request counter. Additionally, people have been asking for a way to know how much memory and CPU time their worker has consumed, so even if this is not implemented right now, it would make sense to put it on the same ctx.metrics object.
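Roughly, the container object being discussed might end up looking like this (names and values are illustrative only, not a committed design):

// Illustrative only; not an actual workerd API, and the values are placeholders.
const exampleMetrics = {
  subrequests: 12,               // "regular" subrequests used so far (fetch, service bindings, ...)
  subrequestLimit: 1000,         // the corresponding static limit
  internalSubrequests: 3,        // internal subrequests (KV, D1, cache, ...)
  internalSubrequestLimit: 1000, // the corresponding static limit
  openConnections: 2,            // simultaneous open connections
  cacheApiRequests: 5,           // cache API request counter
};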

I don't think we'd be able to expose CPU and memory here. Exposing CPU time would have Spectre concerns. Memory could also have side channels, since GC is non-deterministic. Also, we don't currently have any way to measure the memory being used by a specific request, only the memory usage of the isolate as a whole, which may be handling multiple requests or may have handled past requests whose objects haven't been GC'd yet. So a memory metric could be confusing.

On another note, this needs a design doc with API and product review, and it's a busy time, so this may not be something we can create right away.

Memory for the isolate would still be useful, no? Workers aren't billed on memory (right now, anyway), and since that is the metric used for eviction (IIRC), it would be useful to be able to save work before an eviction occurs.

Like, if I know the next task I'm going to do is going to require a fair bit of memory, I can check whether I have enough left. If not, I push the task to a queue, so I don't risk OOMing while processing.

(This would also be very useful with an API to be able to forcibly spawn a Worker in a separate isolate, maybe via a Service Binding)
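Something along these lines is what I have in mind (the memory fields on ctx.metrics are hypothetical; env.DEFERRED_TASKS.send() uses the existing Queues producer API, but the binding name and helpers are made up):

// Decide whether to run a memory-heavy task now or defer it to a queue.
async function runOrDefer(task, env, ctx) {
  // Hypothetical isolate-level memory fields, per the discussion above.
  const headroom = ctx.metrics.isolateMemoryLimit - ctx.metrics.isolateMemoryUsed;
  if (headroom < 32 * 1024 * 1024) {
    // Not enough room left: defer instead of risking an OOM mid-task.
    await env.DEFERRED_TASKS.send(task); // Queues producer API; binding name made up
    return;
  }
  await processHeavyTask(task, env); // hypothetical helper that does the actual work
}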