opengeospatial / ogcapi-processes

Home Page: https://ogcapi.ogc.org/processes

Default execution mode should be asynchronous.

pvretano opened this issue · comments

Requirements 26 item C says: "The server SHALL respond synchronously if, according to the job control options in the process description, the process can be executed in either mode."

I think this is the UNSAFE play. We should change this to ASYNCHRONOUS by default.

@pvretano UNSAFE play?

Disagreeing with this, for the reasons we discussed at length previously.

If I recall correctly, the main reason we opted for synchronous by default is the fact that a client providing a Prefer: header indicates that it is able to handle async execution.

The RFC for the Prefer: header does not include a mechanism to indicate sync execution (there is only respond-async).
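This asymmetry can be sketched in a few lines (a hypothetical helper, not part of any official client library): under RFC 7240 a client can only opt *in* to async with `respond-async` (optionally with `wait=`), so the only way to ask for sync under Processes 1.0 is to omit the Prefer header entirely.

```python
# Hypothetical client-side helper illustrating the discussion above.
# RFC 7240 defines respond-async and wait=N, but no token meaning
# "respond synchronously" -- sync is requested by omitting Prefer.

def execution_headers(prefer_async, wait_seconds=None):
    """Build headers for an OGC API - Processes execution request."""
    headers = {"Content-Type": "application/json"}
    prefs = []
    if prefer_async:
        prefs.append("respond-async")
    if wait_seconds is not None:
        prefs.append("wait=%d" % wait_seconds)
    if prefs:
        headers["Prefer"] = ", ".join(prefs)
    return headers
```

A sync-only client under this scheme simply never sets the header; the 1.0 requirement then obliges the server to answer synchronously when the process supports it.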

Changing how this works would definitely be a major breaking change between the two versions, which I hope we would avoid or it will really be a transition nightmare.

The OGC API - Processes async mode is significantly more complicated to implement from both the server and the client perspective.

From the server side, it requires implementing a complex queuing mechanism, results storage and persistence, etc.
From the client side, it requires implementing complex polling and status checking.

Sync execution, on the other hand, is super simple: the client makes a POST request stating what it wishes to execute and handles the response. It is much more similar to the typical OGC API GET requests. I believe it's actually easier to implement a relatively safe sync execution than the async equivalent.

An async service can be implemented as an additional layer on top of a simple sync service.

Async / batch processing is still way overrated in my opinion.
I still believe on-demand / small requests for an AoI/ToI/RoI (e.g., "Collection output"), where these small requests can easily be synchronous, will largely replace batch processing, though it's going to take more time for people to accept this paradigm shift in geoprocessing. This is easier to manage from both the client and the server side.

The sync mode is the simple thing. I hope we can keep the default the simple way :)

@jerstlouis requirement 26, part C covers the case where NO Prefer header is specified AND the process can be executed either sync or async. In this specific case I think the better default is async. Otherwise you risk a long-running process timing out the HTTP connection.

@pvretano The reason we had requirement 26 C:

The server SHALL respond synchronously if, according to the job control options in the process description, the process can be executed in either mode.

which is equivalent to 25 C in the approved/published 1.0 version is because it would otherwise be impossible for a client to tell the server "I want to execute this process synchronously".

Omitting the Prefer: header is how the processing clients can currently do that with the approved standard.

Without this, it would be impossible to write synchronous-only execution clients that can work with servers that also offer optional async support. Synchronous-only clients are much easier to write and get working, from past code sprints and testbed experience.

Otherwise you risk a long running process timing out the HTTP connection.

In my opinion, this is a non-issue.
From my understanding (and a quick search), there is no fundamental HTTP timeout.

The server can decide how long it wants to keep processing something.
The client can decide it wants to keep waiting for the response.

At any point, both ends can decide to give up and interrupt the connection.

This is actually a good thing in terms of avoiding hogging the server's resources, because the server knows whether an actual client is still patiently waiting for a response, and it can easily limit the number of active connections from a particular client (which is much simpler than managing a processing queue, and can be done with readily-available tools like an Apache server proxy).

With async, a malicious client could just decide to queue up tons of processing requests and never care about them.

Interesting related comment on Reddit:

The year is 2522. The war rages on and the robots are gaining ground each day. Also, we're still waiting for json to get returned from an http request made 500 years ago.

If a process is inherently always going to take a long time and the server doesn't want to have a long time out, then it can choose to only offer the async option.

If a process sometimes takes a small amount of time and sometimes a very long amount of time based on the execution request, then the server can estimate how long the processing would take, and refuse to process it synchronously in the case it takes too long, with a 400 error that says to the client:

This execution request would time out before it could complete.
Please include a Prefer: header in your execution request to indicate that you can handle asynchronous requests.

I believe this is the best solution to address your concern.
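The proposed server-side decision could look something like the following sketch (the function name, threshold, and error text are illustrative only; nothing here is defined by the standard): estimate the run time, answer synchronously when feasible, create a job when the client opted in to async, and otherwise return a 400 with a hint.

```python
# Hypothetical server-side sketch of the suggestion above: refuse sync
# execution with a 400 + hint when the estimated duration would exceed
# the server's sync timeout. SYNC_TIMEOUT_S is an assumed server limit.

SYNC_TIMEOUT_S = 60

def choose_response(estimated_s, client_accepts_async):
    """Return an (HTTP status, body) pair for an execution request."""
    if estimated_s <= SYNC_TIMEOUT_S:
        return 200, "sync result"          # fast enough: respond in-band
    if client_accepts_async:
        return 201, "job created"          # client sent Prefer: respond-async
    return 400, ("This execution request would time out before it could "
                 "complete. Please include a Prefer: respond-async header "
                 "to indicate that you can handle asynchronous execution.")
```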

SWG meeting from 2024-05-27: We keep it as it is (synchronous), as otherwise there would be no way for a client to tell servers that synchronous execution is requested.

@pvretano @jerstlouis @bpross-52n

Sorry I could not be part of today's meeting, but I strongly disagree with this decision and do not think this issue should be closed.

Previous iterations of the standard had a lot of considerations about letting the server decide what is appropriate between sync/async (at some point, even the order of the supported execution modes in the process description defined the default). Sure, the synchronous implementation might seem easier from the server/client side when dealing with "simple" processing like NDVI or applying an AoI/ToI/RoI filter to a collection, but it makes absolutely NO sense when the processing is much longer, or when handling limited-resource/high-demand situations, since servers can close the HTTP connection due to timeouts or need to wait for the resource to become available in a queuing system. In such cases, the async operation is much more appropriate. I could argue that, in my typical use cases, async is easier to implement, since a monitoring Job is always desired... and leaving a connection open indefinitely is a misuse of resources, which most servers will avoid by defaulting to a timeout of roughly 5-60s according to the use case.

Note that I am not advocating for async by default either. I believe this flexibility of execution mode is an advantage from the processes API.

I would also like to remind everyone that, according to its definition, the Prefer header is a preference. The server is always allowed to completely ignore this requirement if it deems it inappropriate for the operation to perform, as long as it indicates accordingly via Preference-Applied in the response whether the preference was applied. Therefore, even if Prefer was specified, the only real way for a client to validate whether the preferred sync/async mode was used is to confirm with Preference-Applied. Furthermore, sync/async are supposed to return 200/201 for the respective modes in the case of OGC API - Processes, so this could also be used to validate further. However, regardless of the applied execution strategy, ANY client implementation should always be ready to handle either sync/async behavior, since the server is allowed to select the most appropriate one for the operation or available resources. Deviating from this would mean OGC API - Processes deviates from the RFC, which makes the use of Prefer misleading altogether. OGC API - Processes should simply omit any indication of a default.

I also disagree with the statement:

The RFC for the Prefer: header does not include a mechanism to indicate sync execution (there is only respond-async).

If a server allows it, it is perfectly valid to indicate Prefer: wait=100000000000000000. That would pretty much indicate to run sync indefinitely. I would question a server allowing and maintaining this unreasonable timeout, but it would be valid according to the HTTP specification. It would not solve the mocking "waiting forever" analogy either.

If a process sometimes takes a small amount of time and sometimes a very long amount of time based on the execution request, then the server can estimate how long the processing would take, and refuse to process it synchronously in the case it takes too long, with a 400 error

Sometimes, the operations are not themselves long, but the resources to compute them are insufficient to respond to the demand. Sometimes, there is simply no way to estimate the duration, as it depends on external factors the server does not have access to, such as the run time of a deployed process that the server cannot predict. In some cases, the server still wants to respect the submission order of the requests, and not only depend on luck for when an execution request will succeed or not. Believing that async is used only for long-running jobs is a vast oversimplification of the use cases.

An async service can be implemented as an additional layer on top of a simple sync service.

Async / batch processing is still way overrated in my opinion.

I still believe on-demand / small requests for AoI/ToI/RoI (e.g., "Collection output"), where these small requests can easily be synchronous, will largely mostly replace batch processing, though it's going to take more time for people to accept this paradigm shift in geoprocessing. This is easier to manage from the both the client and the server side.

I would like to make sure OGC API - Processes does not evolve according to this kind of mentality. If the operation is "so simple" that it can be performed by a simple GET request for a one-off operation, maybe it is an indication that a dedicated endpoint for that operation should be something other than an OGC API - Processes endpoint, since it does not really justify the whole Process Description overhead. IMO, involving a Process Description and potentially the creation of a Job operation to monitor it implies that the operation is "complicated enough" to need its own standalone definition rather than a generic OpenAPI endpoint schema. In many cases, these "complicated processes" could run with particular requirements that perfectly justify async, but they could also support sync execution if the required resources were available at the time the request was submitted.

OGC API - Processes should never disregard these use cases, especially with the increasing demand for AI/ML algorithms involving increasingly complex operations, varying demands and resource requirements.

@fmigneault

Thanks a lot for engaging in this discussion. I enjoy our thoughtful discussions, even though others might find them too long to read ;)

However, regardless of applied execution strategy, ANY client implementation should always be ready to handle either sync/async implementation, since the server is allowed to select the most appropriate one for the operation or available resources.

This is where I disagree, because I believe there is value in developers being able to quickly put together an OGC API - Processes client that supports only sync, for those servers/processes that do support sync only requests.

If a server allows it, it is perfectly valid to indicate Prefer: wait=100000000000000000. That would pretty much indicate to run sync indefinitely.

The difference is that the server is not obligated to respond sync in this case, unlike when the Prefer: header is omitted altogether.

Deviating from this would mean OGC API - Processes deviates from the RFC, which is misleading with the use of Prefer altogether.

The way we kind of avoided deviating from the RFC in Processes 1.0, is that when the client does not use the Prefer: header, the RFC is not involved, and the default (if the process supports sync execution) is synchronous execution.

leaving a connection open indefinitely is a misuse of resources, which most servers will avoid by defaulting between ~5/60s timeout according to use cases.

these "complicated processes" could run with particular requirements that perfectly justifies async, but that could also support sync execution if requirements were available at the time the request was submitted.

The way I would suggest addressing this is an explicit permission for the server to refuse synchronous execution (an execution request submitted without a Prefer: header) with a 400 and a notice to the client to include one, indicating that it is ready to accept an async response, as I was suggesting above. Alternatively, a 503 telling the client to try again later, for the different case where resources for sync execution are not currently available, or a 413 if the request asks for too much (still with the hint that the server might be able to accept it as an asynchronous request right now).
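A minimal sketch of these refusal paths (function name, messages, and flags are all hypothetical, chosen only to illustrate the status codes being proposed):

```python
# Illustrative only: a server distinguishing "temporarily out of sync
# capacity" (503) from "request too large to ever run synchronously"
# (413), each hinting that the client should resubmit asking for async.

def refuse_sync(busy, too_large):
    """Return an (HTTP status, body) pair for a sync execution attempt."""
    hint = " Resubmit with a Prefer: respond-async header."
    if too_large:
        return 413, "Request too large for synchronous execution." + hint
    if busy:
        return 503, "No synchronous capacity right now; retry later." + hint
    return 200, "sync result"
```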

If the operations is "so simple" that it can be performed by a simple GET request for a one-off operation, maybe it is an indication that a dedicated endpoint for that relevant operation should be something else than a OGC API - Processes

The idea with collection output (which is very much in line with the concept of GeoDataCubes) is that you can describe the workflow once, and the follow-on requests for a particular AoI/ToI/RoI are really just simple GET requests (e.g., using OGC API - Coverages, Tiles, DGGS, Features, Maps, EDR...). The description of the workflow can be of any level of complexity, and may combine several inputs and chains of other workflows/processes, whether local or distributed, so this does justify the use of OGC API - Processes. But the requests for partial results are very simple and potentially also very quick to process due to their limited ATRoI.

implicating a Process Description
it implies that the operation is "complicated enough" to need its own standalone definition rather than a generic OpenAPI endpoint schema.

I hope we can prototype an example OpenAPI version of the process description as an alternative/complement to the OGC process description. Regardless of how complicated the process is, the inputs and outputs need to be described, and I think that is what both of those approaches do.

and potentially the creation of a Job operation to monitor it implies that the operation is "complicated enough"

The need for job monitors arises when we cannot avoid lengthy batch processes.
There are certainly use cases for these, but if there is value in sometimes executing that same process synchronously for a small dataset, and the process is localizable, then this likely fits the ATRoI scenario, and a server deciding to use that approach can do without a monitoring system by implementing only Part 1 sync + Part 3 collection output (as we currently do on our demo server). This probably does not fit AI/ML training processes, but might well fit inference processes.

Guys, you are killing me with these LONG, LONG comments! ;)

This particular issue is about a client POSTing an execution request without an accompanying Prefer header. The specification, via Req 26C, currently says that the server should respond synchronously.

I was proposing that the server should respond ASYNC since, without knowing how long the process might take to run, ASYNC was the safer approach.

@jerstlouis is proposing to leave it as is.

@fmigneault, I think, is saying remove this requirement altogether and instead let the server ALWAYS decide the execution mode based on its internal knowledge of the process, the execution environment, the available resources, etc. The client can express a PREFERENCE via the Prefer header but the server is not compelled to satisfy that preference. The client must also always be prepared to handle either execution scenario (sync or async). A combination of Preference-Applied and the HTTP status code (200 or 201) returned will always inform the client as to which action the server took.

So do I remove Req26? I'm leaning towards yes.

This, of course, then begs the question ... do we need to bother with job control metadata in the process description at all?

@pvretano I really believe we need to keep the requirement as-is.

I'm proposing to address @fmigneault 's use case by adding a permission clarifying that the server MAY return a 503 if it's not able to handle the request synchronously right now due to limited resources or a 413 if it's not able to due to the client asking to process too much synchronously, including a verbose hint that the client should re-submit the request with a Prefer: header.

@jerstlouis I think @fmigneault proposal is simpler and pretty much what Req27 says right now. In the case where the process can run sync or async the server picks and makes its decision known via the Preference-Applied (and/or HTTP 200/201) header. Easy peasy.

As I said above:

This is where I disagree, because I believe there is value in developers being able to quickly put together an OGC API - Processes client that supports only sync, for those servers/processes that do support sync only requests.

This loses that, and breaks compatibility with 1.0.

The difference is that the server is not obligated to respond sync in this case, unlike when the Prefer: header is omitted altogether.

I also still believe we should aim for as much compatibility with 1.0 as possible, given that so far everything is very, very close to full compatibility in terms of existing 1.0 clients being able to execute processes from 1.1/2.0 servers.

@jerstlouis assuming we keep the job control options in the process description (see my previous question to the group) then you can still write such a simple OAProc client. This client simply needs to search for processes in the process list that can only run synchronously and ignore all the other ones. No?

As for compatibility, nice to have but not 100% necessary since we are targeting 2.0 ... no?

@pvretano

This client simply needs to search for processes in the process list that can only run synchronously and ignore all the other ones. No?

Currently, Processes 1.0 also makes this possible for processes that support both sync & async; this change limits it to those that support only sync (and a server later adding async support to those processes will suddenly break those sync-only clients).

As for compatibility, nice to have but not 100% necessary since we are targeting 2.0 ... no?

I was still hopeful that this could still be a 1.1 version, at least in terms of not breaking this existing compatibility, even if not reflected in the actual version number.

Exactly as @pvretano described. My thought process was perfectly understood.

Is there any way in OGC API - Processes that I can enforce synchronous execution, so that if it's not supported or too complex to run synchronously, it just returns an error?

Similarly, is this possible for asynchronous execution?

I'm thinking of two use cases here:

  • Rapid web visualization via synchronous execution (e.g. for web mapping, which just doesn't work effectively with batch jobs).
  • Creating large result sets e.g. a result as STAC catalog (I'd probably never want to parse that from a single HTTP response)

@m-mohr

Is there any way in OGC API - Processes that I can enforce synchronous execution

2 part answer depending on the version of the standard

Before

Yes, completely, using mode in the JSON execution body enforced the mode to respect. If it did not match one of the supported jobControlOptions indicated by the process description, you got an error. If it matched supported options, the mode specified had to be respected.

Current

Not guaranteed, depending on the description. The same rule about mismatching jobControlOptions remains for returning an error. However, if a process supports both sync/async, it can technically ignore your Prefer header, as per that header's definition in HTTP, and fall back to the other mode if the server deems it could not respect the suggested preference. If the preference is respected, it MUST reply with Preference-Applied with matching values. Otherwise, it must omit that response header, and it is up to you to deal with the response result (which you might not have expected).
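In practice, a client can determine which mode was actually used by checking Preference-Applied first and falling back to the status code (200 for a sync result, 201 for a created job). A rough sketch (the helper name is made up; only the headers and status codes come from the discussion):

```python
# Hypothetical client-side check of the actual execution mode, per the
# discussion above: Preference-Applied is authoritative when present;
# otherwise 200 means a sync result and 201 means an async job.

def applied_mode(status, headers):
    """Return 'sync', 'async', or 'unknown' for an execution response."""
    if "respond-async" in headers.get("Preference-Applied", ""):
        return "async"
    if status == 201:
        return "async"   # job created even though preference not echoed
    if status == 200:
        return "sync"
    return "unknown"
```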

I have raised this concern on multiple occasions, across multiple issues.


The only "real" way to enforce a mode currently while respecting all standard revisions and HTTP simultaneously, is to only indicate a single mode in jobControlOptions, whichever one you feel is more appropriate for a given process.

To clarify, the "Before" version with "mode" that @fmigneault is referring to was pre-1.0 and the "Current" version is 1.0 (which already replaced the use of "mode" by the Prefer: header, after much discussion).

According to 1.0, when NOT including the Prefer: header, if sync is an option, the server must execute synchronously (Requirement 25 C).

The reverse (both sync and async as an option for the process, with a Prefer: respond-async header) is not technically guaranteed due to the nature of the Prefer: header being only a preference, but it is a recommendation (12 A) which in practice should always be followed, because the server explicitly says that it supports async, and the client says that it wants to execute async. The server would need to have a good reason to not follow the client's preference.

The issue is that the server has the option to ignore the client's preference, meaning that the behavior cannot be predetermined in all circumstances. This makes the implementation of async handling much harder by clients. It is fine to have defaults being sync, since it is a simpler use case, but the async mode should be handled just as fairly and reliably, not twice as hard to achieve.

A "good reason" could be as simple as a server having a limited amount of resources to store jobs, for which it always tries to return the result synchronously when it could be executed fast enough to save space, but "is forced" to fall back to async when an input resource it must wait for cannot be ready in time for it to execute before server timeout was reached. Since it could respond in either way, the process description MUST indicate both modes in jobControlOptions to be compliant. However, there is no way to tell "why" a mode should be picked over another when Prefer is provided. From the point of view of a client, Prefer could sometimes be respected, and sometimes not, making the server appear unreliable or malfunctioning. The jobControlOptions is simply the only available indication that it could run either way, nothing more.

If a server was behaving this way, I could not blame the implementer for not following the standard, as they would technically be right. They would be allowed to ignore my preference. This makes it a bigger burden for my client integrating their server, as I must always try to deal with any possible outcome.

It sounds like the Prefer header is not the right solution for what is needed. Don't use it? Splitting the endpoints might be the better option, see #419

This makes it a bigger burden for my client integrating their server, as I must always try to deal with any possible outcome.

Yes, because it's only a recommendation, clients do technically need to be ready to handle sync responses as well, even when they submit a Prefer: respond-async preference. However, handling async is quite complex by itself (e.g., polling, retrieving results separately), and the sync handling code can probably share a lot of code with the result retrieval part, which clients would need to do anyway.

Different endpoints, as @m-mohr suggested, might have been a simpler solution side-stepping those problems, but possibly because some implementations also create jobs for synchronous execution, the SWG had not considered that at the time. Now I believe the SWG mostly wants to avoid breaking changes as much as possible and finalize 1.1/2.0.

SWG mostly wants to avoid breaking changes as much as possible and finalize 1.1/2.0

💯 agreed. This is a strong requirement.

I only wish there were a way to "force" a certain mode for certain edge cases where it is critical that it be respected. In these few cases where sync/async is a "must" for whatever reason, I would actually prefer receiving an unprocessable request error or similar over generating a possibly long/heavy-resource execution that will not be handled accordingly.

I've actually just come across this: Prefer: handling=strict|lenient. Maybe something to consider to preserve the current Prefer behavior while allowing the one described above?
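As a rough sketch of what that could look like on the client side (purely hypothetical; RFC 7240 defines handling= for request processing errors, and nothing in the standard currently says a server must honour it this way): combine the desired mode with handling=strict so a server that cannot honour the mode fails the request instead of silently falling back.

```python
# Hypothetical: build a Prefer header that asks the server not to fall
# back to another execution mode. "sync" is approximated with wait=,
# since RFC 7240 has no explicit synchronous token.

def strict_prefer(mode, wait_seconds=60):
    """Return headers requesting a mode with strict handling."""
    token = {"async": "respond-async",
             "sync": "wait=%d" % wait_seconds}[mode]
    return {"Prefer": "%s, handling=strict" % token}
```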