Local Prototyping UDF (Debugging)

przell opened this issue 3 years ago · comments

Title	Local Prototyping UDF (Debugging)
Date	2021-11-18
Issue	#94
Category	Debugging
Description	openEO UDFs allow the user to run arbitrary R code within an openEO process graph. In order to debug, parametrize and validate the function that is sent to a back-end, the user needs the possibility to test the function locally. Ideally the user can retrieve a subset of the data with the same dimensionality as the data that arrives in the UDF service, for local prototyping.
Dependencies	openEO API definition
Links	Local Backend for testing (#88)
Priority	High
Impact	High

przell · Answer 1 · Tue Nov 30 2021 21:55:10 GMT+0800 (China Standard Time)

With the new approach to UDFs (bridge to Python), one idea:
Process Graph... run_udf(, debug = TRUE) returns the stars object as .Rdata, which needs to be saved in the user_workspace or returned via a synchronous call.
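
To make the .Rdata idea concrete, here is a minimal base-R sketch of the round trip: the UDF input is written to an .RData file and loaded again for local inspection. The object name `udf_input`, the file name and the small array are assumptions for illustration only; a real workflow would hold a stars object, and the `debug` option itself is only the idea described above.

```r
# Stand-in for the data that would arrive in the UDF.
# Assumption: a plain array replaces the stars object here.
udf_input <- array(
  1:8,
  dim = c(2, 2, 2),
  dimnames = list(x = c("x1", "x2"), y = c("y1", "y2"), t = c("t1", "t2"))
)

# "Back-end side": write the object to an .RData file.
path <- file.path(tempdir(), "udf_input.RData")
save(udf_input, file = path)

# "User side": load it into a separate environment so that nothing
# in the current workspace is overwritten.
env <- new.env()
load(path, envir = env)

# The restored object matches what was saved, including dimension names.
round_trip_ok <- identical(env$udf_input, udf_input)
```

Loading into a dedicated environment keeps the debug data apart from whatever else is in the session.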

Florian Lahn · Answer 2 · Fri Feb 11 2022 18:57:36 GMT+0800 (China Standard Time)

As discussed internally, the retrieval of sample data is crucial for local prototyping. Therefore we need a function that allows the user to retrieve that data.

There are different implementation choices, and we face some problems:

1. configurability: either the user defines the process graph, or we simply offer some options for properties
2. running the job: sync vs. async
3. size: how large can the sample data get
4. result retrieval: in general not a problem, but what happens if there are auxiliary files that ship metadata
5. results interpretation: not a problem for a single-time-instance image, but how is time propagated correctly and coherently among back-ends when downloading a serialized raster time series (and maybe bands as well)? All that information should resolve into a stars object that the user can "play around" with
6. data format: different back-ends will most definitely offer different file formats, which structure the relevant dimensional metadata differently
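
To illustrate the results interpretation point, here is a small base-R sketch (not stars itself). The same serialized values give different answers depending on which dimension order is assumed. The band names and dates are made up for the example.

```r
# Six values as they might arrive from a serialized file
# that carries no dimension metadata.
values <- 1:6

bands <- c("B04", "B08")                               # hypothetical band names
dates <- c("2021-01-01", "2021-01-11", "2021-01-21")   # hypothetical time steps

# Interpretation 1: band varies fastest (2 bands x 3 time steps).
cube_bt <- array(values, dim = c(2, 3),
                 dimnames = list(band = bands, t = dates))

# Interpretation 2: time varies fastest (3 time steps x 2 bands).
cube_tb <- array(values, dim = c(3, 2),
                 dimnames = list(t = dates, band = bands))

# The "same" pixel (band B08 on 2021-01-11) under both interpretations.
v_bt <- cube_bt["B08", "2021-01-11"]
v_tb <- cube_tb["2021-01-11", "B08"]
```

Here `v_bt` and `v_tb` differ, which is why the dimension order and labels have to travel with the data in a well-defined way.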

The result interpretation bit might also be relevant for #39 and the immediate creation of a stars object. Unless there is a convenient and well-defined way of doing this, this will cause problems, because every back-end provides the data differently, which results in having different data representations in R which do not properly reflect the data structure in the back-end. @m-mohr For now results must be described as STAC elements. But for serializing raster time series or images with multiple bands, there is no recommended way of describing it, right?

Florian Lahn · Answer 3 · Fri Feb 18 2022 18:12:52 GMT+0800 (China Standard Time)

At this point we are not able to get the exact data that is injected into the UDF, because

there is no user filesystem at the moment (2022-02-18)
each back-end chunks the data differently to achieve best performance
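
A minimal base-R sketch of why the chunking matters for prototyping: a UDF that only looks at single pixels gives the same result under any chunking, while one that uses chunk-level statistics does not. Both UDFs, the `apply_chunked` helper and `chunk_size` are hypothetical stand-ins, not part of any back-end.

```r
# Hypothetical UDF that is chunk-independent: a per-pixel scaling.
scale_udf <- function(x) x * 0.0001

# Hypothetical UDF that depends on its chunk: subtracts the chunk mean.
center_udf <- function(x) x - mean(x)

# Apply a UDF chunk by chunk; chunk_size stands in for a back-end's chunking.
apply_chunked <- function(x, udf, chunk_size) {
  chunks <- split(x, ceiling(seq_along(x) / chunk_size))
  unlist(lapply(chunks, udf), use.names = FALSE)
}

pixels <- c(100, 200, 300, 400)

# Same pixels, two different chunkings (2 values vs. 4 values per chunk).
scale_same <- isTRUE(all.equal(apply_chunked(pixels, scale_udf, 2),
                               apply_chunked(pixels, scale_udf, 4)))
center_same <- isTRUE(all.equal(apply_chunked(pixels, center_udf, 2),
                                apply_chunked(pixels, center_udf, 4)))
```

`scale_same` is TRUE and `center_same` is FALSE, so a function tested on sample data can still behave differently on a back-end if it depends on chunk-level statistics.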

As an intermediate solution we can retrieve sample data before a UDF is run, and the user can then experiment with the returned data in a convenient way (probably as a stars object, which will also be used inside the R-UDF).

Maybe as an addition to the list before:
7. due to the complexity of the user's UDF function, processing can be very slow, depending on whether the processing has to be done for each element or whether vectorized functions can be used
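
As a rough sketch of that last point, the following base-R snippet runs the same computation once per element and once on whole vectors. The NDVI-style function and the random data are assumptions for illustration; actual timings depend on the machine and on the UDF.

```r
# Hypothetical per-pixel computation (NDVI-style), for illustration only.
ndvi <- function(red, nir) (nir - red) / (nir + red)

set.seed(1)
red <- runif(1e5)
nir <- runif(1e5)

# Element-wise: one R function call per pixel.
t_loop <- system.time(
  res_loop <- vapply(seq_along(red),
                     function(i) ndvi(red[i], nir[i]),
                     numeric(1))
)

# Vectorized: a single call on the whole vectors.
t_vec <- system.time(res_vec <- ndvi(red, nir))

# Both approaches give the same values; compare t_loop and t_vec for the cost.
same_result <- isTRUE(all.equal(res_loop, res_vec))
```

Printing `t_loop` and `t_vec` on sample data gives a first impression of how a UDF might scale before it is sent to a back-end.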

Florian Lahn · Answer 4 · Fri Mar 11 2022 01:19:24 GMT+0800 (China Standard Time)

A first version is now available in the develop branch. You can now get a sample with get_sample(). A vignette with some examples will follow.