[Feature request] Hybrid debugging
Esadruhn opened this issue
Description
Create a "debugging" mode in the Substra SDK. The principle is that the user only needs to change one parameter in their script to switch between the "normal" and "debugging" modes.
In this mode:
- adding an asset adds it locally
- fetching an asset fetches it either from the deployed platform or locally (local assets are identified by a 'key' that starts with `local_` or a similar prefix or suffix)
- executing a traintuple on a dataset fetched remotely has the following behaviour:
  - it counts the number of `train_samples_keys` of the dataset
  - it gets the same number of samples from the `fake` method of the opener
  - it executes on the fake data
- the `list` function lists the remote and local assets
- the `leaderboard` function gets the leaderboard from the deployed platform if the objective is on it, otherwise from the local data
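To make the routing rules above concrete, here is a minimal sketch of how they could fit together. It is not the Substra SDK: `HybridStore`, `DummyOpener`, `fake_training_data` and the exact `local_` prefix handling are hypothetical names chosen for illustration, and the "deployed platform" is just an in-memory dict.

```python
import uuid

# Assumption: the issue only says "local_ or a similar prefix or suffix".
LOCAL_PREFIX = "local_"


class HybridStore:
    """Toy stand-in for a client in debugging mode (hypothetical, not the SDK)."""

    def __init__(self, remote_assets):
        # Dict of key -> asset standing in for the deployed platform.
        self._remote = dict(remote_assets)
        self._local = {}

    def add(self, asset):
        """Adding an asset stores it locally under a prefixed key."""
        key = LOCAL_PREFIX + uuid.uuid4().hex
        self._local[key] = asset
        return key

    def get(self, key):
        """Fetch locally if the key has the local prefix, otherwise remotely."""
        store = self._local if key.startswith(LOCAL_PREFIX) else self._remote
        return store[key]

    def list_assets(self):
        """List the remote and local assets together."""
        return {**self._remote, **self._local}


class DummyOpener:
    """Hypothetical opener exposing a `fake` method, as described in the issue."""

    def fake(self, n_samples):
        return [{"fake": True} for _ in range(n_samples)]


def fake_training_data(dataset, opener):
    """Count the dataset's train sample keys and get as many fake samples."""
    n_samples = len(dataset["train_samples_keys"])
    return opener.fake(n_samples)
```

For example, a dataset fetched from the "remote" dict with three sample keys would yield three fake samples, while anything passed to `add` comes back under a `local_` key and is served from the local store.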
Limitation
A locally created traintuple, testtuple, aggregate tuple or composite traintuple cannot depend on a remote tuple.
Same for a compute plan: it is impossible to add a tuple to a compute plan that is on the deployed platform.
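One possible way to enforce this limitation is to validate the parent keys before a local tuple is created. The sketch below assumes that local assets are recognisable by a `local_` key prefix; `check_local_dependencies` is a hypothetical helper, not an existing SDK function.

```python
# Assumption: local keys start with this prefix, as suggested in the description.
LOCAL_PREFIX = "local_"


def check_local_dependencies(in_models_keys):
    """Raise if a locally created tuple would depend on a remote tuple."""
    remote_keys = [k for k in in_models_keys if not k.startswith(LOCAL_PREFIX)]
    if remote_keys:
        raise ValueError(
            "A local tuple cannot depend on remote tuples: "
            + ", ".join(remote_keys)
        )
```

Failing early with an explicit message is probably friendlier than letting the local execution fail later on a model it cannot find.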
Improvements
It must be clear to the user that executing a local tuple on a remote asset means that it uses the fake data.
Conversely, if the tuple executes on a local asset (the user created a dataset and data samples locally), then the real data is used.
--> the documentation should be clear about this
--> add a warning / print to stdout when it uses the fake data?
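The warning could be as simple as the sketch below, which uses Python's standard `warnings` module. `warn_if_fake` and the `local_` prefix check are hypothetical; the real implementation may decide "local vs remote" differently.

```python
import warnings

# Assumption: local keys start with this prefix.
LOCAL_PREFIX = "local_"


def warn_if_fake(dataset_key, n_samples):
    """Warn when a tuple is about to execute on fake data.

    Returns True if fake data is used (remote dataset), False otherwise.
    """
    if dataset_key.startswith(LOCAL_PREFIX):
        return False  # local dataset: the real data is used
    warnings.warn(
        f"Dataset {dataset_key!r} is remote: executing on "
        f"{n_samples} fake samples from the opener.",
        UserWarning,
        stacklevel=2,
    )
    return True
```

Using `warnings.warn` rather than a bare `print` lets users filter or escalate the message with the usual warning filters.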
Code example
dataset_key = EXISTING_KEY
remote_traintuple_key = EXISTING_TRAINTUPLE_KEY
c = Client(<url>, <version>, debugging=True)
dataset = c.get_dataset(dataset_key)
data_samples = dataset['trainDataSampleKeys']
# Can also get the objective and algo from the deployed platform
c.add_objective(<objective>)
c.add_algo(<algo>)
c.add_traintuple({
    'train_data_sample_keys': [d['key'] for d in data_samples]
})  # the traintuple executes on len(data_samples) fake data
# This fails:
c.add_traintuple({
    'in_models_keys': [remote_traintuple_key]
})
This would be awesome!
Closed with PR #206