Substra / substra

Low-level Python library used to interact with a Substra network

Home Page: https://docs.substra.org

[Feature request] Hybrid debugging

Esadruhn opened this issue

Description

Create a "debugging" mode when using the Substra SDK. The principle is that the user only needs to change one parameter in their script to switch between the "normal" and "debugging" modes.

In this mode:

  • adding an asset adds it locally
  • fetching an asset fetches it either from the deployed platform or locally (local assets are identified by a 'key' that starts with local_ or a similar prefix or suffix)
  • executing a traintuple on a dataset fetched remotely has the following behaviour (see the sketch after this list):
    • it counts the number of train data sample keys of the dataset
    • it gets the same number of samples from the fake method of the opener
    • it executes on this fake data
  • the list function lists the remote and local assets
  • the leaderboard function gets the leaderboard from the deployed platform if the objective is on it, otherwise from the local data
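
As a rough sketch of the intended dispatch (all names below are illustrative, not a committed API; the get_X / get_y / fake_X / fake_y methods are assumed to mirror the opener interface of substratools):

LOCAL_PREFIX = 'local_'

class DebugClient:
    def __init__(self, remote_client):
        self.remote = remote_client
        self.local_assets = {}  # key -> asset added locally

    def get_dataset(self, key):
        # Local assets are identified by their key prefix
        if key.startswith(LOCAL_PREFIX):
            return self.local_assets[key]
        return self.remote.get_dataset(key)

    def execute_traintuple(self, opener, data_folders, train_data_sample_keys):
        if all(k.startswith(LOCAL_PREFIX) for k in train_data_sample_keys):
            # Local data samples: the real data is used
            X, y = opener.get_X(data_folders), opener.get_y(data_folders)
        else:
            # Remote data samples: get the same number of samples
            # from the fake methods of the opener
            n = len(train_data_sample_keys)
            X, y = opener.fake_X(n), opener.fake_y(n)
        return X, y  # fed to the algo's train()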

Limitation

A locally created traintuple, testtuple, aggregate tuple or composite traintuple cannot depend on a remote tuple.
Same for a compute plan: it is impossible to add a tuple to a compute plan that is on the deployed platform.
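
A minimal sketch of how this limitation could be enforced at submission time (the local_ prefix check and the error message are illustrative):

LOCAL_PREFIX = 'local_'

def check_local_traintuple(spec):
    # A locally created tuple may only depend on other local tuples
    remote_deps = [k for k in spec.get('in_models_keys', [])
                   if not k.startswith(LOCAL_PREFIX)]
    if remote_deps:
        raise ValueError(
            f'local tuples cannot depend on remote tuples: {remote_deps}')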

Improvements

It must be clear to the user that executing a local tuple on a remote asset means that the fake data is used.
Conversely, if the tuple executes on a local asset (a dataset and data samples the user created locally), then the real data is used.

--> the documentation should be clear about this
--> add a warning or print to stdout when fake data is used?
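
For instance, the SDK could emit a standard Python warning whenever a tuple is about to run on fake data (a sketch, not the final wording; the helper name is hypothetical):

import warnings

def warn_if_fake_data(train_data_sample_keys):
    # Remote keys mean the opener's fake data will be used
    if not all(k.startswith('local_') for k in train_data_sample_keys):
        warnings.warn(
            'Remote data samples detected: the tuple will execute on fake '
            'data generated by the opener, not on the real data.')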

Code example

dataset_key = EXISTING_KEY
remote_traintuple_key = EXISTING_TRAINTUPLE_KEY

c = Client(<url>, <version>, debugging=True)

# Fetched from the deployed platform (the key has no local_ prefix)
dataset = c.get_dataset(dataset_key)
data_sample_keys = dataset['trainDataSampleKeys']

# Can also get the objective and algo from the deployed platform
c.add_objective(<objective>)
c.add_algo(<algo>)

# The data samples are remote, so the traintuple executes on
# len(data_sample_keys) fake data samples generated by the opener
c.add_traintuple({
    'train_data_sample_keys': data_sample_keys,
})

# This fails: a local tuple cannot depend on a remote tuple
c.add_traintuple({
    'in_models_keys': [remote_traintuple_key],
})
A user commented:

This would be awesome!

Closed with PR #206