Substra / substra

Low-level Python library used to interact with a Substra network

Home Page: https://docs.substra.org

[Feature request] Hybrid debugging

Esadruhn opened this issue

Description

Create a "debugging" mode when using the Substra SDK. The principle is that the user only needs to change one parameter in their script to switch between the "normal" and "debugging" modes.

In this mode:

  • adding an asset adds it locally
  • fetching an asset fetches it either from the deployed platform or locally (local assets are identified by a 'key' that starts with local_ or a similar prefix or suffix)
  • executing a traintuple on a dataset fetched remotely has the following behaviour (see the sketch after this list):
    • it counts the number of train data sample keys of the dataset
    • it gets the same number of samples from the fake method of the opener
    • it executes on this fake data
  • the list function lists the remote and local assets
  • the leaderboard function gets the leaderboard from the deployed platform if the objective is on it, otherwise from the local data
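
As a rough sketch of the intended dispatch (all names below are illustrative, not a committed API; the get_X / get_y / fake_X / fake_y methods are assumed to mirror the opener interface of substratools):

LOCAL_PREFIX = 'local_'

class DebugClient:
    def __init__(self, remote_client):
        self.remote = remote_client
        self.local_assets = {}  # key -> asset added locally

    def get_dataset(self, key):
        # Local assets are identified by their key prefix
        if key.startswith(LOCAL_PREFIX):
            return self.local_assets[key]
        return self.remote.get_dataset(key)

    def execute_traintuple(self, opener, data_folders, train_data_sample_keys):
        if all(k.startswith(LOCAL_PREFIX) for k in train_data_sample_keys):
            # Local data samples: the real data is used
            X, y = opener.get_X(data_folders), opener.get_y(data_folders)
        else:
            # Remote data samples: get the same number of samples
            # from the fake methods of the opener
            n = len(train_data_sample_keys)
            X, y = opener.fake_X(n), opener.fake_y(n)
        return X, y  # fed to the algo's train()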

Limitation

A locally created traintuple, testtuple, aggregate tuple or composite traintuple cannot depend on a remote tuple.
Same for a compute plan: it is impossible to add a tuple to a compute plan that is on the deployed platform.
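
A minimal sketch of how this limitation could be enforced at submission time (the local_ prefix check and the error message are illustrative):

LOCAL_PREFIX = 'local_'

def check_local_traintuple(spec):
    # A locally created tuple may only depend on other local tuples
    remote_deps = [k for k in spec.get('in_models_keys', [])
                   if not k.startswith(LOCAL_PREFIX)]
    if remote_deps:
        raise ValueError(
            f'local tuples cannot depend on remote tuples: {remote_deps}')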

Improvements

It must be clear to the user that executing a local tuple on a remote asset means that the fake data is used.
Conversely, if the tuple executes on a local asset (a dataset and data samples the user created locally), then the real data is used.

--> the documentation should be clear about this
--> add a warning or print to stdout when fake data is used?
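
For instance, the SDK could emit a standard Python warning whenever a tuple is about to run on fake data (a sketch, not the final wording; the helper name is hypothetical):

import warnings

def warn_if_fake_data(train_data_sample_keys):
    # Remote keys mean the opener's fake data will be used
    if not all(k.startswith('local_') for k in train_data_sample_keys):
        warnings.warn(
            'Remote data samples detected: the tuple will execute on fake '
            'data generated by the opener, not on the real data.')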

Code example

dataset_key = EXISTING_KEY
remote_traintuple_key = EXISTING_TRAINTUPLE_KEY

c = Client(<url>, <version>, debugging=True)

# Fetched from the deployed platform (the key has no local_ prefix)
dataset = c.get_dataset(dataset_key)
data_sample_keys = dataset['trainDataSampleKeys']

# Can also get the objective and algo from the deployed platform
c.add_objective(<objective>)
c.add_algo(<algo>)

# The data samples are remote, so the traintuple executes on
# len(data_sample_keys) fake data samples generated by the opener
c.add_traintuple({
    'train_data_sample_keys': data_sample_keys,
})

# This fails: a local tuple cannot depend on a remote tuple
c.add_traintuple({
    'in_models_keys': [remote_traintuple_key],
})
A user commented:

This would be awesome!

Closed with PR #206