Fixed Keys in Serverside callback
marcstern14 opened this issue · comments
Hi @emilhe, Serverside callbacks have been a life saver! I'm wondering about using fixed keys. I've managed to get them running in a local test environment, but I have some questions about how they might work in a prod environment.
- When new data is stored in the dcc.Store using the same fixed key, does it overwrite the data that is currently there, or something else? Can this effectively act as backend cleanup?
- How does this affect client side DataFrame filtering, if at all?
- Are there other tradeoffs to using fixed keys that relate to performance, or to infinitely growing file_system_backend folders?
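For context on the first question: a fixed key makes every write land on the same cache entry, so the new value replaces the old one instead of accumulating files. Below is a minimal sketch of that overwrite behavior using a plain pickle-per-key file cache as a stand-in for dash-extensions' FileSystemBackend (the `cache_set`/`cache_get` helpers are hypothetical, not part of the library):

```python
import os
import pickle
import tempfile

def cache_set(folder, key, value):
    """Write value under a fixed key; reusing the key overwrites the old file."""
    with open(os.path.join(folder, key), 'wb') as f:
        pickle.dump(value, f)

def cache_get(folder, key):
    with open(os.path.join(folder, key), 'rb') as f:
        return pickle.load(f)

folder = tempfile.mkdtemp()

# Two refreshes under the SAME fixed key: one file on disk, latest data wins.
cache_set(folder, 'AAA', {'rows': 100})
cache_set(folder, 'AAA', {'rows': 250})

print(len(os.listdir(folder)))   # 1 -- the backend folder does not grow
print(cache_get(folder, 'AAA'))  # {'rows': 250} -- old data was replaced
```

Without a fixed key, each invocation writes a fresh entry, which is the source of the unbounded file_system_backend growth mentioned above.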
Unfortunately, I can't share any code as it is a work-related project. But I can share my general setup, which includes checking to see if there is a new file and if so, reading it into a dcc.Store, along with the refresh date in a separate dcc.Store. The dataframe and refresh are stored on the serverside, while the users interact with it on the client side, applying filtering which creates charts (it's a dashboard).
Thank you!
Update: I have been using fixed keys in order to prevent infinite FileSystemBackend growth, and it seems to be working. However, there is one buggy behavior with this method that I don't understand: sometimes, but not always, the data does not load into the app. On launch I check whether there is a new data file to be read, and store the data in a dcc.Store(storage_type='session'), so the data should load on every new app launch. I can provide the code below if there are any thoughts on what the issue might be and potential solutions. Thanks!
```python
from dash import dcc, html
from dash.exceptions import PreventUpdate
from dash_extensions.enrich import (
    DashProxy, Input, Output, Serverside, ServersideOutputTransform
)
from google.cloud import storage
import pandas as pd

app = DashProxy(
    __name__,
    external_stylesheets=external_stylesheets,
    suppress_callback_exceptions=True,
    use_pages=True,
    assets_folder=assets,
    pages_folder=pages,
    transforms=[ServersideOutputTransform()]
)
server = app.server

app.layout = html.Div(
    [
        dcc.Store(id='data1', storage_type='session'),
        dcc.Store(id='data2', storage_type='session'),
        dcc.Store(id='refresh', storage_type='session')
    ]
)

@app.callback(
    Output('data1', 'data'),
    Output('data2', 'data'),
    Output('refresh', 'data'),
    Input('refresh', 'data'),
    Input('data1', 'data'),
    Input('data2', 'data')
)
def get_data(refresh, df1, df2):
    bucket_name = 'my_bucket'
    blob_name = 'data.parquet'
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    blob_updated = blob.updated
    if (refresh
        and pd.to_datetime(refresh, utc=True) >= pd.to_datetime(blob_updated, utc=True)
        and df1 is not None
        and df2 is not None
    ):
        raise PreventUpdate
    # read data from GCS
    df = pd.read_parquet(f'gs://{bucket_name}/{blob_name}')
    df2 = pd.read_parquet(f'gs://{bucket_name}/data_file2.parquet')
    # use fixed data keys for cache storage
    key1 = 'AAA'
    key2 = 'BBB'
    key3 = 'CCC'
    return Serverside(df1, key=key1), Serverside(blob_updated, key=key2), Serverside(df2, key=key3)
```
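The PreventUpdate guard above compares the stored refresh timestamp against the blob's `updated` time. Factored into a standalone helper (`needs_refresh` is a hypothetical name, not part of the original code), the same comparison can be tested in isolation:

```python
import pandas as pd

def needs_refresh(refresh, blob_updated, df1, df2):
    """Return True when the stored data is missing or older than the blob."""
    if not refresh or df1 is None or df2 is None:
        return True
    return pd.to_datetime(refresh, utc=True) < pd.to_datetime(blob_updated, utc=True)

# Stored timestamp is as new as the blob and both frames exist -> skip reload.
print(needs_refresh('2024-05-02', '2024-05-01', object(), object()))  # False
# Blob was updated after the stored timestamp -> reload.
print(needs_refresh('2024-05-01', '2024-05-02', object(), object()))  # True
# First launch: nothing stored yet -> reload.
print(needs_refresh(None, '2024-05-02', None, None))  # True
```

Note also that the return line in the callback above sends `df1` (an input) back first, while the declared Outputs are (data1, data2, refresh); assuming the intent is for data1 to get the freshly read `df` and refresh to get `blob_updated`, the values would need reordering accordingly.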
Hi @marcstern14, wondering if you got this working, because it looks pretty good except for the order of the return values. Did you check whether the refresh value was actually falsy?
Hi @Lxstr, I did get this working and it seems to be a viable solution. My issue actually wasn't with the fixed keys themselves, but with running multiple pods in deployment on top of this kind of filesystem caching: each pod has its own filesystem, so a given pod may or may not have loaded the data that the client-side session expects to find in the cache. See here
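To illustrate the failure mode described here: with a per-pod FileSystemBackend, each pod caches to its own local disk, so a serverside key handed to the browser by one pod may point at data another pod never wrote. A toy simulation with per-pod dict caches (all names hypothetical); a single shared backend, such as Redis, avoids the problem:

```python
# Each pod has its own local cache (its own filesystem).
pod_a_cache = {}
pod_b_cache = {}

def serve_request(cache, key):
    """A pod resolves the session's serverside key against ITS OWN cache."""
    return cache.get(key, 'MISSING')

# Pod A handles the first request and writes the data under the fixed key.
pod_a_cache['AAA'] = 'dataframe bytes'

# The browser session stores the key 'AAA'. The load balancer then routes
# the next request to pod B, whose filesystem never saw the write.
print(serve_request(pod_a_cache, 'AAA'))  # 'dataframe bytes'
print(serve_request(pod_b_cache, 'AAA'))  # 'MISSING' -> data "does not load"

# With one backend shared by every pod (e.g. Redis), all requests resolve
# against the same store, regardless of routing.
shared_cache = {}
shared_cache['AAA'] = 'dataframe bytes'
print(serve_request(shared_cache, 'AAA'))  # 'dataframe bytes'
```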