Fixed Keys in Serverside callback
marcstern14 opened this issue · comments
Hi @emilhe, Serverside callbacks have been a life saver! I'm wondering about using fixed keys. I've managed to get them running in a local test environment, but I have some questions about how they might work in a prod environment.
- When new data is stored in the dcc.Store using the same fixed key, does it overwrite the data that is currently there, or something else? Can this effectively act as backend cleanup?
- How does this affect client side DataFrame filtering, if at all?
- Are there other tradeoffs to using fixed keys that relate to performance, or to infinitely growing file_system_backend folders?
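For context on the first question: a fixed key makes every write land on the same cache entry, so the new value replaces the old one instead of accumulating files. Below is a minimal sketch of that overwrite behavior using a plain pickle-per-key file cache as a stand-in for dash-extensions' FileSystemBackend (the `cache_set`/`cache_get` helpers are hypothetical, not part of the library):

```python
import os
import pickle
import tempfile

def cache_set(folder, key, value):
    """Write value under a fixed key; reusing the key overwrites the old file."""
    with open(os.path.join(folder, key), 'wb') as f:
        pickle.dump(value, f)

def cache_get(folder, key):
    with open(os.path.join(folder, key), 'rb') as f:
        return pickle.load(f)

folder = tempfile.mkdtemp()

# Two refreshes under the SAME fixed key: one file on disk, latest data wins.
cache_set(folder, 'AAA', {'rows': 100})
cache_set(folder, 'AAA', {'rows': 250})

print(len(os.listdir(folder)))   # 1 -- the backend folder does not grow
print(cache_get(folder, 'AAA'))  # {'rows': 250} -- old data was replaced
```

Without a fixed key, each invocation writes a fresh entry, which is the source of the unbounded file_system_backend growth mentioned above.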
Unfortunately, I can't share any code as it is a work-related project. But I can share my general setup, which includes checking to see if there is a new file and if so, reading it into a dcc.Store, along with the refresh date in a separate dcc.Store. The dataframe and refresh are stored on the serverside, while the users interact with it on the client side, applying filtering which creates charts (it's a dashboard).
Thank you!
Update: I have been using fixed keys in order to prevent infinite FileSystemBackend growth, and it seems to be working. However, there is one buggy behavior with this method that I don't understand: sometimes, but not always, the data does not load into the app. On launch I check whether there is a new data file to be read, and store the data in a dcc.Store(storage_type='session'), so the data should load on every new app launch. I can provide the code below if there are any thoughts on what the issue might be and potential solutions. Thanks!
```python
from dash import dcc, html
from dash.exceptions import PreventUpdate
from dash_extensions.enrich import (
    DashProxy, Input, Output, Serverside, ServersideOutputTransform
)
from google.cloud import storage
import pandas as pd

app = DashProxy(
    __name__,
    external_stylesheets=external_stylesheets,
    suppress_callback_exceptions=True,
    use_pages=True,
    assets_folder=assets,
    pages_folder=pages,
    transforms=[ServersideOutputTransform()]
)
server = app.server

app.layout = html.Div(
    [
        dcc.Store(id='data1', storage_type='session'),
        dcc.Store(id='data2', storage_type='session'),
        dcc.Store(id='refresh', storage_type='session')
    ]
)

@app.callback(
    Output('data1', 'data'),
    Output('data2', 'data'),
    Output('refresh', 'data'),
    Input('refresh', 'data'),
    Input('data1', 'data'),
    Input('data2', 'data')
)
def get_data(refresh, df1, df2):
    bucket_name = 'my_bucket'
    blob_name = 'data.parquet'
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    blob_updated = blob.updated
    if (refresh
        and pd.to_datetime(refresh, utc=True) >= pd.to_datetime(blob_updated, utc=True)
        and df1 is not None
        and df2 is not None
    ):
        raise PreventUpdate
    # read data from GCS
    df = pd.read_parquet(f'gs://{bucket_name}/{blob_name}')
    df2 = pd.read_parquet(f'gs://{bucket_name}/data_file2.parquet')
    # use fixed data keys for cache storage
    key1 = 'AAA'
    key2 = 'BBB'
    key3 = 'CCC'
    return Serverside(df1, key=key1), Serverside(blob_updated, key=key2), Serverside(df2, key=key3)
```
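The PreventUpdate guard above compares the stored refresh timestamp against the blob's `updated` time. Factored into a standalone helper (`needs_refresh` is a hypothetical name, not part of the original code), the same comparison can be tested in isolation:

```python
import pandas as pd

def needs_refresh(refresh, blob_updated, df1, df2):
    """Return True when the stored data is missing or older than the blob."""
    if not refresh or df1 is None or df2 is None:
        return True
    return pd.to_datetime(refresh, utc=True) < pd.to_datetime(blob_updated, utc=True)

# Stored timestamp is as new as the blob and both frames exist -> skip reload.
print(needs_refresh('2024-05-02', '2024-05-01', object(), object()))  # False
# Blob was updated after the stored timestamp -> reload.
print(needs_refresh('2024-05-01', '2024-05-02', object(), object()))  # True
# First launch: nothing stored yet -> reload.
print(needs_refresh(None, '2024-05-02', None, None))  # True
```

Note also that the return line in the callback above sends `df1` (an input) back first, while the declared Outputs are (data1, data2, refresh); assuming the intent is for data1 to get the freshly read `df` and refresh to get `blob_updated`, the values would need reordering accordingly.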
Hi @marcstern14, wondering if you got this working, because it looks pretty good except for the order of the return values. Did you check whether the refresh value was actually falsy?
Hi @Lxstr, I did get this working and it seems to be a viable solution. My issue actually wasn't with the fixed keys themselves, but with running multiple pods in deployment on top of this kind of filesystem caching: each pod has its own filesystem, so a given pod may or may not have loaded the data that the client-side session expects to find in the cache. See here
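To illustrate the failure mode described here: with a per-pod FileSystemBackend, each pod caches to its own local disk, so a serverside key handed to the browser by one pod may point at data another pod never wrote. A toy simulation with per-pod dict caches (all names hypothetical); a single shared backend, such as Redis, avoids the problem:

```python
# Each pod has its own local cache (its own filesystem).
pod_a_cache = {}
pod_b_cache = {}

def serve_request(cache, key):
    """A pod resolves the session's serverside key against ITS OWN cache."""
    return cache.get(key, 'MISSING')

# Pod A handles the first request and writes the data under the fixed key.
pod_a_cache['AAA'] = 'dataframe bytes'

# The browser session stores the key 'AAA'. The load balancer then routes
# the next request to pod B, whose filesystem never saw the write.
print(serve_request(pod_a_cache, 'AAA'))  # 'dataframe bytes'
print(serve_request(pod_b_cache, 'AAA'))  # 'MISSING' -> data "does not load"

# With one backend shared by every pod (e.g. Redis), all requests resolve
# against the same store, regardless of routing.
shared_cache = {}
shared_cache['AAA'] = 'dataframe bytes'
print(serve_request(shared_cache, 'AAA'))  # 'dataframe bytes'
```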