HSF / PyHEP.dev-workshops

PyHEP Developer workshops

Home Page: https://indico.cern.ch/e/PyHEP2023.dev

data lifecycles in modern analysis and efficient tiered caching policies

lgray opened this issue

Let's write down and categorize all the types of data we consume and generate during the lifecycle of an analysis, including ML training and other ancillary tasks. Perhaps some general rules of thumb for cache lifetime policies, or at least a way to arrive at such policies, will become apparent.

This becomes much more interesting and useful when you consider tier-less dataset definitions (i.e. object store) and the cycling of data on and off tape in such a setup.
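
As a possible starting point for the categorization, here is a minimal sketch of how a data product could be mapped to a cache tier from its access pattern. Everything in it is an assumption made for illustration: the `Tier` names, the `DataProduct` fields, the thresholds, and the example products are hypothetical placeholders, not measured numbers or an agreed policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier names, chosen only for this sketch.
    HOT = "hot"    # e.g. local disk or in-memory cache
    WARM = "warm"  # e.g. disk-backed object store
    COLD = "cold"  # e.g. tape


# Placeholder thresholds (accesses per week); not based on any measurement.
HOT_THRESHOLD = 10.0
WARM_THRESHOLD = 1.0


@dataclass(frozen=True)
class DataProduct:
    name: str
    accesses_per_week: float
    reproducible: bool  # can it be regenerated from upstream inputs?


def suggest_tier(product: DataProduct) -> Tier:
    """Toy rule of thumb: place a product by how often it is read."""
    if product.accesses_per_week >= HOT_THRESHOLD:
        return Tier.HOT
    if product.accesses_per_week >= WARM_THRESHOLD:
        return Tier.WARM
    return Tier.COLD


def can_drop_cached_copy(product: DataProduct) -> bool:
    """A cached copy is safe to evict without a backup only if reproducible."""
    return product.reproducible


if __name__ == "__main__":
    # Hypothetical examples of data seen during an analysis.
    products = [
        DataProduct("skimmed ntuples", 50.0, reproducible=True),
        DataProduct("ML training set", 2.0, reproducible=True),
        DataProduct("input dataset", 0.1, reproducible=False),
    ]
    for p in products:
        print(p.name, suggest_tier(p).value, can_drop_cached_copy(p))
```

A real policy would likely also weigh size and regeneration cost; the point here is only that each category of data we list could carry a few attributes from which a lifetime rule follows.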