data lifecycles in modern analysis and efficient tiered caching policies
lgray opened this issue
Let's write down and categorize all the types of data we consume and generate during the lifecycle of an analysis, including ML training and other ancillary tasks. Perhaps some general rules of thumb for cache lifetime policies, or a way to derive such policies, will become apparent.
This becomes much more interesting, and more useful, when you consider tier-less dataset definitions (i.e., an object store) and the cycling of data on and off tape in such a setup.
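One possible way to turn such a categorization into a policy is sketched below. It describes each data type by a few measurable attributes (size, read frequency, cost to regenerate or recall if evicted) and derives a storage tier and re-evaluation lifetime from simple thresholds. Everything here is hypothetical: the tier names, the `DataCategory` fields, the thresholds, and the TTL values are illustrative placeholders, not measured numbers or any existing tool's API.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tiers; a tier-less object store may blur HOT and WARM.
    HOT = "hot"    # e.g. local SSD or site cache
    WARM = "warm"  # e.g. object store or disk pool
    COLD = "cold"  # e.g. tape


@dataclass(frozen=True)
class DataCategory:
    # Hypothetical attributes; real values would come from the survey proposed above.
    name: str
    size_tb: float
    reads_per_week: float
    refill_cost_hours: float  # hours to regenerate or recall the data if evicted


def choose_tier(cat: DataCategory,
                hot_read_threshold: float = 5.0,
                warm_read_threshold: float = 0.5,
                cost_per_tb_threshold: float = 10.0) -> Tier:
    """Toy rule of thumb: frequently read data stays hot, rarely read data goes
    cold, unless it is expensive to refill relative to its size."""
    if cat.reads_per_week >= hot_read_threshold:
        return Tier.HOT
    if cat.reads_per_week >= warm_read_threshold:
        return Tier.WARM
    # Rarely read: keep it on disk anyway if refilling it is costly per TB stored.
    if cat.refill_cost_hours / max(cat.size_tb, 1e-9) > cost_per_tb_threshold:
        return Tier.WARM
    return Tier.COLD


def ttl_days(tier: Tier) -> int:
    # Hypothetical lifetimes before a category is re-evaluated for demotion.
    return {Tier.HOT: 7, Tier.WARM: 90, Tier.COLD: 365}[tier]


if __name__ == "__main__":
    # Illustrative categories only; the numbers are invented for the example.
    categories = [
        DataCategory("plotting ntuples", size_tb=2.0, reads_per_week=20.0, refill_cost_hours=1.0),
        DataCategory("ML training set", size_tb=10.0, reads_per_week=1.0, refill_cost_hours=5.0),
        DataCategory("intermediate skim", size_tb=50.0, reads_per_week=0.1, refill_cost_hours=100.0),
        DataCategory("trained model weights", size_tb=0.01, reads_per_week=0.2, refill_cost_hours=48.0),
    ]
    for cat in categories:
        tier = choose_tier(cat)
        print(f"{cat.name}: {tier.value}, re-evaluate after {ttl_days(tier)} days")
```

Keeping the policy a pure function of a few attributes makes it cheap to re-run as access patterns change, which matters once data cycles on and off tape rather than sitting in a fixed tier.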
+1