google / weather-tools

Tools to make weather data accessible and useful.

Home Page: https://weather-tools.readthedocs.io/

Update manifest to track pipeline stages.

alxmrs opened this issue

In order to assist with solving #139, manifests need to be restructured to update download state. This is helpful when work is split up across a few transforms. Currently DownloadStatus only records the state of the whole download.

"""Download status: 'scheduled', 'in-progress', 'success', or 'failure'."""

After discussing with @mahrsee1997, we propose adding a stage flag to the DownloadStatus object. This will keep track of the Transform that the operation is currently in. Each stage can have in-progress, success or failure statuses.
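To make the proposal concrete, here is a minimal sketch of a status record that carries a stage. This is not the project's actual schema: the field set is simplified, and the `stage` and `error` fields, the defaults, and the example values are assumptions for illustration only.

```python
import dataclasses
import typing as t


@dataclasses.dataclass
class DownloadStatus:
    """Hypothetical, simplified status record (not the real schema)."""
    selection: t.Dict
    location: str
    user: str
    # Assumed new field: name of the Transform the operation is currently in.
    stage: t.Optional[str] = None
    # One of 'scheduled', 'in-progress', 'success', or 'failure'.
    status: str = 'scheduled'
    # Assumed field: error message recorded when a stage fails.
    error: t.Optional[str] = None


# Example values are made up.
status = DownloadStatus(
    selection={'year': ['2020']},
    location='out/2020.nc',
    user='alice',
    stage='fetch',
    status='in-progress',
)
```

With a record like this, a failure can be attributed to a specific stage instead of to the download as a whole.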

This will eventually help the user better understand the fine-grained state of the pipeline. Further, it can better illustrate what went wrong in the pipeline and where.

Implementation Notes

In addition to updating the DownloadStatus schema, the abstract Manifest also needs to be updated. A concise way to allow manifest users to express the stage would be to add an argument to the transact method:

def transact(self, selection: t.Dict, location: str, user: str) -> 'Manifest':

After the change, the API should allow the manifest user to specify which stage the pipeline is in. The manifest should then update the statuses and the stages in the usual way, via a context manager (i.e. a with statement).
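One possible shape for this, sketched with a hypothetical in-memory manifest rather than the real abstract Manifest: `transact` takes an extra `stage` argument (an assumed name), and entering or leaving the with block records the status for that stage. The `InMemoryManifest` class and its `records` layout are invented for this example.

```python
import typing as t


class InMemoryManifest:
    """Hypothetical manifest that stores per-stage statuses in a dict."""

    def __init__(self) -> None:
        # Maps location -> {stage name -> status}.
        self.records: t.Dict[str, t.Dict[str, str]] = {}
        self._location: t.Optional[str] = None
        self._stage: t.Optional[str] = None

    def transact(self, selection: t.Dict, location: str, user: str,
                 stage: str) -> 'InMemoryManifest':
        """Remember which location and stage the next with block covers."""
        self._location, self._stage = location, stage
        return self

    def __enter__(self) -> 'InMemoryManifest':
        self.records.setdefault(self._location, {})[self._stage] = 'in-progress'
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Mark the stage as failed if the block raised, otherwise as succeeded.
        # Returning None lets any exception propagate to the caller.
        outcome = 'failure' if exc_type else 'success'
        self.records[self._location][self._stage] = outcome


manifest = InMemoryManifest()

with manifest.transact({'year': ['2020']}, 'out/2020.nc', 'alice', stage='fetch'):
    pass  # the work of the 'fetch' stage would go here

try:
    with manifest.transact({'year': ['2020']}, 'out/2020.nc', 'alice', stage='upload'):
        raise IOError('upload failed')
except IOError:
    pass

print(manifest.records)
```

Here the record for `out/2020.nc` ends up with `fetch` marked as a success and `upload` as a failure, which is the per-stage visibility the issue asks for.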

Advanced note: Since the Beam stage name comes from a predictable place, it might be worth investigating whether the manifest object can acquire the name of the Transform without having it passed in as a string. (Is there a __name__ that can be grabbed from some exposed object?) This is optional, of course. We can omit this if the approach is too complex.
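As a starting point for that investigation, the plain-Python part of the idea can be sketched as follows. The helper `stage_name` and the stand-ins `FetchData` and `upload` are hypothetical, and whether Beam exposes the running Transform to the manifest at the right moment is exactly the open question; this only shows how a name could be derived once such an object is in hand.

```python
import typing as t


def stage_name(obj: t.Any) -> str:
    """Best-effort name: an object's own __name__, else its class name."""
    name = getattr(obj, '__name__', None)
    if isinstance(name, str):
        return name  # functions and classes carry their own __name__
    return type(obj).__name__  # instances fall back to their class name


class FetchData:
    """Stand-in for a transform class; not a real Beam transform."""


def upload(path: str) -> None:
    """Stand-in for a function used as a pipeline step."""


print(stage_name(FetchData()))
print(stage_name(upload))
```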