Update manifest to track pipeline stages.
alxmrs opened this issue · comments
In order to assist with solving #139, manifests need to be restructured to update download state. This is helpful when work is split up across a few transforms. Currently, DownloadStatus only records the state of the whole download.
After discussing with @mahrsee1997, we propose adding a stage flag to the DownloadStatus object. This will keep track of the Transform that the operation is currently in. Each stage can have an in-progress, success, or failure status.
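As a rough illustration of the proposed schema, here is a minimal sketch of a DownloadStatus record that carries a stage alongside a status. The field names, the Stage members, and the example location are assumptions made for illustration; they are not taken from the existing code.

```python
import dataclasses
import enum
import typing as t


class Stage(enum.Enum):
    """Hypothetical pipeline stages; the real Transform names may differ."""
    FETCH = 'fetch'
    DOWNLOAD = 'download'
    UPLOAD = 'upload'


class Status(enum.Enum):
    """The three per-stage statuses proposed in this issue."""
    IN_PROGRESS = 'in-progress'
    SUCCESS = 'success'
    FAILURE = 'failure'


@dataclasses.dataclass
class DownloadStatus:
    """Assumed shape of a manifest record with the new stage flag."""
    location: str                        # where the download is written
    stage: t.Optional[Stage] = None      # Transform the operation is currently in
    status: t.Optional[Status] = None    # state of that stage
    error: t.Optional[str] = None        # failure details, if any


record = DownloadStatus(
    location='gs://example-bucket/example.nc',
    stage=Stage.DOWNLOAD,
    status=Status.IN_PROGRESS,
)
```

With a record like this, a failed upload and a failed download would be distinguishable by the stage value rather than by a single whole-download state.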
This will eventually help the user better understand the fine-grained state of the pipeline. Further, it can better illustrate what went wrong in the pipeline and where.
Implementation Notes
In addition to updating the DownloadStatus schema, the abstract Manifest also needs to be updated. A concise way to allow manifest users to express the stage would be to add an argument to the transact method.
After the change, the API should allow the manifest user to specify which stage the pipeline is in. The manifest should then update the statuses and the stages in the usual way, via a context manager (i.e. a with statement).
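One way this could look is sketched below: a transact context manager that accepts a stage argument and records in-progress, success, or failure for that stage. InMemoryManifest, the records dictionary, and the exact transact signature are hypothetical stand-ins, not the project's actual Manifest API.

```python
import contextlib
import typing as t


class InMemoryManifest:
    """Hypothetical manifest storing the latest (stage, status) per location."""

    def __init__(self) -> None:
        self.records: t.Dict[str, t.Tuple[str, str]] = {}

    @contextlib.contextmanager
    def transact(self, location: str, stage: str) -> t.Iterator[None]:
        # Mark the stage as started before the caller's work runs.
        self.records[location] = (stage, 'in-progress')
        try:
            yield
        except Exception:
            # Record the failure for this stage, then let the error propagate.
            self.records[location] = (stage, 'failure')
            raise
        else:
            self.records[location] = (stage, 'success')


manifest = InMemoryManifest()

# The caller names the stage; the manifest handles the status updates.
with manifest.transact('example.nc', stage='download'):
    pass  # the work for this stage would go here
```

Because the status updates live inside the context manager, each Transform only needs to wrap its work in a with statement and name its stage.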
Advanced note: Since the Beam stage name comes from a predictable place, it might be worth investigating whether the manifest object can acquire the name of the Transform without having it passed in as a string (is there a __name__ that can be grabbed from some exposed object?). This is optional, of course. We can omit this if the approach is too complex.
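As a starting point for that investigation, the sketch below derives a stage name from a Python object's class rather than from a string argument. FetchDownloadData is a plain stand-in class, not a real Beam Transform, and stage_name is a hypothetical helper; whether Beam exposes a more suitable attribute would still need to be checked.

```python
class FetchDownloadData:
    """Stand-in for a Transform; hypothetical, not a real Beam class."""


def stage_name(transform: object) -> str:
    """Return the class name for a transform instance or a transform class."""
    cls = transform if isinstance(transform, type) else type(transform)
    return cls.__name__


name_from_instance = stage_name(FetchDownloadData())
name_from_class = stage_name(FetchDownloadData)
```

If something like this works for the pipeline's Transforms, the manifest could accept the Transform itself and compute the stage name internally.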