[Bug]: reported DSWx-S1 Production Times are too short
sjlewis-jpl opened this issue · comments
Checked for duplicates
Yes - I've already checked
Describe the bug
When I generated a Production Time report from a test of DSWx-S1 FWD processing, I noticed that the production times were all much shorter than the durations of the PGE jobs that created the products. Looking in more detail, the `Input Received Datetime` seems to be the value in error. For all 1488 products generated in that test, it was later than the production time in the filename (which would mean that the inputs for the PGE job were received after the output products were generated).
I attached some files to this ticket to help show what I am seeing.
- The file `production-time-detailed - 2024-03-13T000000 to 2024-03-13T235959.xlsx` is the detailed production time report. On the worksheet `Prod. Time vs Input Rec. Time` are 3 columns of red text showing the comparison of production time and Input Received Datetime.
- The file `production time bug report.xlsx` has three worksheets looking at a single output product. The first sheet has the entry from the detailed production time report, the second sheet has the row from querying GRQ for the output product, and the third sheet has the rows of input products used to generate the output.
  - For this example, the `Input Received Datetime` is listed as `2024-03-13T01:46:09.125352` (compared to the production time of `20240313T014508Z` in the filename).
  - In GRQ (for the output product) that same time can be seen in the field `_source.metadata.ProductReceivedTime`.
  - In GRQ (for the inputs), the latest download time for the input products is `2024-03-13T01:14:45.936043`.
  - I could not find `2024-03-13T01:46:09.125352` in any of the datetime fields in the DB for retrieved files.
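As a quick check of the numbers in this example, the three timestamps can be compared directly. This is only a sketch: the variable names are mine, and it assumes the report and GRQ timestamps (which carry no offset) are in UTC like the `Z`-suffixed time in the filename.

```python
from datetime import datetime

# Values copied from the example in this ticket.
# Assumption: all three are UTC, so naive datetimes can be compared directly.
production_time = datetime.strptime("20240313T014508Z", "%Y%m%dT%H%M%SZ")
input_received = datetime.fromisoformat("2024-03-13T01:46:09.125352")
latest_input_download = datetime.fromisoformat("2024-03-13T01:14:45.936043")

# How far the reported Input Received Datetime falls AFTER the production time.
reported_gap = input_received - production_time

print(reported_gap.total_seconds())             # 61.125352
print(latest_input_download < production_time)  # True
```

The reported value falls about 61 seconds after the product was produced, while the latest input download is earlier than the production time, as expected.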
It seems like whatever logic populates the `Input Received Datetime` is where the problem lies.
The values in `_source.metadata.ProductReceivedTime` are all within a few minutes of the output product's production time (the maximum difference is 2 minutes 39 seconds, the minimum is 3 seconds). It seems like this might be the time PCM receives the output product from the PGE? That sounds like a reasonable thing for that field to contain. However, there needs to be a field somewhere that records the download time of the last input product, and that field is what the Production Time reports should use.
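To make the proposal concrete, here is a rough sketch of the calculation I have in mind. The function names are hypothetical and this is not how PCM computes the report today. It assumes each input's download time is available as an ISO 8601 string and that all timestamps are in the same time zone.

```python
from datetime import datetime, timedelta
from typing import Iterable


def latest_input_time(download_times: Iterable[str]) -> datetime:
    """Return the latest download timestamp among a job's input products."""
    parsed = [datetime.fromisoformat(t) for t in download_times]
    if not parsed:
        raise ValueError("a PGE job must have at least one input product")
    return max(parsed)


def production_duration(production_time: str,
                        download_times: Iterable[str]) -> timedelta:
    """Time from the last input's arrival to the product's production time.

    production_time uses the filename format, e.g. "20240313T014508Z".
    """
    produced = datetime.strptime(production_time, "%Y%m%dT%H%M%SZ")
    return produced - latest_input_time(download_times)
```

For the example above, passing the filename time `20240313T014508Z` and a latest input download of `2024-03-13T01:14:45.936043` gives a production time of roughly 30 minutes 22 seconds.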
Edit: Further investigation shows that this issue is endemic in all of our data products.
production-time-detailed - 2024-03-13T000000 to 2024-03-13T235959.xlsx
production time bug report.xlsx
What did you expect?
I expected the production times to be longer than the PGE runtimes. I expected the `Input Received Datetime` to match the latest download time of the input products to the PGE job.
Reproducible steps
1. Generate some products within an SDS instantiation (any product will do, this issue impacts them all).
2. Once finished, generate the Production Time report via the Bach UI.
3. Compare the production times against the PGE runtimes. Compare the `Input Received Datetime` column against the production times in the product filename.
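The comparison in step 3 can be scripted once the report rows have been exported. A minimal sketch, assuming each row has been reduced to a dict with two keys of my own choosing (`production_time` and `input_received`); these are not the report's actual column names.

```python
from datetime import datetime


def suspect_rows(rows):
    """Yield rows whose input-received time is later than the production time.

    Each row is a dict with two hypothetical keys:
      "production_time": timestamp from the product filename, e.g. "20240313T014508Z"
      "input_received":  ISO 8601 string from the Input Received Datetime column
    Assumption: both values are in the same time zone.
    """
    for row in rows:
        produced = datetime.strptime(row["production_time"], "%Y%m%dT%H%M%SZ")
        received = datetime.fromisoformat(row["input_received"])
        if received > produced:
            yield row
```

Any row this yields has inputs that were supposedly received after the product was produced, which is the symptom described in this ticket.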
Environment
- Version: 3.0.0-rc3.0
- Venue: INT-FWD
Unfortunately, this issue has been with us since R1. I attached a production time report from OPS from January 1st of this year illustrating the issue.
production-time-detailed - 2024-01-01T000000 to 2024-01-02T000000.xlsx