nasa / opera-sds-pcm

Observational Products for End-Users from Remote Sensing Analysis (OPERA)

[Bug]: reported DSWx-S1 Production Times are too short

sjlewis-jpl opened this issue · comments

Checked for duplicates

Yes - I've already checked

Describe the bug

When I generated a Production Time report from a test of DSWx-S1 FWD processing, I noticed that the production times were all much shorter than the durations of the PGE jobs that created the products. On closer inspection, the Input Received Datetime appears to be the value in error. For all 1488 products generated in that test, the Input Received Datetime was later than the production time in the filename, which would mean that the inputs for the PGE job were received after the output products were generated.

I attached some files to this ticket to help show what I am seeing.

  • The file production-time-detailed - 2024-03-13T000000 to 2024-03-13T235959.xlsx is the detailed production time report. On the worksheet Prod. Time vs Input Rec. Time are 3 columns of red text showing the comparison of production time and Input Received Datetime.
  • The file production time bug report.xlsx has three worksheets looking at a single output product. The first sheet has the entry from the detailed production time report, the second sheet has the row from querying GRQ for the output product, and the third sheet has the rows of input products used to generate the output.
    • For this example, the Input Received Datetime is listed as 2024-03-13T01:46:09.125352 (compared to the production time of 20240313T014508Z in the filename).
    • In GRQ (for the output product) that same time can be seen in the field _source.metadata.ProductReceivedTime.
    • In GRQ (for the inputs), the latest download time for the input products is 2024-03-13T01:14:45.936043.
    • I could not find 2024-03-13T01:46:09.125352 in any of the datetime fields in the DB for retrieved files.

It seems like whatever logic populates the Input Received Datetime is where the problem lies.

The values in _source.metadata.ProductReceivedTime are all within a few minutes of the output product's production time (the maximum difference is 2 minutes 39 seconds, the minimum is 3 seconds). It seems like this might be the time PCM receives the output product from the PGE? That sounds like a reasonable thing for that field to contain. However, the Production Time reports should instead use the download time of the last input product, so there needs to be a field somewhere that records that timestamp.
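To illustrate the size of the discrepancy, here is a minimal Python sketch using only the timestamps quoted in this ticket. The function names are hypothetical and are not part of PCM. It assumes that the DB timestamps are in UTC and that production time is measured from the latest input download to the production time in the filename; I have not confirmed how the report actually defines it.

```python
from datetime import datetime, timezone


def parse_db_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp from the DB (assumed to be UTC)."""
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def parse_filename_time(value: str) -> datetime:
    """Parse a production time in the filename style, e.g. 20240313T014508Z."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def latest_input_received(download_times):
    """Return the latest download time among the input products."""
    return max(parse_db_time(t) for t in download_times)


# Only the latest input download time is quoted in this ticket.
input_download_times = ["2024-03-13T01:14:45.936043"]

produced = parse_filename_time("20240313T014508Z")
reported = parse_db_time("2024-03-13T01:46:09.125352")  # ProductReceivedTime
expected = latest_input_received(input_download_times)

print(produced - expected)  # 0:30:22.063957 (inputs arrived before production)
print(reported - produced)  # 0:01:01.125352 (reported value is after production)
```

Under those assumptions, the reported Input Received Datetime is about a minute after the product was produced, while the latest input download is about half an hour before it.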

Edit: Further investigation shows that this issue is endemic in all of our data products.

production-time-detailed - 2024-03-13T000000 to 2024-03-13T235959.xlsx
production time bug report.xlsx

What did you expect?

I expected the production times to be longer than the PGE runtimes. I expected the Input Received Datetime to match the latest download time of the input products to the PGE job.

Reproducible steps

1. Generate some products within an SDS instantiation (any product will do, this issue impacts them all).
2. Once finished, generate the Production Time report via the Bach UI.
3. Compare the production times against the PGE runtimes. Compare the `Input Received Datetime` column against the production times in the product filename.
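The comparison in step 3 can be sketched roughly as follows. This is an illustration only: the row keys (`label`, `production_time`, `input_received`) are made up for the example and do not match the actual report columns. The two rows reuse timestamps from this ticket: the value currently reported, and the latest input download time that I would expect to see instead.

```python
from datetime import datetime


def find_suspect_rows(rows):
    """Return rows whose input-received time is later than the production time."""
    suspect = []
    for row in rows:
        # Production time as it appears in the product filename.
        produced = datetime.strptime(row["production_time"], "%Y%m%dT%H%M%SZ")
        # Input Received Datetime as it appears in the report.
        received = datetime.fromisoformat(row["input_received"])
        if received > produced:
            suspect.append(row)
    return suspect


rows = [
    # Value currently shown in the report for the example product.
    {"label": "reported", "production_time": "20240313T014508Z",
     "input_received": "2024-03-13T01:46:09.125352"},
    # Latest input download time found in GRQ for the same product.
    {"label": "expected", "production_time": "20240313T014508Z",
     "input_received": "2024-03-13T01:14:45.936043"},
]

for row in find_suspect_rows(rows):
    print(row["label"], row["input_received"])
```

Only the `reported` row is flagged, since its Input Received Datetime falls after the production time.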

Environment

- Version:  3.0.0-rc3.0
- Venue:  INT-FWD

Unfortunately, this issue has been with us since R1. I attached a production time report from OPS from January 1st of this year illustrating the issue.

production-time-detailed - 2024-01-01T000000 to 2024-01-02T000000.xlsx