oceanmodeling / StormEvents

Python interfaces for observational data surrounding named storm events, born from @jreniel's ADCIRCpy

Home Page: https://stormevents.readthedocs.io

Format Requirement for PaHM and ADCIRC for `BEST` and `OFCL`

SorooshMani-NOAA opened this issue

See noaa-ocs-modeling/EnsemblePerturbation#99.

There seem to be contradictory requirements for track inputs for ADCIRC vs PaHM. The ones currently identified are:

  • The importance of the advisory type (e.g. BEST vs OFCL)
  • The importance of date tags vs forecast hours for different advisory types
    • BEST track has increasing date tags from ATCF but 0 for all forecast hours
      • ADCIRC requires the forecast hours to be increasing (like a normal advisory)
      • PaHM just looks at the date tags, so no changes required to forecast hours
    • OFCL and other non-BEST tracks have fixed date tags for each issued advisory (i.e. track start time), but have increasing forecast hours
      • ADCIRC is fine with this format since it looks at the forecast hours
      • PaHM requires modified date tags based on the forecast hour, since it only cares about the date tag.
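To make the two conventions above concrete, here is a minimal sketch of how the time a row represents can be derived from the date tag and forecast hour columns of an unmodified ATCF file. The function name `valid_time` and the sample tuples are illustrative assumptions, not part of stormevents, PaHM, or ADCIRC.

```python
from datetime import datetime, timedelta


def valid_time(date_tag: str, forecast_hour: int) -> datetime:
    """Time a row is representative of: date tag (YYYYMMDDHH) plus forecast hour."""
    return datetime.strptime(date_tag, "%Y%m%d%H") + timedelta(hours=forecast_hour)


# BEST: date tags advance from row to row, forecast hour stays 0.
best = [("2022092600", 0), ("2022092606", 0), ("2022092612", 0)]
# OFCL: one fixed date tag per advisory, forecast hours advance.
ofcl = [("2018090712", 0), ("2018090712", 12), ("2018090712", 24)]

best_times = [valid_time(tag, hour) for tag, hour in best]
ofcl_times = [valid_time(tag, hour) for tag, hour in ofcl]
```

In both cases the derived times increase, even though a reader that looks at only one of the two columns would see a constant value for one of the track types.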

These changes potentially require separate file extensions for PaHM-compliant vs ADCIRC-compliant input files (i.e. stormevents outputs), as discussed in noaa-ocs-modeling/EnsemblePerturbation#98 (comment).

@pvelissariou1 it seems there's a difference between how ADCIRC processes track files and how PaHM does. @WPringle and I would like to first confirm this before separating the logic for handling both in stormevents and ensembleperturbation.

My understanding based on our prior discussions is that in PaHM, the advisory type (e.g. BEST vs OFCL vs CARQ, etc.) and forecast hour (e.g. 0, 6, 12, 18, ...) don't matter, and all that the model looks at is the date tag column for figuring out time. Other than that, PaHM cares about the isotach speed column and the rest of the physical variables, such as velocity, direction, pressures, and radii.

Please also look at the issue description above and let me know if I'm missing anything.

@WPringle I just had a meeting with Takis and it seems I didn't have the full picture for PaHM. Later he will paste links to the locations in PaHM where the ATCF files are processed, as well as to the ADCIRC code that does the same. Let's then review those and decide how to move forward with the stormevents handling of the different cases.

@pvelissariou1 @WPringle
I'll try to summarize my concerns and understanding here, including #81 (i.e. explicit normalization):

  • PaHM and ADCIRC have the same requirements in terms of input track file format and needed columns.
  • In either ADCIRC or PaHM, the GAHM model does not check for OFCL vs BEST vs anything, it just reads the tracks and the track rows must be unique in time (3 isotachs per time)
    • In ADCIRC for GAHM the forecast hours must be increasing (?)
  • In either ADCIRC or PaHM for the Symmetric Holland (SymH) model, the input data is processed differently for BEST vs OFCL:
    • For BEST the tracks are filtered and only the first entry for each unique date string is kept.
    • For OFCL only hour-0 or the first available entry for every date (track start date) is taken, the rest are filtered

stormevents must do the following normalizations in any case:

  • Remove duplicate inputs (such as the ones for Ian best track file from NHC)
  • Calculate missing values for forecast tracks:
    • Background pressure [currently done implicitly]
    • Eye pressure [currently done implicitly]
    • Radius of maximum winds [currently done implicitly]
    • Forecast hour must be increasing for BEST track [currently done implicitly]
    • Speed [currently done implicitly]
    • Direction [currently done implicitly]
    • Radius of the last closed isobar [missing]
    • Radius for quadrants for GAHM [missing]
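As an illustration of the first normalization in the list above, the sketch below drops repeated track lines while preserving order. It assumes that a "duplicate" means a line whose comma-separated fields all match an earlier line once whitespace is stripped; the actual criterion intended in #81 may differ, and `drop_duplicate_rows` is a hypothetical helper, not a stormevents function.

```python
def drop_duplicate_rows(lines):
    """Keep the first occurrence of each track line, preserving input order."""
    seen = set()
    unique = []
    for line in lines:
        # Compare on stripped fields so padding differences do not matter.
        key = tuple(field.strip() for field in line.split(","))
        if key in seen:
            continue
        seen.add(key)
        unique.append(line)
    return unique
```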

In terms of our use cases, we have the following items, along with what it means for stormevents output:

  1. Simulate BEST track for hindcast
    • The duplicate removal (normalization) stated in #81 should be added
    • The forecast dates need to be increasing for it to work with GAHM (at least in ADCIRC) -- it seems that's not true see #84 (comment)
  2. Simulate forecast track for incoming storms (NHC collaboration)
    • OFCL track data from a single advisory needs to be used.
    • The OFCL track needs to be modified to add missing fields
    • The OFCL date and forecast hour need to be modified to feed it to the model. This means:
      • For a single advisory, the first entry's date stays the same, the rest will need to increment by the forecast hour.
      • The forecast hours need to be increasing (remain as they are).
      • This should work for both SymH and GAHM, because SymH takes the first entry for each date if hour 0 does not exist for that date, and GAHM only cares about forecast time after making sure dates are unique.
  3. Simulate past-forecast track for probabilistic analysis method implementation and testing
    • Requirements are similar to item 2.

Please note that:

  • Reading back the modified tracks into stormevents requires the user to tell stormevents that it's a modified track, either by file extension or by explicit argument.
  • Ideally we do not want to have two different outputs from stormevents for GAHM vs SymH models

I'd like to also note that @WPringle has told me that the processing of OFCL in ADCIRC GAHM only cares about the forecast hours, and not the date tags, and I'm not sure if PaHM is the same; so please let me know if my understanding is incorrect.

Please let me know if any part of this is incorrect or anything is missing so that I can update the comment! Also please let me know if any part of this is unclear. I want to make sure we're all on the same page before making any major changes!

For OFCL only hour-0 or the first available entry for every date (track start date) is taken, the rest are filtered
In both models OFCL filtering is not performed at this point

Radius of the last closed isobar [missing]
We should add the capability to estimate the radius; methods to do so exist, but we need to understand how they do it at NHC.

Radius for quadrants for GAHM [missing]
The radii of quadrant wind intensities need to be estimated when their values are missing (again need to check with NHC on how they estimate these values)

The forecast dates need to be increasing for it to work with GAHM (at least in ADCIRC)
PaHM checks the timestamps and re-orders them in ascending order if needed

OFCL track data from a single advisory needs to be used.
The OFCL time intervals are not constant, using the OFCI field that contains the interpolated values for every 6 hours may be a more useful approach.

Link to format documentation: ATCF Formats

@pvelissariou1 thank you for the information. I just have some follow up questions/points:

In both models OFCL filtering is not performed at this point

Isn't this line doing the filtering in PaHM for SymH (not GAHM)?
Lines 1024-1026

          IF (iCnt > 1) THEN
            IF ( (strOut%fcstInc(iCnt) /= 0) .AND. (strOut%fcstInc(iCnt) == strOut%fcstInc(iCnt - 1))) CYCLE
          END IF
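For readers less familiar with Fortran, the three lines above can be read as the following rule, transcribed into Python. This covers only the quoted condition; the surrounding PaHM loop is not shown in this thread, so the function name and the choice to return kept indices are assumptions made for illustration.

```python
def kept_row_indices(forecast_hours):
    """Indices of rows that survive the quoted condition.

    A row is skipped when its forecast increment is nonzero and equal to
    the increment of the row immediately before it in the input.
    """
    kept = []
    for i, hour in enumerate(forecast_hours):
        if i > 0 and hour != 0 and hour == forecast_hours[i - 1]:
            continue  # mirrors the Fortran CYCLE
        kept.append(i)
    return kept


# OFCL-like hours with repeated 48 and 72 entries (one per isotach).
ofcl_kept = kept_row_indices([0, 3, 12, 24, 36, 48, 48, 72, 72])
# BEST-like hours, all zero.
best_kept = kept_row_indices([0, 0, 0])
```

Under this reading, repeated nonzero forecast hours collapse to their first row, while rows with forecast hour 0 are never dropped by this condition alone.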

Also in response to

The OFCL time intervals are not constant, using the OFCI field ...

Right now for SymH only BEST and OFCL are accepted, right? So if we use OFCI we can only use it with GAHM, isn't it so?

And for

The forecast dates need to be increasing for it to work with GAHM (at least in ADCIRC)

I meant to say we cannot have all 0 forecast hours (even with different date tags) for BEST track in ADCIRC. Is it the same in PaHM? In other words the following doesn't work for ADCIRC GAHM I think:

AL, 09, 2022092600,   , BEST,   0, 168N,  809W,  50,  991, TS,  34, NEQ,   60,   60,    0,   30, 1007,  120,  30,  60,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   30,    0,    0,    0, genesis-num, 028, 
AL, 09, 2022092600,   , BEST,   0, 168N,  809W,  50,  991, TS,  50, NEQ,   30,    0,    0,    0, 1007,  120,  30,  60,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   30,    0,    0,    0, genesis-num, 028, 
AL, 09, 2022092606,   , BEST,   0, 177N,  817W,  65,  985, HU,  34, NEQ,   70,   70,   30,   70, 1007,  150,  15,  80,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   45,   30,   30,   30, genesis-num, 028, 
AL, 09, 2022092606,   , BEST,   0, 177N,  817W,  65,  985, HU,  50, NEQ,   30,   30,    0,   20, 1007,  150,  15,  80,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   45,   30,   30,   30, genesis-num, 028, 
AL, 09, 2022092606,   , BEST,   0, 177N,  817W,  65,  985, HU,  64, NEQ,   15,    0,    0,    0, 1007,  150,  15,  80,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   45,   30,   30,   30, genesis-num, 028, 
AL, 09, 2022092612,   , BEST,   0, 187N,  824W,  70,  981, HU,  34, NEQ,   90,   80,   40,   90, 1008,  150,  15,  85,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   60,   30,   30,   60, genesis-num, 028, 
AL, 09, 2022092612,   , BEST,   0, 187N,  824W,  70,  981, HU,  50, NEQ,   40,   40,    0,   30, 1008,  150,  15,  85,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   60,   30,   30,   60, genesis-num, 028, 
AL, 09, 2022092612,   , BEST,   0, 187N,  824W,  70,  981, HU,  64, NEQ,   20,    0,    0,    0, 1008,  150,  15,  85,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   60,   30,   30,   60, genesis-num, 028, 
AL, 09, 2022092618,   , BEST,   0, 197N,  830W,  80,  976, HU,  34, NEQ,  100,   90,   60,   90, 1008,  150,  20, 100,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,  120,   60,   30,   75, genesis-num, 028, 
AL, 09, 2022092618,   , BEST,   0, 197N,  830W,  80,  976, HU,  50, NEQ,   50,   50,   20,   30, 1008,  150,  20, 100,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,  120,   60,   30,   75, genesis-num, 028, 
AL, 09, 2022092618,   , BEST,   0, 197N,  830W,  80,  976, HU,  64, NEQ,   30,   25,    0,   20, 1008,  150,  20, 100,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,  120,   60,   30,   75, genesis-num, 028, 
AL, 09, 2022092700,   , BEST,   0, 208N,  833W,  85,  965, HU,  34, NEQ,  100,   90,   60,   90, 1006,  130,  20, 105,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,  150,  120,   60,   90, genesis-num, 028, 

instead it should be like:

AL, 09, 2022092600,   , BEST,   0, 168N,  809W,  50,  991, TS,  34, NEQ,   60,   60,    0,   30, 1007,  120,  30,  60,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   30,    0,    0,    0, genesis-num, 028, 
AL, 09, 2022092600,   , BEST,   0, 168N,  809W,  50,  991, TS,  50, NEQ,   30,    0,    0,    0, 1007,  120,  30,  60,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   30,    0,    0,    0, genesis-num, 028, 
AL, 09, 2022092606,   , BEST,   6, 177N,  817W,  65,  985, HU,  34, NEQ,   70,   70,   30,   70, 1007,  150,  15,  80,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   45,   30,   30,   30, genesis-num, 028, 
AL, 09, 2022092606,   , BEST,   6, 177N,  817W,  65,  985, HU,  50, NEQ,   30,   30,    0,   20, 1007,  150,  15,  80,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   45,   30,   30,   30, genesis-num, 028, 
AL, 09, 2022092606,   , BEST,   6, 177N,  817W,  65,  985, HU,  64, NEQ,   15,    0,    0,    0, 1007,  150,  15,  80,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   45,   30,   30,   30, genesis-num, 028, 
AL, 09, 2022092612,   , BEST,  12, 187N,  824W,  70,  981, HU,  34, NEQ,   90,   80,   40,   90, 1008,  150,  15,  85,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   60,   30,   30,   60, genesis-num, 028, 
AL, 09, 2022092612,   , BEST,  12, 187N,  824W,  70,  981, HU,  50, NEQ,   40,   40,    0,   30, 1008,  150,  15,  85,   0,   L,   0,    ,   0,   0,        IAN, D, 12, NEQ,   60,   30,   30,   60, genesis-num, 028, 
AL, 09, 2022092612,   , BEST,  12, 187N,  824W,  70,  981, HU,  64, NEQ,   20,    0,    0,    0, 1008,  150,  15,  85,   0,   L,   0,    ,   0,   0,       
...

@WPringle please see above. It seems that we don't need the forecast hours for GAHM on BEST track.

@SorooshMani-NOAA
For everything except ADCIRC GAHM (original NHC format):

  • Actual time should be validation time + forecast time (and OFCL vs BEST is not important, but BEST forecast time is always 0).

For ADCIRC GAHM:

  • Even for BEST, update the forecast time based on the validation time difference from the first row, because that module does not read datetimes and assumes the start of the simulation is at the "0" forecast time.
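A minimal sketch of the ADCIRC GAHM adjustment described above: the forecast hour of each BEST row is replaced by the number of hours between its date tag and the date tag of the first row. The function name `hours_since_first_row` is hypothetical, and the sketch assumes date tags in the ATCF `YYYYMMDDHH` form that fall on whole hours.

```python
from datetime import datetime


def hours_since_first_row(date_tags):
    """Forecast hours for a BEST track, measured from the first row's date tag."""
    times = [datetime.strptime(tag, "%Y%m%d%H") for tag in date_tags]
    start = times[0]
    return [int((t - start).total_seconds() // 3600) for t in times]


# Date tags from the Ian example, one entry per isotach row.
tags = ["2022092600", "2022092600", "2022092606", "2022092606", "2022092612"]
forecast_hours = hours_since_first_row(tags)
```

Rows that share a date tag (different isotachs at the same time) receive the same forecast hour, which matches the hand-edited Ian rows shown earlier in the thread.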

@WPringle, @pvelissariou1 as we discussed in the meeting, we'd like to focus on these three columns: datetime tag, track type, and forecast hour, i.e. the 3rd, 5th and 6th (e.g. 20180901, OFCL, 24) from the NHC track file (ATCF). The main concern here is what the expected track input is for PaHM and ADCIRC for either of the GAHM or Holland models. There are also missing forecast fields that we'll address later.

Note that we're not talking about column spacing, value types, or column order; we already assume those are the same for all cases. The question is about the values within each column in relation to the other columns.

So far, this is my understanding of where things stand based on all that was discussed:

  • ADCIRC and PaHM can work with the same input track files
  • In either ADCIRC or PaHM OFCL vs BEST track types do not matter
  • Both ADCIRC's and PaHM's symmetric Holland implementations require datetime tags to be increasing (for every given isotach)
  • Both ADCIRC's and PaHM's GAHM implementations require forecast hours to be increasing (for every given isotach)

This means that the output expected from stormevents should differ from the original NHC file, which means (with focus on the 3 aforementioned columns):

  • stormevents output file should always have increasing datetag (to work with symmetric)
  • stormevents output file should always have increasing forecast hour (to work with GAHM)
  • stormevents output file should contain 1 track, at least this should be an option
  • stormevents output file has equal time intervals between track entries (interpolation) -- is this optional or necessary for PaHM to work @pvelissariou1?
  • The same stormevents output file should work for ADCIRC and PaHM, for either GAHM or symmetric
  • When stormevents reads track files, there should be a way to distinguish original vs modified track (e.g. by extension)
  • Based on all above, it shouldn't matter what the track type is, so it could be either OFCL or BEST without any change to any of the other columns (among the 3) as long as all other items above are true

These output requirements mean:

  • If input is BEST track, the output file is different in that its forecast hours are not 0 in the output
  • If input is OFCL track, the output file is different because the datetime tags are increasing, as opposed to constant in the original single track

@WPringle @pvelissariou1, please confirm that these statements are valid, then I'll go ahead and make the changes.

@SorooshMani-NOAA No, for OFCL we do not require the datetime tags to be increasing. They can stay as is. The only change is for best-track GAHM for ADCIRC, where forecast hours need to be present.

I think I misunderstood:

For everything except ADCIRC GAHM (original NHC format):

  • Actual time should be validation time + forecast time

What do you mean by "actual time" then?

Also does it matter if for non-GAHM model the forecast hours are not all 0s?

"actual time" means the time that the values are supposed to be representative of; sorry, perhaps by that I mean validation time.

The datetime tag represents the time that the forecast is made, or rather the analysis time.

In ADCIRC the symmetric model actually reads the datetimes and forecast hours correctly. It is only in the GAHM model that, for some reason, the datetimes are ignored and only the forecast hours are read, assuming that 0 is the start time of the simulation.

@WPringle I just remembered @pvelissariou1 said (if I remember correctly) that for OFCL, PaHM only picks up hour-0 of each advisory, hence the need to fake best track for the simulation. Based on what you're saying that's different in ADCIRC, right? So it's not all the same between PaHM and ADCIRC?

@pvelissariou1 @SorooshMani-NOAA It's true that the modified format would work for both Holland and GAHM. So for BEST we can just use that format as the fort.22; for OFCL I would advise against updating the datetime column.

@SorooshMani-NOAA @pvelissariou1 need to be careful to tag the correct Soroosh 🤣

So, let's just stick to the standard NHC format, which PaHM can handle with no problems (the .dat). We only require the preprocessing for ADCIRC GAHM best-track as described by Takis, which is either done internally by PaHM or can be done by stormevents to pass to ADCIRC when output as fort.22.

@pvelissariou1 I think GitHub has issues with in-email replies, so let's stick with on-website replies to avoid problems! In any case, let me think about it a bit. I'm more and more convinced that maybe we should remove all implicit conversions and just have methods/functions like normalize_for_adcirc or normalize_for_pahm that the user needs to call explicitly before writing to output.

Since that has implications for how ensembleperturbation works, I need to make sure any change is also reflected downstream.

@SorooshMani-NOAA I like that idea Soroosh, but what I got from Takis is that I don't think we need a normalize_for_pahm.

I would have the normalize_for_adcirc as an optional kwarg for the write function.

Just to be concrete, so these should work for both GAHM and Holland for both PaHM and ADCIRC without any issues:

  • Modified BEST
AL, 06, 2018083006,  0, BEST,   6, ...
AL, 06, 2018083012,  0, BEST,  12, ...
AL, 06, 2018083018,  0, BEST,  18, ...
AL, 06, 2018083100,  0, BEST,  24, ...
AL, 06, 2018083106,  0, BEST,  30, ...
AL, 06, 2018083112,  0, BEST,  36, ...
AL, 06, 2018083118,  0, BEST,  42, ...
AL, 06, 2018090100,  0, BEST,  48, ...
AL, 06, 2018090106,  0, BEST,  54, ...
AL, 06, 2018090112,  0, BEST,  60, ...
AL, 06, 2018090118,  0, BEST,  66, ...
AL, 06, 2018090200,  0, BEST,  72, ...
  • Unmodified OFCL
AL, 08, 2018090712, 03, OFCL,   0, ..., 34, ...
AL, 08, 2018090712, 03, OFCL,   3, ..., 34, ...
AL, 08, 2018090712, 03, OFCL,  12, ..., 34, ...
AL, 08, 2018090712, 03, OFCL,  24, ..., 34, ...
AL, 08, 2018090712, 03, OFCL,  36, ..., 34, ...
AL, 08, 2018090712, 03, OFCL,  48, ..., 34, ...
AL, 08, 2018090712, 03, OFCL,  48, ..., 50, ...
AL, 08, 2018090712, 03, OFCL,  72, ..., 34, ...
AL, 08, 2018090712, 03, OFCL,  72, ..., 50, ...
AL, 08, 2018090712, 03, OFCL,  96, ..., 34, ...
AL, 08, 2018090712, 03, OFCL, 120, ..., 34, ...
AL, 08, 2018090712, 03, OFCL, 144, ..., 34, ...
AL, 08, 2018090712, 03, OFCL, 168, ..., 34, ...

@SorooshMani-NOAA Looks good. I suppose we should actually test this all in PaHM and ADCIRC.

See noaa-ocs-modeling/PaHM#27 (comment) for formatting related issues for OFCL tracks.

Issue related to missing fields: noaa-ocs-modeling/PaHM#29

In SCHISM, the GAHM model has been modified slightly to use the radius of the last closed isobar (RRP) to reduce the amount of calculations in the domain (similar to the Holland model) by eliminating the nodal points outside RRP (I disagree with this approach but this is for future discussion with the SCHISM developers).

In our case, all the RRPs in the OFCL track files are set to zero, hence the problem with the OFCL files. My suggestion for a temporary workaround is to replace RRP by the max(radius1, radius2, radius3, radius4) of the 34-kt isotach.
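The temporary workaround suggested above could look roughly like the sketch below. The row layout (dicts with `time`, `isotach`, `radii`, and `rrp` keys) and the function name `fill_missing_rrp` are hypothetical and chosen only for illustration; they are not the stormevents or PaHM data model. The sketch also assumes the four quadrant radii and RRP are expressed in the same units.

```python
def fill_missing_rrp(rows):
    """Replace zero RRP values with the largest 34-kt quadrant radius at the same time."""
    # Largest 34-kt radius for each time that has a 34-kt row.
    max_34 = {}
    for row in rows:
        if row["isotach"] == 34:
            max_34[row["time"]] = max(row["radii"])

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if new_row["rrp"] == 0 and new_row["time"] in max_34:
            new_row["rrp"] = max_34[new_row["time"]]
        filled.append(new_row)
    return filled


rows = [
    {"time": 48, "isotach": 34, "radii": (100, 90, 60, 90), "rrp": 0},
    {"time": 48, "isotach": 50, "radii": (50, 50, 20, 30), "rrp": 0},
    {"time": 72, "isotach": 34, "radii": (120, 110, 70, 100), "rrp": 150},
]
filled_rows = fill_missing_rrp(rows)
```

Rows that already carry a nonzero RRP are left unchanged, so the same helper could be applied to BEST and OFCL input alike.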

Also see #86