ESGF / esg-search

ESGF Search Component

Home Page: http://esgf.org/esg-search/

CMIP6 "model cohort" search facet (and related obs4MIPs requirement)

taylor13 opened this issue

To make sure this doesn't get forgotten, here is an email exchange from 2/23/17

Hi Karl,
Yes, you are correct: the same techniques can be used to update the catalogs for different metadata fields.

To go into a little more detail:

o Solr offers an API that allows metadata to be updated in place. Clients can use that API directly, or can use the ESGF API on top of that.

o In obs4MIPs, we plan to keep the “master copy” of the indicators in files on GitHub (probably in JSON format), which can be easily updated by the persons in charge. From there, some form of process will take care of downloading the latest version of those files and publishing the up-to-date information to Solr (a sketch of such an update follows this list).
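
For reference, a minimal sketch (in Python) of the kind of in-place, or "atomic", Solr update described above; the index-node host and collection name below are placeholders, and the dataset id and field name are borrowed from the example JSON later in this thread:

import requests  # third-party HTTP client; any HTTP library would do

# Placeholder Solr endpoint on an ESGF index node (not a real host).
SOLR_UPDATE_URL = "http://esgf-index.example.org/solr/datasets/update?commit=true"

def set_field(dataset_id, field, values):
    """Replace one field on one Solr document via an atomic update."""
    payload = [{"id": dataset_id, field: {"set": values}}]
    resp = requests.post(SOLR_UPDATE_URL, json=payload)
    resp.raise_for_status()

# Example: set a quality-control field on one dataset record.
set_field("obs4MIPs.NASA-JPL.AIRS.ta.mon.v1|esgf-dev.jpl.nasa.gov",
          "quality_control_flags",
          ["obs4mips_indicators:1:green"])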

thanks, Luca

On Feb 23, 2017, at 9:57 AM, Karl Taylor taylor13@llnl.gov wrote:

Hi Luca and all (I've also copied Sasha, who will be interested in this),

I understand that you will be discussing tomorrow how the obs4MIPs "Dataset Suitability & Maturity Indicators" might be implemented in ESGF to make them visible to users. As I understand it, there will be two new elements that will need to be introduced to ESG:

  1. A method to easily update the ESGF "dataset catalog information" (not sure how to technically refer to this) so that as the status of different indicators evolves (and after dataset publication), the updated status will be reflected by the "traffic signal" display of the 6 indicators (on search pages).

  2. A method to display the indicators on search result pages (i.e., the "traffic signal")

I wanted to remind Luca (and Sasha and myself) that updating the catalog (after publication of a dataset) is also needed for CMIP6. We are defining a search facet named "Model Cohort", which will record whether or not a model has completed all the CMIP6 "DECK" experiments (among other things). The status of a model will change over time, and we will need to update the "Model Cohort" indicator. For this purpose, we can presumably rely on the same method we develop to record information about the maturity indices. Right?

best regards,
Karl

And some follow-up discussion from 2/24/17.

Hi Karl,
this is an example JSON that I proposed to Peter. The JSON can really be anything, as long as it contains the required information. And I agree that it does not have to be the exact same format as the CVs.

thanks, Luca

{
  "id:obs4MIPs.NASA-JPL.AIRS.ta.mon.v1|esgf-dev.jpl.nasa.gov": {
    "quality_control_flags": [
      "obs4mips_indicators:1:red",
      "obs4mips_indicators:2:orange",
      "obs4mips_indicators:3:yellow",
      "obs4mips_indicators:4:green",
      "obs4mips_indicators:5:gray",
      "obs4mips_indicators:6:dark_gray"
    ]
  },
  "id:obs4MIPs.NASA-JPL.AIRS.hus.mon.v1|esgf-dev.jpl.nasa.gov": {
    "quality_control_flags": [
      "obs4mips_indicators:1:green",
      "obs4mips_indicators:2:green",
      "obs4mips_indicators:3:green",
      "obs4mips_indicators:4:green",
      "obs4mips_indicators:5:yellow",
      "obs4mips_indicators:6:light_gray"
    ]
  }
}
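
As a hedged illustration of how a sync process might consume the JSON above, the sketch below turns each entry into a Solr atomic-update document; the helper name and the stripping of the leading "id:" prefix are assumptions for illustration, not part of the proposal:

import json

def to_solr_updates(indicator_json):
    """Turn the proposed obs4MIPs indicator JSON into Solr atomic-update docs."""
    updates = []
    for key, fields in indicator_json.items():
        # Keys look like 'id:<dataset_id>|<index_node>'; Solr wants the bare id.
        doc_id = key[len("id:"):] if key.startswith("id:") else key
        doc = {"id": doc_id}
        for field, values in fields.items():
            doc[field] = {"set": values}  # 'set' replaces the field in place
        updates.append(doc)
    return updates

# Assuming a local copy of the JSON above saved as obs4mips_indicators.json:
with open("obs4mips_indicators.json") as f:
    print(json.dumps(to_solr_updates(json.load(f)), indent=2))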

On Feb 24, 2017, at 10:46 AM, Ames, Sasha ames4@llnl.gov wrote:

Hi Karl,

I agree that for CMIP6, the cohort facet management should be consistent with the way we already handle the CVs. We can work with that to ensure that the values are updated.
If obs4MIPs needs a different sort of JSON structure, I think that's OK.

-Sasha

On 2/24/17, 9:31 AM, "Taylor, Karl E." taylor13@llnl.gov wrote:

Hi Luca and Sasha,

I listened in on the conversation this morning, but didn't hear anything
to cause any concern, so I didn't speak up.

I would like to see what Luca is proposing as far as the JSON file
structure is concerned.

For CMIP6 it would be most convenient to include the "cohort"
designation in the CMIP6_source_id.json file (see
https://github.com/WCRP-CMIP/CMIP6_CVs). That was my plan. Please let
me know if there would be a simpler way to do this.

thanks,
Karl

On 2/23/17 1:22 PM, Ames, Sasha wrote:

My suggestion:
It should be the responsibility of the party doing the original publication to update the metadata for their datasets. Then, there needs to be communication from that group to the replica centers that there has been a change. We can automate the changes with scripts. Because there are access control rules we want to keep in place, I think it's not feasible to make this fully automated.

-Sasha

On 2/23/17, 11:54 AM, "Taylor, Karl E." taylor13@llnl.gov wrote:

Thanks for the info.

For CMIP6, we will maintain and update reference CVs for various global
attributes (most of which are included in the output files, and some of
which also get recorded when the data are published). Of relevance to
the current thread, there is a CMIP6_source_id.json file that contains
the model "cohort"  description (see
https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/CMIP6_source_id.json
.  Currently for all but one model, the cohort is empty (i.e., ""), but
this will be changed soon to "Registered", indicating that the model has
successfully registered (but hasn't yet performed exeriments).  Once a
model has performed all the DECK experiments, the cohort will become
"DECK".

So, ideally, we'll be able to harvest the cohorts each model belongs to
from the .json file (which we'll update as needed), and associate that
information with each dataset.
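
A minimal sketch of that harvesting step, assuming the CMIP6_source_id.json layout of a top-level "source_id" map whose entries carry a "cohort" value (the raw-file URL and the exact field layout should be checked against the current CMIP6_CVs repository):

import json
import urllib.request

# Raw-file URL for the CV; path assumed from the repository linked above.
CV_URL = ("https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/"
          "master/CMIP6_source_id.json")

def harvest_cohorts():
    """Return a {source_id: cohort} map read from the CMIP6 CV file."""
    with urllib.request.urlopen(CV_URL) as resp:
        cv = json.load(resp)
    cohorts = {}
    for source_id, entry in cv.get("source_id", {}).items():
        # 'cohort' may be a string ("", "Registered", "DECK") or a list,
        # depending on the CV version; normalize to a single string here.
        cohort = entry.get("cohort", "")
        if isinstance(cohort, list):
            cohort = cohort[0] if cohort else ""
        cohorts[source_id] = cohort
    return cohorts

# Each dataset's "Model Cohort" facet could then be refreshed from this map,
# e.g. via the same atomic-update mechanism sketched earlier in the thread.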

One operational question: Since there will be many thousands of datasets
associated with each source_id, will this be a problem?

thanks again,
Karl