GFDRR / rdl-standard

The Risk Data Library Standard (RDLS) is an open data standard to make it easier to work with disaster and climate risk data. It provides a common description of the data used and produced in risk assessments, including hazard, exposure, vulnerability, and modelled loss, or impact, data.

Home Page: https://docs.riskdatalibrary.org/

[Schema] Exposure costs and metrics

duncandewhurst opened this issue

From GFDRR/rdls-spreadsheet-template#3 (comment):

  1. Why is the exposure_cost sheet populated? If I understood correctly, the dataset doesn't describe the cost of buildings, it only describes their area.

At the moment, the "cost" of exposure is limited to monetary currencies. But the value represented by an exposure dataset might be intangible, or just a proxy to later calculate the economic value; in my experience it is actually pretty uncommon to use an exposure dataset that already comes in economic terms. In this specific case, the value is built-up area over total pixel area. In other cases, it could be building height or volume, population density, or something else. Exposure could be represented by a range of different metrics, which can then be used to measure the cost.

I see two options:

1. Make the `cost` field optional and use it only when the value is actually a currency amount. Don't specify the exposure metric.

2. Add an exposure `metric` field based on an open codelist.

@matamadio do analysts need to know the exposure metric at the point of selecting a dataset?

> @matamadio do analysts need to know the exposure metric at the point of selecting a dataset?

To me it is one of the most important pieces of information to provide for exposure, similar to hazard imt. It doesn't need to be within the "cost" array; it can sit at the top level as exposure/metric.

This sounds as though we need a new object in addition to exposure.cost, e.g.

"metrics": {
          "title": "Asset metrics",
          "type": "array",
          "description": "The non-monetary exposure metrics associated with specific elements of assets detailed in the dataset. If a metric is measured exclusively in monetary values use `cost`.",
          "items": {
            "$ref": "#/$defs/Metric"
          },
          "minItems": 1,
          "uniqueItems": true
        }

where Metric is

"Metric": {
      "title": "Asset metric",
      "type": "object",
      "description": "The metric associated with specific elements of assets detailed in the dataset.",
      "required": [
        "id",
        "type",
        "unit"
      ],
      "properties": {
        "id": {
          "title": "Identifier",
          "type": "string",
          "description": "A locally unique identifier for this metric.",
          "minLength": 1
        },
        "type": {
          "title": "Metric type",
          "description": "The type of the metric, from the closed [cost type codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#cost_type).",
          "type": "string",
          "codelist": "cost_type.csv",
          "openCodelist": false,
          "enum": [
            "structure",
            "content",
            "product",
            "disruption"
          ]
        },
        "unit": {
          "title": "Metric unit",
          "type": "string",
          "description": "The unit in which the metric is specified, from the open [impact_unit codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#impact_unit).",
          "codelist": "impact_unit.csv",
          "openCodelist": true
        }
      },
      "minProperties": 1
    }

Is this object likely to be needed anywhere else? If not, it doesn't need to be in $defs and can go straight into exposure.

I think, though, that if we go with this we'll need to revise some of the codelist names: rename 'cost_type.csv' to 'asset_type.csv' and 'impact_unit.csv' to 'metric_unit.csv'. My logic for the second of these is that impact is a specific type of metric, but I'm happy to have alternative names suggested. Alternatively, @matamadio @stufraser1, is 'impact_unit.csv' not appropriate for this exposure metric, and do we need an entirely new codelist for this field?

Thanks for the proposal; I made a counterproposal splitting metric into 2 arrays:

Exposure

  • category
  • taxonomy
  • metric
    • monetary (cost)
      • type (as is)
      • unit (as is) - separate from vulnerability/cost
    • non-monetary
      • type (new codelist)
      • unit (new codelist)

If this makes sense:

  • rename cost as monetary (also codelist monetary_type.csv and monetary_unit.csv)
    • full list of currencies as unit is ok, but realistically we would need something like USD (year), PPP (year), and similar comparable units
  • add new array non-monetary and associated type and unit open codelists (nonmonetary_type.csv and nonmonetary_unit.csv)
    • nonmonetary_type.csv same as monetary_type.csv with the inclusion of "population"
    • nonmonetary_unit.csv as open codelist, existing values:
      • Area (extent)
      • Count
      • Density
      • Time (period)
      • ...more to add
    • when cost type = disruption, the user might need to quantify it in terms of production time rather than monetary value

Thanks, both. I'll have a think about modelling options.

> full list of currencies as unit is ok, but realistically we would need something like USD (year), PPP (year), and similar comparable units

  1. Does PPP stand for purchasing power parities in this context? If so, PPPs seem more like conversion rates than units. Can you share a link to a dataset in which the exposure metric is expressed in purchasing power parities?
  2. Are you suggesting that we add a field for the value date of the monetary amounts in a dataset?

> • nonmonetary_unit.csv as open codelist, existing values:
>   • Area (extent)
>   • Count
>   • Density
>   • Time (period)
>   • ...more to add

As discussed in #75 (comment), these are quantity kinds rather than units. Units would be things like square metres (for area quantities) or hours (for time quantities). I agree that it is more useful to model quantity kinds than specific units, since it should be possible to convert between units within a quantity kind (e.g. hours to minutes), but not between units of different quantity kinds (e.g. square metres to hours). I would name this field accordingly (quantityKind) and base it on a subset of the QUDT quantity kinds vocabulary, which already has codes for Area, Count, Density, Time and Currency.
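To illustrate the distinction between quantity kinds and units, here is a minimal Python sketch. The unit codes, the conversion factors and the `convert` helper are hypothetical and are not taken from RDLS or QUDT; the point is only that conversion works within a quantity kind and fails across kinds.

```python
# Hypothetical unit table: unit code -> (quantity kind, factor to the kind's base unit).
# The codes and base units (square metres, seconds) are illustrative assumptions.
UNITS = {
    "m2": ("area", 1.0),
    "km2": ("area", 1_000_000.0),
    "h": ("time", 3600.0),
    "min": ("time", 60.0),
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between two units that share a quantity kind."""
    from_kind, from_factor = UNITS[from_unit]
    to_kind, to_factor = UNITS[to_unit]
    if from_kind != to_kind:
        # Units of different quantity kinds are not comparable.
        raise ValueError(f"cannot convert {from_kind} to {to_kind}")
    return value * from_factor / to_factor
```

Calling `convert(2, "h", "min")` stays within the time kind, whereas `convert(1, "m2", "h")` raises a `ValueError`. That is why recording the quantity kind in metadata can be enough for dataset selection even when the exact unit is left to the data itself.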

Can you share a link to a dataset in which the exposure metric is expressed as a quantity of density? I'm assuming you don't mean the QUDT definition of density, which is mass per unit volume so it would be good to work out what the correct quantity kind is.

> At the moment, the "cost" of exposure is limited to monetary currencies. But the value represented by an exposure dataset might be intangible, or just a proxy to later calculate the economic value

Agree - number of buildings / number of people / km of roads (e.g. per grid cell) are commonly used as well as total value (replacement cost / insured value) per grid cell or per building.

> in my experience it is actually pretty uncommon to use an exposure dataset that already comes in economic terms.

It is common in national level datasets and some global datasets, but maybe not in the ones used in examples so far. See Central Asia datasets, Africa R5, GEM's global exposure model, as just a few examples. It is also the case as stated that the value might be area or length or count.

> To me it is one of the most important pieces of information to provide for exposure

Agree -- cost type or (monetary/non-monetary) value of the exposure needs to be readily visible in metadata.

> Can you share a link to a dataset in which the exposure metric is expressed as a quantity of density? I'm assuming you don't mean the QUDT definition of density, which is mass per unit volume

This refers more to population density, i.e. the number of people in a given geographic area. 'Count' would cover this: the number of buildings or people, given in the data as a count per raster grid cell. I haven't yet seen an exposure dataset with the value given as 'no. buildings per km2'.

I think the suggestion from @matamadio works to make it clearer that we can include monetary and non-monetary values and the latter should include Area and Count. I don't think we need Time/Duration here as a metric. In my experience exposure isn't ever given a time value. We might estimate the disruption time as a loss, or (for insurance datasets only) identify an insured value for business interruption for a building, but we wouldn't record a unit of time in the exposure dataset - I can't think of an example where a road or building would be attributed a time value - it wouldn't mean anything practically.

I would request that the data structure allows one or more of count, area AND cost to be included in the same dataset - I can point to examples where the cost is derived from one or both of area and count, and all pieces of data are included in the final data.
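As a rough illustration of why count, area and cost may need to sit side by side, the sketch below derives a cost from an area for a single grid cell and keeps all three values. The field names, the numbers and the unit cost are made up for the example and do not come from any real dataset.

```python
# Hypothetical unit replacement cost; an assumption for illustration only.
UNIT_COST_USD_PER_M2 = 500.0

# Hypothetical grid-cell record holding a count and an area.
cell = {"building_count": 12, "built_up_area_m2": 1800.0}

# Derive the monetary value from the area and keep all three values together.
cell["replacement_cost_usd"] = cell["built_up_area_m2"] * UNIT_COST_USD_PER_M2
```

A dataset like this would need to declare three metrics (count, area and cost) at once, which is the case the data structure should allow for.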

> Does PPP stand for purchasing power parities in this context? If so, PPPs seem more like conversion rates than units. Can you share a link to a dataset in which the exposure metric is expressed in purchasing power parities?

Yes, sometimes costs are expressed as PPP of local currency into USD. Anyway, not strictly necessary.

> I would request that the data structure allows one or more of count, area AND cost to be included in the same dataset - I can point to examples where the cost is derived from one or both of area and count, and all pieces of data are included in the final data.

Agree on this solution.

Great, so combining all of this, we could remove exposure.cost and replace it with exposure.metrics, an object holding 2 arrays: one of monetary metrics and one of non-monetary metrics. This would allow multiple metrics to be included for a single dataset. We could keep using Cost for the monetary items (which is good, as we still use Cost in Loss as well) and add an additional $defs/Metric for the non-monetary metric items.

{
  "metrics": {
    "title": "Asset metrics",
    "type": "object",
    "description": "The metrics associated with specific elements of assets detailed in the dataset.",
    "properties": {
      "monetary": {
        "title": "Monetary asset metrics",
        "type": "array",
        "description": "The monetary exposure metrics associated with specific elements of assets detailed in the dataset.",
        "items": {
          "$ref": "#/$defs/Cost"
        },
        "minItems": 1,
        "uniqueItems": true
      },
      "non_monetary": {
        "title": "Non-monetary asset metrics",
        "type": "array",
        "description": "The non-monetary exposure metrics associated with specific elements of assets detailed in the dataset.",
        "items": {
          "$ref": "#/$defs/Metric"
        },
        "minItems": 1,
        "uniqueItems": true
      }
    },
    "minProperties": 1
  }
}

{
  "$defs": {
    "Metric": {
      "title": "Asset metric",
      "type": "object",
      "description": "The metric associated with specific elements of assets detailed in the dataset.",
      "required": [
        "id",
        "type",
        "quantity_kind"
      ],
      "properties": {
        "id": {
          "title": "Identifier",
          "type": "string",
          "description": "A locally unique identifier for this metric.",
          "minLength": 1
        },
        "type": {
          "title": "Metric type",
          "description": "The type of the asset, from the closed [cost type codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#cost_type).",
          "type": "string",
          "codelist": "cost_type.csv",
          "openCodelist": false,
          "enum": [
            "structure",
            "content",
            "product",
            "disruption"
          ]
        },
        "quantity_kind": {
          "title": "Quantity kind",
          "type": "string",
          "description": "The kind of quantity in which the metric is specified, from the open [quantity kind codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#quantity_kind).",
          "codelist": "quantity_kind.csv",
          "openCodelist": true
        }
      },
      "minProperties": 1
    }
  }
}

with the quantity kind codes taken as the most relevant selection in the QUDT quantity kinds vocabulary

| Code | Title |
| --- | --- |
| area | Area |
| count | Count |
| length | Length |

One issue with this is that, as it stands, there isn't a way of expressing that a metric relates to a population. Does it need to be added to the cost_type codelist? And would it make sense to rename this codelist to metric_type?
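The gap can be seen with a small check that mirrors the required fields and the closed `type` enum from the proposed Metric definition. This is a hand-written Python sketch rather than a JSON Schema validator, and `check_metric` is a hypothetical helper.

```python
# Closed cost_type codelist, copied from the enum in the proposed Metric definition.
COST_TYPES = {"structure", "content", "product", "disruption"}
REQUIRED = ("id", "type", "quantity_kind")


def check_metric(metric: dict) -> list[str]:
    """Return a list of problems; an empty list means the metric passes."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in metric]
    if "id" in metric and not (isinstance(metric["id"], str) and metric["id"]):
        errors.append("id must be a non-empty string")
    if "type" in metric and metric["type"] not in COST_TYPES:
        errors.append(f"type not in closed codelist: {metric['type']!r}")
    # quantity_kind uses an open codelist, so any string is accepted here.
    if "quantity_kind" in metric and not isinstance(metric["quantity_kind"], str):
        errors.append("quantity_kind must be a string")
    return errors
```

With this check, `{"id": "1", "type": "structure", "quantity_kind": "area"}` passes, while a metric with `"type": "population"` is rejected until the codelist is extended.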

Thanks Jen. Agree on the solution, including renaming as metric_type and including population.

It seems to me that there are more similarities than differences between monetary and non-monetary metrics so I would lean towards having a single metrics array.

What are the advantages of separating monetary and non-monetary metrics in the data model instead of having a single metrics array and using the quantity_kind field (with 'Currency' as an option) as a discriminator?

From a general usability point of view, there are some advantages to having a single metrics array: fewer sheets in the spreadsheet representation and I would've thought it would be easier for users to see all of the metrics in a dataset in a single list/table/sheet than to have them split into separate lists.

However, happy to hear if there is a risk-specific reason for separating them!

This does seem easier to use and makes it easier to communicate the range of metrics, and I don't think there is a need to have them in two lists/arrays.

OK with a single array, with grouping based on quantity_kind.
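A minimal sketch of how a single metrics array could be consumed, assuming `currency` is the quantity_kind code used for monetary metrics (the exact code was not settled in this thread). The sample metrics and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical metrics held in one array; ids and codes are illustrative.
metrics = [
    {"id": "1", "type": "structure", "quantity_kind": "area"},
    {"id": "2", "type": "structure", "quantity_kind": "count"},
    {"id": "3", "type": "structure", "quantity_kind": "currency"},
]


def group_by_quantity_kind(items):
    """Group metric ids by their quantity_kind code."""
    groups = defaultdict(list)
    for item in items:
        groups[item["quantity_kind"]].append(item["id"])
    return dict(groups)


def is_monetary(metric):
    """Use quantity_kind as the discriminator (assumed code: 'currency')."""
    return metric["quantity_kind"] == "currency"
```

The monetary/non-monetary split is then a filter over one list rather than a structural feature of the schema, so a spreadsheet representation would only need a single metrics sheet.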