[Schema] Exposure costs and metrics

Question

duncandewhurst opened this issue a year ago

From GFDRR/rdls-spreadsheet-template#3 (comment):

Why is the exposure_cost sheet populated? If I understood correctly, the dataset doesn't describe the cost of buildings, it only describes their area.

At the moment, the "cost" of exposure is limited to monetary currencies. But the value represented by an exposure dataset might be intangible, or just a proxy to later calculate the economic value; in my experience it is actually pretty uncommon to use an exposure dataset that is already expressed in economic terms. In this specific case it is a value of built-up area over total pixel area. In other cases, the value could be building height, volume, population density or others. Exposure could be represented by a range of different metrics, each of which can later be used to measure the cost.

I see two options:
1. Put `cost` field as optional, use it only if actually a currency value. Don't specify exposure metric.

2. Add exposure `metric` field as open codelist

@matamadio do analysts need to know the exposure metric at the point of selecting a dataset?

Mattia Amadio · Answer 1 · Thu Aug 17 2023 17:46:57 GMT+0800 (China Standard Time)

@matamadio do analysts need to know the exposure metric at the point of selecting a dataset?

To me it is one of the key pieces of information to provide for exposure, similar to hazard imt. It doesn't need to be within the "cost" array; it can be at top level as exposure/metric.

Jen Harris · Answer 2 · Mon Aug 21 2023 18:12:01 GMT+0800 (China Standard Time)

This sounds as though we need a new object in addition to exposure.cost, e.g.

"metrics": {
  "title": "Asset metrics",
  "type": "array",
  "description": "The non-monetary exposure metrics associated with specific elements of assets detailed in the dataset. If a metric is measured exclusively in monetary values use `cost`.",
  "items": {
    "$ref": "#/$defs/Metric"
  },
  "minItems": 1,
  "uniqueItems": true
}

where Metric is

"Metric": {
      "title": "Asset metric",
      "type": "object",
      "description": "The metric associated with specific elements of assets detailed in the dataset.",
      "required": [
        "id",
        "type",
        "unit"
      ],
      "properties": {
        "id": {
          "title": "Identifier",
          "type": "string",
          "description": "A locally unique identifier for this metric.",
          "minLength": 1
        },
        "type": {
          "title": "Metric type",
          "description": "The type of the metric, from the closed [cost type codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#cost_type).",
          "type": "string",
          "codelist": "cost_type.csv",
          "openCodelist": false,
          "enum": [
            "structure",
            "content",
            "product",
            "disruption"
          ]
        },
        "unit": {
          "title": "Metric unit",
          "type": "string",
          "description": "The unit in which the metric is specified, from the open [impact_unit codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#impact_unit).",
          "codelist": "impact_unit.csv",
          "openCodelist": true
        }
      },
      "minProperties": 1
    }

Is this object likely to be needed anywhere else? If not, it doesn't need to be in $defs and can go straight into exposure.

I think, though, that if we go with this we'll need to revise some of the codelist names: rename 'cost_type.csv' to 'asset_type.csv' and 'impact_unit.csv' to 'metric_unit.csv'. My logic for the second of these is that impact is a specific type of metric, but I'm happy to have alternative names suggested. Alternatively, @matamadio @stufraser1, is 'impact_unit.csv' not appropriate for this exposure metric, and do we need an entirely new codelist for this field?

Mattia Amadio · Answer 3 · Mon Aug 21 2023 23:23:50 GMT+0800 (China Standard Time)

Thanks for the proposal; I made a counterproposal splitting metric into 2 arrays:

Exposure

category
taxonomy
metric
- monetary (cost)
  - type (as is)
  - unit (as is) - separate from vulnerability/cost
- non-monetary
  - type (new codelist)
  - unit (new codelist)

If this makes sense:

rename cost as monetary (also codelist monetary_type.csv and monetary_unit.csv)
- full list of currencies as unit is ok, but realistically we would need something like USD (year), PPP (year), and similar comparable units
add new array non-monetary and associated type and unit open codelists (nonmonetary_type.csv and nonmonetary_unit.csv)
- nonmonetary_type.csv same as monetary_type.csv with the inclusion of "population"
- nonmonetary_unit.csv as open codelist, existing values:
  - Area (extent)
  - Count
  - Density
  - Time (period)
  - ...more to add
- when cost type = disruption, the user might need to quantify it in terms of production time rather than monetary value

Duncan Dewhurst · Answer 4 · Tue Aug 22 2023 08:46:46 GMT+0800 (China Standard Time)

Thanks, both. I'll have a think about modelling options.

Duncan Dewhurst · Answer 5 · Tue Aug 22 2023 11:12:03 GMT+0800 (China Standard Time)

full list of currencies as unit is ok, but realistically we would need something like USD (year), PPP (year), and similar comparable units

Does PPP stand for purchasing power parities in this context? If so, PPPs seem more like conversion rates than units. Can you share a link to a dataset in which the exposure metric is expressed in purchasing power parities?
Are you suggesting that we add a field for the value date of the monetary amounts in a dataset?

nonmonetary_unit.csv as open codelist, existing values:
- Area (extent)
- Count
- Density
- Time (period)
- ...more to add

As discussed in #75 (comment), these are quantity kinds rather than units. Units would be things like square metres (for area quantities) or hours (for time quantities). I agree that it is more useful to model quantity kinds than specific units, since it should be possible to convert between units within a quantity kind (e.g. hours to minutes), but not between units of different quantity kinds (e.g. square metres to hours). I would name this field accordingly (quantityKind) and base it on a subset of the QUDT quantity kinds vocabulary, which already has codes for Area, Count, Density, Time and Currency.
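The distinction between quantity kinds and units can be sketched in a few lines of Python. This is only an illustration: the `UNITS` table, its unit names and the `convert` helper are hypothetical and are not part of the schema or of QUDT. It shows that conversion works between units of the same quantity kind and is rejected across kinds.

```python
# Hypothetical lookup table (not from the schema or QUDT): each unit belongs
# to exactly one quantity kind and has a factor to that kind's base unit.
UNITS = {
    "square_metre": ("area", 1.0),
    "square_kilometre": ("area", 1_000_000.0),
    "minute": ("time", 60.0),
    "hour": ("time", 3600.0),
}


def convert(value, from_unit, to_unit):
    """Convert a value between two units of the same quantity kind."""
    from_kind, from_factor = UNITS[from_unit]
    to_kind, to_factor = UNITS[to_unit]
    if from_kind != to_kind:
        # Units of different quantity kinds are not comparable.
        raise ValueError(
            f"cannot convert {from_kind} ({from_unit}) to {to_kind} ({to_unit})"
        )
    return value * from_factor / to_factor
```

For example, `convert(2, "hour", "minute")` succeeds, while `convert(1, "square_metre", "hour")` raises `ValueError`. Recording only the quantity kind in metadata keeps this kind of check possible without fixing a specific unit.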

Can you share a link to a dataset in which the exposure metric is expressed as a quantity of density? I'm assuming you don't mean the QUDT definition of density, which is mass per unit volume so it would be good to work out what the correct quantity kind is.

Stuart Fraser · Answer 6 · Tue Aug 22 2023 20:56:50 GMT+0800 (China Standard Time)

At the moment, the "cost" of exposure is limited to monetary currencies. But the value represented by an exposure dataset might be intangible, or just a proxy to later calculate the economic value

Agree - number of buildings / number of people / km of roads (e.g. per grid cell) are commonly used as well as total value (replacement cost / insured value) per grid cell or per building.

in my experience it is actually pretty uncommon to use an exposure dataset that is already expressed in economic terms.

It is common in national level datasets and some global datasets, but maybe not in the ones used in examples so far. See Central Asia datasets, Africa R5, GEM's global exposure model, as just a few examples. It is also the case as stated that the value might be area or length or count.

To me it is one of the key pieces of information to provide for exposure

Agree -- cost type or (monetary/non-monetary) value of the exposure needs to be readily visible in metadata.

Can you share a link to a dataset in which the exposure metric is expressed as a quantity of density? I'm assuming you don't mean the QUDT definition of density, which is mass per unit volume

This refers more to population density, i.e. relating a number to a given geographic area. 'Count' would cover this: number of buildings or population, which would be given in the data as a count per raster grid cell. I haven't yet seen an exposure dataset with the value given as 'no. buildings per km2'.

I think the suggestion from @matamadio works to make it clearer that we can include monetary and non-monetary values and the latter should include Area and Count. I don't think we need Time/Duration here as a metric. In my experience exposure isn't ever given a time value. We might estimate the disruption time as a loss, or (for insurance datasets only) identify an insured value for business interruption for a building, but we wouldn't record a unit of time in the exposure dataset - I can't think of an example where a road or building would be attributed a time value - it wouldn't mean anything practically.

I would request that the data structure allows one or more of count, area AND cost to be included in the same dataset - I can point to examples where the cost is derived from one or both of area and count, and all pieces of data are included in the final data.

Mattia Amadio · Answer 7 · Tue Aug 22 2023 23:48:16 GMT+0800 (China Standard Time)

Does PPP stand for purchasing power parities in this context? If so, PPPs seem more like conversion rates than units. Can you share a link to a dataset in which the exposure metric is expressed in purchasing power parities?

Yes, sometimes costs are expressed as PPP of local currency into USD. Anyway, not strictly necessary.

I would request that the data structure allows one or more of count, area AND cost to be included in the same dataset - I can point to examples where the cost is derived from one or both of area and count, and all pieces of data are included in the final data.

Agree on this solution.

Jen Harris · Answer 8 · Tue Aug 22 2023 23:50:05 GMT+0800 (China Standard Time)

Great, so combining all of this, we could remove exposure.cost and replace it with exposure.metrics, which would be an object holding 2 arrays: one of monetary metrics and one of non-monetary metrics. This would allow multiple metrics to be included for a single dataset. We could keep using Cost for the monetary items (which is good, as we still use Cost in Loss as well) and add an additional $defs/Metric for the non-monetary metric items.

{
  "metrics": {
    "title": "Asset metrics",
    "type": "object",
    "description": "The metrics associated with specific elements of assets detailed in the dataset.",
    "properties": {
      "monetary": {
        "title": "Monetary asset metrics",
        "type": "array",
        "description": "The monetary exposure metrics associated with specific elements of assets detailed in the dataset.",
        "items": {
          "$ref": "#/$defs/Cost"
        },
        "minItems": 1,
        "uniqueItems": true
      },
      "non_monetary": {
        "title": "Non-monetary asset metrics",
        "type": "array",
        "description": "The non-monetary exposure metrics associated with specific elements of assets detailed in the dataset.",
        "items": {
          "$ref": "#/$defs/Metric"
        },
        "minItems": 1,
        "uniqueItems": true
      }
    },
    "minProperties": 1
  }
}

{
  "$defs":{
    "Metric": {
      "title": "Asset metric",
      "type": "object",
      "description": "The metric associated with specific elements of assets detailed in the dataset.",
      "required": [
        "id",
        "type",
        "quantity_kind"
      ],
      "properties": {
        "id": {
          "title": "Identifier",
          "type": "string",
          "description": "A locally unique identifier for this metric.",
          "minLength": 1
        },
        "type": {
          "title": "Metric type",
          "description": "The type of the asset, from the closed [cost type codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#cost_type).",
          "type": "string",
          "codelist": "cost_type.csv",
          "openCodelist": false,
          "enum": [
            "structure",
            "content",
            "product",
            "disruption"
          ]
        },
        "quantity_kind": {
          "title": "Quantity kind",
          "type": "string",
          "description": "The kind of quantity in which the metric is specified, from the open [quantity kind codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#quantity_kind).",
          "codelist": "quantity_kind.csv",
          "openCodelist": true
        }
      },
      "minProperties": 1
    }
  }
}

with the quantity kind codes taken as the most relevant selection in the QUDT quantity kinds vocabulary

Code	Title
area	Area
count	Count
length	Length

One issue with this is that, as it stands, there isn't a way of expressing that a metric relates to a population. Does "population" need to be added to the cost_type codelist? And would it make sense to rename this codelist to metric_type?
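The population gap can be illustrated with a rough Python sketch. It assumes the proposed `Metric` definition as written above (required `id`, `type` and `quantity_kind`, with `type` restricted to the four existing cost_type codes). The `check_metric` helper and the sample metric objects are hypothetical, and the check is a hand-rolled approximation rather than a full JSON Schema validation.

```python
# Codes copied from the closed cost_type enum in the proposal above.
CLOSED_TYPE_CODES = {"structure", "content", "product", "disruption"}
REQUIRED_FIELDS = ("id", "type", "quantity_kind")


def check_metric(metric):
    """Return a list of problems found in one non-monetary metric object."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in metric:
            problems.append(f"missing required field: {name}")
    # quantity_kind uses an open codelist, so only `type` is checked here.
    if "type" in metric and metric["type"] not in CLOSED_TYPE_CODES:
        problems.append(f"type {metric['type']!r} is not in the closed codelist")
    return problems


# Hypothetical metrics: built-up area, and a population count.
built_up_area = {"id": "1", "type": "structure", "quantity_kind": "area"}
population = {"id": "2", "type": "population", "quantity_kind": "count"}
```

Under these assumptions `check_metric(built_up_area)` reports no problems, while `check_metric(population)` flags the `type` value, which is why the codelist would need a "population" code before such a metric could be described.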

Mattia Amadio · Answer 9 · Wed Aug 23 2023 16:55:57 GMT+0800 (China Standard Time)

Thanks Jen. Agree on the solution, including renaming as metric_type and including population.

Duncan Dewhurst · Answer 10 · Thu Aug 24 2023 05:34:07 GMT+0800 (China Standard Time)

It seems to me that there are more similarities than differences between monetary and non-monetary metrics so I would lean towards having a single metrics array.

What are the advantages of separating monetary and non-monetary metrics in the data model instead of having a single metrics array and using the quantity_kind field (with 'Currency' as an option) as a discriminator?

From a general usability point of view, there are some advantages to having a single metrics array: fewer sheets in the spreadsheet representation and I would've thought it would be easier for users to see all of the metrics in a dataset in a single list/table/sheet than to have them split into separate lists.

However, happy to hear if there is a risk-specific reason for separating them!

Stuart Fraser · Answer 11 · Thu Aug 24 2023 15:22:18 GMT+0800 (China Standard Time)

This does seem easier to use and makes it easier to communicate the range of metrics. I don't think there is a need to have them in two lists/arrays.

Mattia Amadio · Answer 12 · Thu Aug 24 2023 16:34:31 GMT+0800 (China Standard Time)

Ok for single array grouping based on quantity_kind
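As a rough sketch of where the thread ends up, the Python below models a single metrics array in which `quantity_kind` acts as the discriminator. The metric objects, the lowercase `currency` code and both helper functions are assumptions made for illustration; the thread does not fix the final field names or codes.

```python
from collections import defaultdict

# Hypothetical single `metrics` array for one dataset: area, count and cost
# are all described together, as requested earlier in the thread.
metrics = [
    {"id": "1", "type": "structure", "quantity_kind": "area"},
    {"id": "2", "type": "structure", "quantity_kind": "count"},
    {"id": "3", "type": "structure", "quantity_kind": "currency"},
]


def group_by_quantity_kind(items):
    """Group metric ids by their quantity kind."""
    groups = defaultdict(list)
    for item in items:
        groups[item["quantity_kind"]].append(item["id"])
    return dict(groups)


def is_monetary(metric):
    """Treat a metric as monetary when its quantity kind is the currency code."""
    # Assumption: the currency code is spelled "currency" in the codelist.
    return metric["quantity_kind"] == "currency"
```

With this shape, a consumer can still recover the monetary/non-monetary split, for example `[m for m in metrics if is_monetary(m)]`, without the schema needing two separate arrays.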