chvlyl / Comorbidity_Index

Python package for comorbidity index calculation

Comorbidity Index

Comorbidity refers to the presence of multiple disease diagnoses in one patient. In EHR data, each patient encounter usually has one primary diagnosis and several non-primary diagnoses. The coexistence of these diagnoses is called comorbidity. (Comorbidity vs. complication: a comorbidity is a pre-existing condition at admission that confounds the treatment of the primary condition. A complication is a condition that arises during the hospital stay and can be considered an adverse event. For example, for a certain procedure, the comorbidity may be diabetes and the complication may be an infection.) From a data analysis point of view, comorbidity is a very useful indicator for clinical outcomes such as mortality, so it is desirable to summarize it in a single score. There are several ways to calculate a comorbidity index, such as the Charlson Comorbidity Index, the Elixhauser Comorbidity Index and the NCI Comorbidity Index.

This Python package is designed to calculate those comorbidity indices.

Original Charlson Comorbidity Index

Charlson et al. proposed this comorbidity index in their 1987 paper, A new method of classifying prognostic comorbidity in longitudinal studies: development and validation, which has been cited more than 20,000 times since then. Based on data from 607 patients admitted to a medical service during a one-month period, they proposed a comorbidity index comprising 17 diseases with different weights. They showed that this index is a good indicator of one-year mortality. The following table shows the 17 diseases and their associated weights in the Charlson Comorbidity Index (table adapted from Comorbidity indices by Dougados).

Charlson Comorbidity Index

Note that in the original paper, disease diagnoses were defined by reviewing medical charts. In insurance claim data or EHR data, however, a diagnosis is defined by a diagnosis code. To calculate the Charlson Comorbidity Index from such data, we therefore need to define the diagnosis codes that correspond to each disease in the index.
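Once each disease has been identified for a patient, the score itself is just a weighted sum. The following is a minimal sketch of that step, not this package's API: the dictionary keys and the function name `charlson_score` are hypothetical, and the weights follow the commonly cited 17-condition form of the index, so they should be checked against the table above before use.

```python
# Illustrative weights for the 17-condition Charlson index.
# Keys are hypothetical labels, not identifiers from this package.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_complications": 1,
    "diabetes_with_complications": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids_hiv": 6,
}


def charlson_score(conditions, weights=CHARLSON_WEIGHTS):
    """Sum the weights of the distinct conditions recorded for one patient."""
    present = set(conditions)  # a condition counts once, however often it is coded
    unknown = present - set(weights)
    if unknown:
        raise ValueError(f"Unknown condition(s): {sorted(unknown)}")
    return sum(weights[c] for c in present)


patient = ["diabetes_with_complications", "renal_disease", "dementia"]
print(charlson_score(patient))  # 2 + 2 + 1 = 5
```

Raising an error on unknown labels is a design choice for this sketch: a silently ignored typo would lower the score without any warning.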

Charlson Comorbidity Index by Deyo et al. (1992) and Romano et al. (1993)

In 1992, Deyo et al. provided a way to calculate the Charlson Comorbidity Index using ICD-9-CM codes. The table below shows the mapping between the diseases defined in the original Charlson index and ICD-9-CM codes. The table was adapted from Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases.

Deyo Charlson Comorbidity Index

Similarly, Romano et al. (1993) developed their own version of the Charlson Comorbidity Index using ICD-9-CM codes in their paper Further evidence concerning the use of a clinical comorbidity index with ICD-9-CM administrative data.
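A code-based version of the index boils down to a lookup from diagnosis codes to disease categories. The sketch below illustrates the idea with prefix matching on ICD-9-CM codes. It is an assumption-laden example rather than the Deyo or Romano algorithm itself: only three categories are included, the prefixes shown are a small illustrative subset, and the names `ICD9_PREFIXES` and `categories_for` are hypothetical.

```python
# Illustrative subset of an ICD-9-CM to Charlson category mapping.
# A real mapping covers every category and should be taken from the published tables.
ICD9_PREFIXES = {
    "myocardial_infarction": ("410", "412"),
    "congestive_heart_failure": ("428",),
    "dementia": ("290",),
}


def normalize(code):
    """Drop the decimal point and surrounding whitespace, e.g. ' 410.71' -> '41071'."""
    return code.strip().replace(".", "").upper()


def categories_for(codes, prefix_map=ICD9_PREFIXES):
    """Return the set of disease categories matched by a patient's diagnosis codes."""
    found = set()
    for code in codes:
        clean = normalize(code)
        for category, prefixes in prefix_map.items():
            # str.startswith accepts a tuple of candidate prefixes
            if clean.startswith(prefixes):
                found.add(category)
    return found


codes = ["410.71", "428.0", "250.00"]
print(sorted(categories_for(codes)))
# ['congestive_heart_failure', 'myocardial_infarction']
```

The code "250.00" is left unmatched here only because this subset has no diabetes entry.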

Charlson Comorbidity Index by Quan et al.

In 2005, Quan et al. further extended the Charlson Comorbidity Index calculation using ICD-9-CM (called enhanced ICD-9-CM) and ICD-10 codes (Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data).

The original Charlson Comorbidity Index was developed more than 20 years ago. In 2010, Quan et al. reevaluated the Charlson index based on a dataset of 55,929 patients and reassigned the weight of each disease in the original index. Since five of them received an updated weight of zero, the updated Charlson index only includes 12 diseases. The authors also validated the updated Charlson index on external hospital discharge data from six countries. The new 12-comorbidity index shows improved C-statistics for predicting mortality compared to the original one. The following table shows the 12 diseases and their associated weights in the updated Charlson Comorbidity Index (table adapted from Updating and Validating the Charlson Comorbidity Index and Score for Risk Adjustment in Hospital Discharge Abstracts Using Data From 6 Countries).

Updated Charlson Comorbidity Index
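To show how the reweighting changes a score in practice, here is a small sketch of the updated index. The weights are given as I understand them from the Quan et al. table and should be verified against the table above. The dictionary keys and the function name `updated_charlson_score` are hypothetical and not part of this package.

```python
# Illustrative weights for the 12 conditions kept in the updated index.
UPDATED_WEIGHTS = {
    "congestive_heart_failure": 2,
    "dementia": 2,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "mild_liver_disease": 2,
    "diabetes_with_complications": 1,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 1,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 4,
    "metastatic_solid_tumor": 6,
    "aids_hiv": 4,
}

# The five original conditions whose updated weight is zero.
ZERO_WEIGHT = {
    "myocardial_infarction",
    "peripheral_vascular_disease",
    "cerebrovascular_disease",
    "peptic_ulcer_disease",
    "diabetes_without_complications",
}


def updated_charlson_score(conditions):
    """Weighted sum over distinct conditions; zero-weight conditions add nothing."""
    present = set(conditions)
    unknown = present - set(UPDATED_WEIGHTS) - ZERO_WEIGHT
    if unknown:
        raise ValueError(f"Unknown condition(s): {sorted(unknown)}")
    return sum(UPDATED_WEIGHTS.get(c, 0) for c in present)


patient = ["myocardial_infarction", "diabetes_without_complications", "renal_disease"]
print(updated_charlson_score(patient))  # 0 + 0 + 1 = 1
```

In this example only renal disease contributes, because the other two conditions are among the five that were reweighted to zero.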

Elixhauser Comorbidity Index

Elixhauser et al. extended the Charlson Comorbidity Index to include more diseases (30 in total) in their paper Comorbidity measures for use with administrative data. They showed that those comorbidities were associated with clinical outcomes such as length of stay, hospital charges, and mortality. The following table shows the 30 diseases or problems in the Elixhauser Comorbidity Index (table adapted from Comorbidity indices by Dougados).

Charlson Comorbidity Index

ICD code

To calculate a comorbidity index, we first need to know the diagnoses. In the original Charlson Comorbidity Index paper, the authors reviewed medical charts to define each patient's diagnoses. In EHR data, however, a diagnosis is usually represented by a diagnosis code (for example, an ICD-9 or ICD-10 code). ICD stands for International Classification of Diseases, which is commonly used for disease diagnosis classification. ICD-9 is the older version and ICD-10 is the newer one. To calculate a comorbidity index from ICD codes, we need to map each disease used in the index to its corresponding ICD codes.

We need to understand the different versions of ICD codes before we can use them to calculate a comorbidity index.

The main difference between ICD-9 and ICD-10 is that the latter has many more codes (ICD-9: around 13,000 codes; ICD-10: around 68,000 codes). Another difference is that ICD-9 categories are mostly numeric, whereas ICD-10 categories are alphanumeric.

The original ICD-9 and ICD-10 systems are published and maintained by the World Health Organization (WHO). However, the original versions of the ICD codes are not actually used in the United States. Instead, both ICD-9 and ICD-10 have a modified version, called ICD-9-CM and ICD-10-CM (CM stands for Clinical Modification). Those are the official coding systems used in the United States to assign codes to diagnoses and procedures in hospitals. The modified versions are maintained by the Centers for Disease Control and Prevention (CDC).

This online video explains the ICD-9, ICD-10 and their differences.

An ICD-9-CM diagnosis code has a format like "123.45", in which the first three digits are the disease category and the 4th and 5th digits are the subcategory and subclass. There are two other types of ICD-9-CM codes: E-codes ("E123.4") and V-codes ("V12.34"). E-codes in ICD-9-CM are for external causes of injury. V-codes record the reasons why a healthy person visits a hospital without any immediate injury or disease.
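The format described above can be checked with a regular expression. The following sketch splits an ICD-9-CM code into its parts and labels it as a numeric code, an E-code, or a V-code. The function name `parse_icd9cm` and the returned keys are hypothetical, and the pattern is deliberately lenient: it validates the shape of a code only, not whether the code actually exists.

```python
import re

# Category: three digits, or 'E' plus three digits, or 'V' plus two digits.
# Optional part after the dot: one or two digits (subcategory, then subclass).
ICD9_PATTERN = re.compile(
    r"^(?P<category>E\d{3}|V\d{2}|\d{3})(?:\.(?P<detail>\d{1,2}))?$"
)


def parse_icd9cm(code):
    """Split an ICD-9-CM code string into kind, category, subcategory and subclass."""
    match = ICD9_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognizable ICD-9-CM code: {code!r}")
    category = match.group("category")
    detail = match.group("detail") or ""
    if category.startswith("E"):
        kind = "E-code"
    elif category.startswith("V"):
        kind = "V-code"
    else:
        kind = "numeric"
    return {
        "kind": kind,
        "category": category,
        "subcategory": detail[:1],  # digit right after the dot, if any
        "subclass": detail[1:2],    # second digit after the dot, if any
    }


print(parse_icd9cm("123.45"))
# {'kind': 'numeric', 'category': '123', 'subcategory': '4', 'subclass': '5'}
print(parse_icd9cm("V12.34")["kind"])  # V-code
```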

An ICD-10-CM diagnosis code has a format like "A12.345B". Similarly, the first three characters are the category, the 4th is the subcategory, the 5th and 6th are the subclass, and the final character is the extension (type of encounter: initial encounter, subsequent encounter, or sequela of a previous condition).
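The same kind of split can be sketched for ICD-10-CM, following the positions described above. The function name `parse_icd10cm` and the returned keys are hypothetical, and the pattern only checks the general shape of a code (a letter, a digit, an alphanumeric character, then up to four more characters after the dot), not whether the code is valid.

```python
import re

# Category: letter + digit + alphanumeric. After the dot: up to four characters
# (subcategory, two subclass characters, extension).
ICD10_PATTERN = re.compile(
    r"^(?P<category>[A-Z]\d[0-9A-Z])(?:\.(?P<rest>[0-9A-Z]{1,4}))?$"
)


def parse_icd10cm(code):
    """Split an ICD-10-CM code string into category, subcategory, subclass, extension."""
    match = ICD10_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognizable ICD-10-CM code: {code!r}")
    rest = match.group("rest") or ""
    return {
        "category": match.group("category"),
        "subcategory": rest[:1],  # 4th character of the code
        "subclass": rest[1:3],    # 5th and 6th characters
        "extension": rest[3:4],   # 7th character
    }


print(parse_icd10cm("A12.345B"))
# {'category': 'A12', 'subcategory': '3', 'subclass': '45', 'extension': 'B'}
print(parse_icd10cm("E11.9"))
# {'category': 'E11', 'subcategory': '9', 'subclass': '', 'extension': ''}
```

Shorter codes simply leave the trailing fields empty, which keeps the return shape the same for every input.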

How to Calculate Comorbidity Index?

There is also a version from the Agency for Healthcare Research and Quality (AHRQ).

The SAS code can be found on MCHP website:


  1. Charlson ME et al. (1987): A new method of classifying prognostic comorbidity in longitudinal studies: development and validation
  2. Deyo RA et al. (1992): Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases
  3. Quan H et al. (2005): Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data