Calculation of pNN50 wrong?

skjerns opened this issue 4 years ago · comments

I'm calculating the pNN50, but it seems like there is a mistake:

import numpy as np
import hrvanalysis

# create fake NNs, all with >60ms diff
NNs = np.arange(1100, 2000, 60)

pNN50 = hrvanalysis.get_time_domain_features(NNs)['pnni_50']
# should be 100%, but is 93%

I also found the reason:

https://github.com/Aura-healthcare/hrvanalysis/blame/2aca66ee65e2bf4867a6badc17322197c196d70d/hrvanalysis/extract_features.py#L109

You take np.diff(RR) and then divide the number of differences greater than 50 ms by len(RR). However, np.diff(RR) has one element fewer than RR.
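
The length mismatch can be checked directly with NumPy. This is a minimal sketch that does not call hrvanalysis itself; it assumes the same synthetic intervals as above, and the names `nn_intervals`, `diffs`, and `nn50` are chosen only for illustration:

```python
import numpy as np

# Same synthetic NN intervals as in the example above (values in ms).
nn_intervals = np.arange(1100, 2000, 60)

# Absolute successive differences: always one element fewer than the input.
diffs = np.abs(np.diff(nn_intervals))
nn50 = int(np.sum(diffs > 50))

print(len(nn_intervals), len(diffs))    # number of intervals vs. number of differences
print(100 * nn50 / len(nn_intervals))   # denominator n
print(100 * nn50 / len(diffs))          # denominator n - 1
```

With 15 intervals there are only 14 differences, so dividing by n caps the result at 14/15 (about 93.3%), while dividing by n - 1 gives 100%.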

Edit: I wrote some things in a non-friendly way, please accept my apologies for doing so, I edited them out.

Robin Champseix · Answer 1 · Wed Jun 03 2020 21:24:35 GMT+0800 (China Standard Time)

Hi,

First, I would like to thank you for reporting this problem and using the library.

My understanding from the main research paper used to develop the features might indeed not be correct. But let's have a look at an extract:

It says clearly that the pNN50 is equal to "NN50 count divided by the total number of all NN intervals".

I totally understand your frustration as this is very basic Maths but I tried to strictly follow research papers. I will challenge this with some coworkers who helped me develop the module.

If you like the package, though, feel free to contribute and create a pull request :-)
Regards,
Robin

Robin Champseix · Answer 2 · Wed Jun 10 2020 15:26:48 GMT+0800 (China Standard Time)

Hi again @skjerns ,

After checking with some colleagues, we decided to keep the current implementation.
Two reasons for that:

It is not written anywhere that pnn50 could be equal to 1. In the paper, we can see visually that it is never equal to 1 (or 100% as in the screenshot below).

We found another quote saying that we should divide by the number of all NN intervals.

Until proven otherwise, the pnn50 will stay as is.

I remind you that this is an open source project and we promote kindness and amiability. Next time, I would watch your tone (cf: "slightly angry at this really simple mistake") and suggest you dig into some research papers first.

Regards,
Robin

Simon Kern · Answer 3 · Tue Jun 16 2020 17:31:45 GMT+0800 (China Standard Time)

First of all let me apologise for my improper use of language.

Secondly I want to give arguments for adapting the formula to range from 0 to 1.

Some more recent research papers might use the formula differently; however, they might be mistaken in translating the description of the measure into a formula. In the original research paper that introduced the pNN50 (Bigger et al. 1988), the parameter is described as:

"We computed the absolute value of each individual difference between adjacent N-N intervals and summarized the differences by the percentage of differences exceeding 50ms"

In my understanding this translates to using n-1 in the denominator, as the "percentage of differences" is taken.

On another note, the current implementation introduces a computational bias.
Assume two analyses, one with 15-second windows and one with 300-second windows, in which each heartbeat interval differs from the previous one by 60 ms (as in my initial example). Assuming roughly one beat per second, this would give a pNN50 of 93.3% (14/15) for the 15-second window and 99.7% (299/300) for the 300-second window, although the data stems from the same underlying generator. pNN50 was introduced precisely to mitigate the bias of absolute values and to provide a relative measure.
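
The window-length dependence can be reproduced with a short sketch. The helper `pnn50_both` and the alternating 1000/1060 ms series are assumptions made for illustration, with 15 and 300 intervals standing in for the two window lengths at roughly one beat per second:

```python
import numpy as np

def pnn50_both(nn_intervals):
    """Return pNN50 in percent with denominator n and with denominator n - 1."""
    diffs = np.abs(np.diff(nn_intervals))
    nn50 = int(np.sum(diffs > 50))
    n = len(nn_intervals)
    return 100 * nn50 / n, 100 * nn50 / (n - 1)

for n_intervals in (15, 300):
    # Alternate between 1000 ms and 1060 ms so every successive difference is 60 ms.
    nn = 1000 + 60 * (np.arange(n_intervals) % 2)
    by_n, by_n_minus_1 = pnn50_both(nn)
    print(f"{n_intervals:3d} intervals: n -> {by_n:.1f}%, n - 1 -> {by_n_minus_1:.1f}%")
```

With the denominator n the result moves from about 93.3% to about 99.7% as the window grows, while the denominator n - 1 gives 100% for both windows.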

It also makes more intuitive sense, as a percentage value should naturally range from 0 to 100% and not have an upper limit that is given by the number of intervals inside the analysis window.

Additionally, many other HRV analysis tools also use the formula with n-1 (e.g. pyHRV, Kubios, ...)

e.g.

import pyhrv
pyhrv.time_domain.nn50(NNs)
# pNN50 = 100

Laurent Ribiere · Answer 4 · Fri Jun 19 2020 02:12:47 GMT+0800 (China Standard Time)

Hi @skjerns, hi Robin,

I agree it's not perfectly clear whether the pNN50 denominator should be equal to the number of NN intervals (let's say n) or the number of pairs of adjacent intervals (n-1).

How about allowing both computations and letting the user decide? I'm thinking of a simple boolean parameter for the get_time_domain_features function that could give something like this:

from typing import List

def get_time_domain_features(nn_intervals: List[float], pnni_as_percent: bool = True) -> dict:
    # ...
    # Denominator: number of successive differences (n - 1) or number of intervals (n)
    length_int = len(nn_intervals) - 1 if pnni_as_percent else len(nn_intervals)
    # ...
    pnni_50 = 100 * nni_50 / length_int
    # ...
    pnni_20 = 100 * nni_20 / length_int
    # ...
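
A self-contained version of this idea might look like the sketch below. The function name `pnni_x`, the `threshold_ms` parameter, and the error handling are hypothetical and are not part of hrvanalysis; only the `pnni_as_percent` switch follows the proposal above.

```python
from typing import Sequence

import numpy as np

def pnni_x(nn_intervals: Sequence[float], threshold_ms: float = 50,
           pnni_as_percent: bool = True) -> float:
    """Percentage of successive NN differences larger than threshold_ms.

    Hypothetical helper: with pnni_as_percent=True the denominator is the
    number of differences (n - 1), otherwise the number of intervals (n).
    """
    if len(nn_intervals) < 2:
        raise ValueError("at least two NN intervals are required")
    diffs = np.abs(np.diff(nn_intervals))
    nni_x = int(np.sum(diffs > threshold_ms))
    length_int = len(nn_intervals) - 1 if pnni_as_percent else len(nn_intervals)
    return 100 * nni_x / length_int

# Synthetic intervals from the original report (values in ms).
nn = np.arange(1100, 2000, 60)
print(pnni_x(nn))                          # denominator n - 1
print(pnni_x(nn, pnni_as_percent=False))   # denominator n
```

Keeping the switch as an explicit keyword argument would let existing users reproduce earlier results while new users get a measure that spans the full 0 to 100% range.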