The issue about heart beat interval preprocessing for interpolation

Question

The issue about heart beat interval preprocessing for interpolation

minsooyeo opened this issue 3 years ago · comments

I am on studying heart rate variability with your great packages, I sincerely appreciate your efforts for this package

I have founded a problem on pre-processing in heart rate variability below toy codes

from hrvanalysis import preprocessing as pre

hrv_sample_1 = [700, 800, 10000, 10000, 650, 700, 750, 540]
hrv_sample_2 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700]
hrv_sample_3 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700, 10000, 10000]

hrv_sample_1 = pre.interpolate_nan_values(pre.remove_outliers(hrv_sample_1))
hrv_sample_2 = pre.interpolate_nan_values(pre.remove_outliers(hrv_sample_2))
hrv_sample_3 = pre.interpolate_nan_values(pre.remove_outliers(hrv_sample_3))

print("hrv sample 1: {}".format(hrv_sample_1))
print("hrv sample 2: {}".format(hrv_sample_2))
print("hrv sample 3: {}".format(hrv_sample_3))

>>> hrv sample 1: [700.0, 800.0, 750.0, 700.0, 650.0, 700.0, 750.0, 540.0]
>>> hrv sample 2: [nan, nan, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0]
>>> hrv sample 3: [nan, nan, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0, 700.0, 700.0]

you can see this code, 10000 value is abnormal heart beat interval.
a variable "hrv_sample_1" was prepossessed normally, however, hrv_sample_2 and hrv_sample_3 still include nan value if first data is abnormal value

to solve it, i proposes a simple method.

you can see code, variable hrv_sample_3 has abnormal value on end point, it was replaced to previous value.
Similar to this method, i add code in interpolate_nan_values function below

from typing import Tuple
from typing import List
import pandas as pd
import numpy as np

# Static name for methods params
MALIK_RULE = "malik"
KARLSSON_RULE = "karlsson"
KAMATH_RULE = "kamath"
ACAR_RULE = "acar"
CUSTOM_RULE = "custom"


def interpolate_nan_values(rr_intervals: list,
                           interpolation_method: str = "linear",
                           limit_area: str = None,
                           limit_direction: str = "forward",
                           limit=None, ) -> list:
    """
    Function that interpolate Nan values with linear interpolation
    Parameters
    ---------
    rr_intervals : list
        RrIntervals list.
    interpolation_method : str
        Method used to interpolate Nan values of series.
    limit_area: str
        If limit is specified, consecutive NaNs will be filled with this restriction.
    limit_direction: str
        If limit is specified, consecutive NaNs will be filled in this direction.
    limit: int
        TODO
    Returns
    ---------
    interpolated_rr_intervals : list
        new list with outliers replaced by interpolated values.
    """
    # search first nan data and fill it post value until it is not nan
    if np.isnan(rr_intervals[0]):
        start_idx = 0

        while np.isnan(rr_intervals[start_idx]):
            start_idx += 1

        rr_intervals[0:start_idx] = [rr_intervals[start_idx]] * start_idx
    else:
        pass
    # change rr_intervals to pd series
    series_rr_intervals_cleaned = pd.Series(rr_intervals)
    # Interpolate nan values and convert pandas object to list of values
    interpolated_rr_intervals = series_rr_intervals_cleaned.interpolate(method=interpolation_method,
                                                                        limit=limit,
                                                                        limit_area=limit_area,
                                                                        limit_direction=limit_direction)
    return interpolated_rr_intervals.values.tolist()



from hrvanalysis import preprocessing as pre

hrv_sample_1 = [700, 800, 10000, 10000, 650, 700, 750, 540]
hrv_sample_2 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700]
hrv_sample_3 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700, 10000, 10000]

hrv_sample_1 = interpolate_nan_values(pre.remove_outliers(hrv_sample_1))
hrv_sample_2 = interpolate_nan_values(pre.remove_outliers(hrv_sample_2))
hrv_sample_3 = interpolate_nan_values(pre.remove_outliers(hrv_sample_3))

print("hrv sample 1: {}".format(hrv_sample_1))
print("hrv sample 2: {}".format(hrv_sample_2))
print("hrv sample 3: {}".format(hrv_sample_3))


out:
>>> hrv sample 1: [700.0, 800.0, 750.0, 700.0, 650.0, 700.0, 750.0, 540.0]
>>> hrv sample 2: [800.0, 800.0, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0]
>>> hrv sample 3: [800.0, 800.0, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0, 700.0, 700.0]

it is change abnormal 2 start point to post value correctly,
I made pull-request

thank you.