Issue with heart beat interval preprocessing: leading NaN values are not interpolated
minsooyeo opened this issue
I am studying heart rate variability with your great package, and I sincerely appreciate your efforts on it.
I found a problem in the heart rate variability preprocessing, shown in the toy code below:
from hrvanalysis import preprocessing as pre
hrv_sample_1 = [700, 800, 10000, 10000, 650, 700, 750, 540]
hrv_sample_2 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700]
hrv_sample_3 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700, 10000, 10000]
hrv_sample_1 = pre.interpolate_nan_values(pre.remove_outliers(hrv_sample_1))
hrv_sample_2 = pre.interpolate_nan_values(pre.remove_outliers(hrv_sample_2))
hrv_sample_3 = pre.interpolate_nan_values(pre.remove_outliers(hrv_sample_3))
print("hrv sample 1: {}".format(hrv_sample_1))
print("hrv sample 2: {}".format(hrv_sample_2))
print("hrv sample 3: {}".format(hrv_sample_3))
>>> hrv sample 1: [700.0, 800.0, 750.0, 700.0, 650.0, 700.0, 750.0, 540.0]
>>> hrv sample 2: [nan, nan, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0]
>>> hrv sample 3: [nan, nan, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0, 700.0, 700.0]
In this code, the value 10000 is an abnormal heart beat interval, so remove_outliers replaces it with NaN.
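For context, the outlier step turns every interval outside an accepted range into NaN before interpolation. A minimal sketch of that step, assuming a range of 300 ms to 2000 ms (my understanding of the low_rri/high_rri defaults of remove_outliers; please check your installed version, and the exact boundary handling may differ). mark_outliers is a hypothetical helper, not part of hrvanalysis:

```python
import math
from typing import List


def mark_outliers(rr_intervals: List[float],
                  low_rri: float = 300,
                  high_rri: float = 2000) -> List[float]:
    """Hypothetical helper: replace RR intervals outside [low_rri, high_rri] ms with NaN.

    The 300/2000 ms bounds are an assumption modelled on hrvanalysis.remove_outliers.
    """
    return [float(rr) if low_rri <= rr <= high_rri else math.nan
            for rr in rr_intervals]


cleaned = mark_outliers([10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700])
print(cleaned)
# [nan, nan, 800.0, 700.0, 800.0, 900.0, nan, nan, 650.0, 700.0]
```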
The variable "hrv_sample_1" is preprocessed correctly. However, "hrv_sample_2" and "hrv_sample_3" still contain NaN values when the first data points are abnormal.
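The leading NaN values survive because interpolate_nan_values passes limit_direction="forward" (its default in the signature shown in this issue) to pandas.Series.interpolate. In that mode pandas leaves NaNs before the first valid value untouched, while NaNs after the last valid value are filled with that last value by linear interpolation. A small pandas-only sketch reproducing this on the cleaned hrv_sample_3; it also suggests that limit_direction="both" may fill the leading gap without changing the function:

```python
import numpy as np
import pandas as pd

# hrv_sample_3 after outlier removal: the 10000 ms values have become NaN
rr = pd.Series([np.nan, np.nan, 800, 700, 800, 900, np.nan, np.nan,
                650, 700, np.nan, np.nan])

# "forward" (the default used in this issue): leading NaNs are kept,
# trailing NaNs are filled with the last valid value (700.0)
forward = rr.interpolate(method="linear", limit_direction="forward")
print(forward.tolist())

# "both": leading NaNs are filled too; linear interpolation copies the
# nearest valid value (800.0) into the leading gap
both = rr.interpolate(method="linear", limit_direction="both")
print(both.tolist())
```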
To solve this, I propose a simple method.
As the output shows, the abnormal values at the end of "hrv_sample_3" were already replaced with the previous valid value.
Following the same idea, I added code to the interpolate_nan_values function that replaces leading NaN values with the next valid value:
from typing import Tuple
from typing import List
import pandas as pd
import numpy as np

# Static name for methods params (unchanged from the preprocessing module; unused here)
MALIK_RULE = "malik"
KARLSSON_RULE = "karlsson"
KAMATH_RULE = "kamath"
ACAR_RULE = "acar"
CUSTOM_RULE = "custom"


def interpolate_nan_values(rr_intervals: list,
                           interpolation_method: str = "linear",
                           limit_area: str = None,
                           limit_direction: str = "forward",
                           limit=None, ) -> list:
    """
    Function that interpolates NaN values with linear interpolation.

    Parameters
    ---------
    rr_intervals : list
        RrIntervals list.
    interpolation_method : str
        Method used to interpolate NaN values of series.
    limit_area: str
        If limit is specified, consecutive NaNs will be filled with this restriction.
    limit_direction: str
        If limit is specified, consecutive NaNs will be filled in this direction.
    limit: int
        Maximum number of consecutive NaNs to fill (passed to pandas.Series.interpolate).

    Returns
    ---------
    interpolated_rr_intervals : list
        New list with outliers replaced by interpolated values.
    """
    # Work on a copy so the caller's list is not modified in place
    rr_intervals = list(rr_intervals)

    # Search for leading NaN values and fill them with the first valid value after them
    if rr_intervals and np.isnan(rr_intervals[0]):
        start_idx = 0
        # Stop at the end of the list so an all-NaN input does not raise IndexError
        while start_idx < len(rr_intervals) and np.isnan(rr_intervals[start_idx]):
            start_idx += 1
        if start_idx < len(rr_intervals):
            rr_intervals[0:start_idx] = [rr_intervals[start_idx]] * start_idx

    # Change rr_intervals to a pandas Series
    series_rr_intervals_cleaned = pd.Series(rr_intervals)
    # Interpolate NaN values and convert the pandas object to a list of values
    interpolated_rr_intervals = series_rr_intervals_cleaned.interpolate(method=interpolation_method,
                                                                        limit=limit,
                                                                        limit_area=limit_area,
                                                                        limit_direction=limit_direction)
    return interpolated_rr_intervals.values.tolist()
from hrvanalysis import preprocessing as pre
hrv_sample_1 = [700, 800, 10000, 10000, 650, 700, 750, 540]
hrv_sample_2 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700]
hrv_sample_3 = [10000, 10000, 800, 700, 800, 900, 10000, 10000, 650, 700, 10000, 10000]
hrv_sample_1 = interpolate_nan_values(pre.remove_outliers(hrv_sample_1))
hrv_sample_2 = interpolate_nan_values(pre.remove_outliers(hrv_sample_2))
hrv_sample_3 = interpolate_nan_values(pre.remove_outliers(hrv_sample_3))
print("hrv sample 1: {}".format(hrv_sample_1))
print("hrv sample 2: {}".format(hrv_sample_2))
print("hrv sample 3: {}".format(hrv_sample_3))
Output:
>>> hrv sample 1: [700.0, 800.0, 750.0, 700.0, 650.0, 700.0, 750.0, 540.0]
>>> hrv sample 2: [800.0, 800.0, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0]
>>> hrv sample 3: [800.0, 800.0, 800.0, 700.0, 800.0, 900.0, 816.6666666666666, 733.3333333333334, 650.0, 700.0, 700.0, 700.0]
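If modifying the library is not desirable, a similar result can probably be obtained at the call site by back-filling after the forward interpolation. This is a sketch using only pandas (pandas.Series.interpolate and pandas.Series.bfill); interpolate_with_leading_fill is a hypothetical wrapper name, and it expects the already-cleaned list, such as the output of pre.remove_outliers:

```python
from typing import List

import numpy as np
import pandas as pd


def interpolate_with_leading_fill(rr_intervals: List[float]) -> List[float]:
    """Hypothetical wrapper: forward linear interpolation, then back-fill leading NaNs."""
    series = pd.Series(rr_intervals, dtype=float)
    # Same settings as the default interpolate_nan_values call in this issue
    interpolated = series.interpolate(method="linear", limit_direction="forward")
    # bfill copies the first valid value into any NaNs that precede it
    return interpolated.bfill().tolist()


# hrv_sample_2 after outlier removal
cleaned = [np.nan, np.nan, 800, 700, 800, 900, np.nan, np.nan, 650, 700]
print(interpolate_with_leading_fill(cleaned))
```

This keeps the library functions unchanged; the trade-off is that every caller has to remember to use the wrapper instead of calling interpolate_nan_values directly.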
The two abnormal values at the start are now correctly replaced with the next valid value (800.0).
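Before merging, the leading-fill step is worth covering with a few edge-case tests: an input without leading NaN should come back unchanged, and an input in which every interval is an outlier needs a bounds check, otherwise the scan for the first valid value indexes past the end of the list. A minimal, library-free sketch, where fill_leading_nans is a hypothetical standalone version of that step (it returns a new list instead of modifying its input):

```python
import math
from typing import List


def fill_leading_nans(rr_intervals: List[float]) -> List[float]:
    """Hypothetical helper: copy the first valid value into any leading NaN slots."""
    result = list(rr_intervals)
    start_idx = 0
    # Bounds check first, so an all-NaN (or empty) list does not raise IndexError
    while start_idx < len(result) and math.isnan(result[start_idx]):
        start_idx += 1
    if 0 < start_idx < len(result):
        result[:start_idx] = [result[start_idx]] * start_idx
    return result


nan = math.nan
print(fill_leading_nans([nan, nan, 800.0, 700.0]))  # [800.0, 800.0, 800.0, 700.0]
print(fill_leading_nans([700.0, nan, 650.0]))        # [700.0, nan, 650.0]
print(fill_leading_nans([nan, nan]))                 # [nan, nan]
print(fill_leading_nans([]))                         # []
```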
I have opened a pull request with this change.
Thank you.