.IS and _interdaily_stability possibly giving different results with resampling

Question

hughesan opened this issue 10 months ago · comments

alexandria hughes commented 10 months ago

Hi - I have a question about resampling frequency in interdaily stability. The TL;DR version is that I am getting different results using freq="1H" versus using the _interdaily_stability function in the source code but only grouping by hour.

The long version of the story is that I have patient data that has a lot of spots we would like to mask - so many that I found it daunting to create a mask file for each. I'm much more comfortable with R than Python so I decided to use R to read in each actigraphy file and mask time according to our determined criteria (remove an entire day if >= 6 h of epochs are NaN). A few patients had no days that needed to be removed, so I decided to see if I could compute IS from scratch in R on one of these patients and match the results I get in pyActigraphy with .IS(binarize=False). Note, these files still have some epochs that are NaN, but my understanding is that missing data is omitted from calculation in .IS and I use var(., na.rm=T) in R to ensure the same.

In R, when the actigraphy data (one-minute epochs) are grouped by hour and minute, and the variance of the group means is divided by the overall sample variance, my value for IS exactly matches what I get from reading the original file into pyActigraphy and running .IS(binarize=False, freq="1min").
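
For reference, the calculation described above can be sketched in pandas as follows. This is only an illustration on made-up data: the function name `group_mean_variance_ratio` and the synthetic series are hypothetical, and it is not the pyActigraphy implementation.

```python
import numpy as np
import pandas as pd


def group_mean_variance_ratio(data: pd.Series) -> float:
    """Variance of the (hour, minute) group means over the overall variance."""
    group_means = data.groupby([data.index.hour, data.index.minute]).mean()
    # pandas skips NaN by default and uses ddof=1, like var(., na.rm=T) in R
    return float(group_means.var() / data.var())


# Synthetic one-minute epochs over 7 days (made-up data, for illustration only)
rng = np.random.default_rng(42)
index = pd.date_range("2021-01-01", periods=7 * 1440, freq="min")
minute_of_day = (index.hour * 60 + index.minute).to_numpy()
activity = pd.Series(
    100 * (1 + np.sin(2 * np.pi * minute_of_day / 1440))
    + rng.normal(0, 50, len(index)),
    index=index,
)
activity.iloc[100:130] = np.nan  # a few missing epochs, ignored by mean/var

ratio = group_mean_variance_ratio(activity)
print(ratio)
```

Both pandas `.var()` and R's `var()` use the sample variance (n - 1 in the denominator), which is consistent with the two values agreeing at the one-minute grouping.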

However, varying the resampling frequency gave me some unexpected results. In pyActigraphy I used .IS(binarize=False, freq="1H") to get what I believe is hour-grouped IS. To compute the same thing in R, I grouped the data by hour and divided the variance of the hourly means by the overall sample variance. The two values are quite different: roughly 0.34 (pyActigraphy .IS) vs 0.11 (R). For reference, the hour/minute IS value that matched in both places was also ~0.11. However, if I use the _interdaily_stability function from the source (https://github.com/ghammad/pyActigraphy/blob/master/pyActigraphy/metrics/metrics.py, line 56) with the minute/second grouping omitted, I get the same result as with hour grouping in R (0.11).

    def _interdaily_stability(data):
        d_24h = data.groupby([
            data.index.hour,]
            #data.index.minute,
            #data.index.second]
        ).mean().var()
        d_1h = data.var()
        return (d_24h / d_1h)

I read the pandas documentation for resampling (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html) and it seems like computing IS from scratch with data grouped by the chosen resample frequency should match setting a resampling frequency in .IS(), but this is not what I am finding.
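
To make the comparison concrete, here is a minimal sketch on synthetic data of the two calculations I am comparing. It rests on an assumption I have not confirmed: that .IS(freq=...) first resamples the series to the requested frequency and only then applies the variance ratio. The helper `variance_ratio`, the use of `sum` as the resampling aggregation, and the data are all hypothetical. `"60min"` is used as the resampling rule in place of `"1H"` only to avoid alias differences between pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute epochs over 7 days (made-up data, for illustration only)
rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=7 * 1440, freq="min")
minute_of_day = (index.hour * 60 + index.minute).to_numpy()
counts = pd.Series(
    100 * (1 + np.sin(2 * np.pi * minute_of_day / 1440))
    + rng.normal(0, 100, len(index)),
    index=index,
)


def variance_ratio(data: pd.Series) -> float:
    """Same ratio as the unmodified function: group by hour, minute, second."""
    keys = [data.index.hour, data.index.minute, data.index.second]
    return float(data.groupby(keys).mean().var() / data.var())


# (a) Resample to hourly bins first (assumed behaviour of .IS with freq="1H"),
#     so the denominator is the variance of the hourly series.
hourly = counts.resample("60min").sum()
is_resampled = variance_ratio(hourly)

# (b) Group the minute-level data by hour only (the R calculation),
#     so the denominator is the variance of the minute-level series.
is_grouped_only = float(
    counts.groupby(counts.index.hour).mean().var() / counts.var()
)

print(is_resampled, is_grouped_only)
```

In this sketch the two numerators are nearly proportional, but the denominators are not: in (a) the overall variance is taken over hourly values, where minute-to-minute noise has been averaged out, while in (b) it is taken over the raw minute values. On noisy data that makes (a) larger than (b), which is at least the same direction as the 0.34 vs 0.11 difference I am seeing.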