ghammad / pyActigraphy

Python-based open source package for actigraphy data analysis

Home Page: https://ghammad.github.io/pyActigraphy

Bug in inactivity mask filter (Crespo algorithm)

juliusandretti opened this issue · comments

Greetings!

I've been running a few tests with your Crespo algorithm implementation and I've come across a bug.
[Screenshot: filter_bug — plot of the inactivity mask illustrating the invalid transitions described below]
Notice that there are invalid transitions (duration shorter than zeta) both at the beginning and at the end of the rest period. They appear because the program does not recognize that there is a time gap between those transitions; it processes them as if they belonged to a single wake period.

Another point you should review: according to the article by Crespo et al., sequences of zeros with length greater than zeta are considered invalid, whereas in the code sequences with length equal to zeta are also considered invalid. I can provide you with the data I've been probing so you can confirm my findings. Hope this will help, you've been doing a great job, keep it up!
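To illustrate the second point, here is a rough sketch of the rule as I read it in Crespo et al. This is not pyActigraphy's code, just a toy illustration with made-up data and a made-up function name:

```python
import numpy as np

def invalid_zero_runs(counts, zeta):
    """Return True for epochs that belong to a run of zeros strictly longer than zeta."""
    counts = np.asarray(counts)
    mask = np.zeros_like(counts, dtype=bool)
    run_start = None
    # A sentinel value at the end closes a run of zeros that reaches the last epoch.
    for i, value in enumerate(np.append(counts, 1)):
        if value == 0 and run_start is None:
            run_start = i
        elif value != 0 and run_start is not None:
            if i - run_start > zeta:  # strict inequality: runs of exactly zeta stay valid
                mask[run_start:i] = True
            run_start = None
    return mask

# With zeta=3: a run of exactly 3 zeros stays valid, a run of 4 zeros is flagged.
print(invalid_zero_runs([1, 0, 0, 0, 1, 0, 0, 0, 0, 1], zeta=3))
```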

Hi @juliusandretti,

thanks for the message. Could you provide me with a minimal example? Let's work out a test so that we can be confident about the implementation of Crespo's algorithm.

Thank you for your help.

Greg

Here's a minimal example. Hope it helps.

```python
from datetime import datetime, timedelta

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

from pyActigraphy.filters import _create_inactivity_mask

# Time index for the arrays
start = datetime.today().replace(second=0)
dt = timedelta(minutes=1)
time = [start + i*dt for i in range(30)]

# Estimate of wake/rest
ye = np.ones(30)
ye[10:20] = 0  # 1/3 wake, 1/3 rest, 1/3 wake
ye = pd.Series(ye, index=time)

# Activity counts
act = np.ones(30)
act[[8, 9, 12, 13, 14, 15, 16, 20, 21]] = 0
act = pd.Series(act, index=time)

# Applying the filter
mask_rest = _create_inactivity_mask(act[ye < 1], 11, 1)
mask_actv = _create_inactivity_mask(act[ye > 0], 4, 1)
mask_ar = pd.concat([mask_actv, mask_rest], verify_integrity=True)

# There are no invalid zeroes
expected = pd.Series(np.ones(30), index=time)

# Plots
plt.plot(1.2*ye, label='1.2*ye')
plt.plot(1.1*expected, label='1.1*expected mask')
plt.plot(mask_ar.sort_index(), label='mask')
plt.legend()
plt.show()
```

@ugGit commented

I know this is an old issue, but I believe that this works as expected.
The code provided as an example splits the active and rest phases in order to apply different minimal durations when creating the inactivity mask.
While this sounds like a reasonable thing to want, the approach is flawed.

To prove that, one has to inspect the series returned by this selection call:

```python
# full line of code: mask_actv = _create_inactivity_mask(act[ye > 0], 4, 1)
act[ye > 0]
```

Based on the generated input, this yields the following result:

```
2023-08-03 10:05:00.212904    1.0
2023-08-03 10:06:00.212904    1.0
2023-08-03 10:07:00.212904    1.0
2023-08-03 10:08:00.212904    1.0
2023-08-03 10:09:00.212904    1.0
2023-08-03 10:10:00.212904    1.0
2023-08-03 10:11:00.212904    1.0
2023-08-03 10:12:00.212904    1.0
2023-08-03 10:13:00.212904    0.0
2023-08-03 10:14:00.212904    0.0 # time gap after this epoch, caused by the split into activity and rest phases!
2023-08-03 10:25:00.212904    0.0
2023-08-03 10:26:00.212904    0.0
2023-08-03 10:27:00.212904    1.0
2023-08-03 10:28:00.212904    1.0
2023-08-03 10:29:00.212904    1.0
2023-08-03 10:30:00.212904    1.0
2023-08-03 10:31:00.212904    1.0
2023-08-03 10:32:00.212904    1.0
2023-08-03 10:33:00.212904    1.0
2023-08-03 10:34:00.212904    1.0
```

Now, when passing this to create_inactivity_mask(), or rather _create_inactivity_mask(), we run into a problem: the index is not checked!
Have a look at the following line from _create_inactivity_mask() (https://github.com/ghammad/pyActigraphy/blob/master/pyActigraphy/filters/filters.py#L13):

```python
mask = np.ones_like(data)
```

This shows that the function is ignorant of what kind of time series is passed in, and gaps are consequently not handled.
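Just to make that concrete, here is a tiny toy illustration (made-up data, not the library code itself) of how np.ones_like() sees the values but not the 11-minute hole in the index:

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(['2023-08-03 10:13', '2023-08-03 10:14',
                      '2023-08-03 10:25', '2023-08-03 10:26'])  # 11-minute gap
data = pd.Series([0.0, 0.0, 0.0, 0.0], index=idx)

print(np.ones_like(data))                   # [1. 1. 1. 1.] -- four epochs, gap ignored
print(data.index.to_series().diff().max())  # Timedelta('0 days 00:11:00'), the hidden gap
```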

So, we get the following output:

```
>>> _create_inactivity_mask(act[ye > 0], 4, 1)
2023-08-03 10:05:00.212904    1.0
2023-08-03 10:06:00.212904    1.0
2023-08-03 10:07:00.212904    1.0
2023-08-03 10:08:00.212904    1.0
2023-08-03 10:09:00.212904    1.0
2023-08-03 10:10:00.212904    1.0
2023-08-03 10:11:00.212904    1.0
2023-08-03 10:12:00.212904    1.0
2023-08-03 10:13:00.212904    0.0
2023-08-03 10:14:00.212904    0.0
2023-08-03 10:25:00.212904    0.0
2023-08-03 10:26:00.212904    0.0
2023-08-03 10:27:00.212904    1.0
2023-08-03 10:28:00.212904    1.0
2023-08-03 10:29:00.212904    1.0
2023-08-03 10:30:00.212904    1.0
2023-08-03 10:31:00.212904    1.0
2023-08-03 10:32:00.212904    1.0
2023-08-03 10:33:00.212904    1.0
2023-08-03 10:34:00.212904    1.0
```

All in all, I still believe the function works as intended.
However, to prevent confusion in the future, one could implement a simple check in the function to verify that the data is continuous.
I'm not familiar with the whole code base, nor with whether there is general support for non-continuous time series in pyActigraphy.
So I gladly leave the decision on the next steps to somebody else; I just hope this analysis spares others some time.

Hello @ugGit

Indeed, the mask creation function does not accept non-continuous time series (and, more generally, there is no support for them in pyActigraphy). That's why masked periods are substituted with NaNs instead of simply being chopped off: I considered that easier for users and easier to handle from the code's point of view.
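For illustration only (made-up data; not necessarily the exact calls used inside pyActigraphy), the difference between substituting NaNs and chopping the chunk off looks like this:

```python
import pandas as pd

idx = pd.date_range('2023-08-03 10:00', periods=6, freq='min')
data = pd.Series([10, 0, 0, 0, 12, 8], index=idx, dtype=float)
mask = pd.Series([1, 0, 0, 0, 1, 1], index=idx)  # 0 = masked epoch

print(data.where(mask > 0))  # masked epochs become NaN, the index stays regular
print(data[mask > 0])        # chopping instead leaves an index with holes
```

With NaN substitution the index stays regular, which is what the rest of the package assumes.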

That being said, adding a check in the mask creation function would be good. Would you be up for a PR?

Thank you.

Greg

@ugGit commented

To be honest, I'm no longer convinced that it is reasonable to catch this during the mask creation.

I would rather suggest making the check for continuous data at the highest level possible, which is the instantiation of the BaseRaw object.
This would also prevent any other functions from producing outputs based on non-continuous input data.
What do you think?

As a check during the init of BaseRaw, I suggest something like:

```python
# .asfreq() turns the resampler back into a Series so that it can be compared
if not data.equals(data.resample(frequency).asfreq()):
    raise ValueError('smart message')
```
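For what it's worth, here is a quick toy check of that idea (the sampling frequency 'min' and the data are made up; this is a sketch, not a patch):

```python
import pandas as pd

frequency = 'min'  # assumed acquisition frequency of the toy data
full_idx = pd.date_range('2023-08-03 10:00', periods=10, freq=frequency)
continuous = pd.Series(range(10), index=full_idx, dtype=float)
gapped = continuous.drop(continuous.index[4:6])  # remove two epochs -> gap

print(continuous.equals(continuous.resample(frequency).asfreq()))  # True
print(gapped.equals(gapped.resample(frequency).asfreq()))          # False -> would raise
```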

And yes, I'll make a PR once you agree on the implementation :)

@ugGit Well, I haven't tested the check you propose, but it could be implemented, provided it is efficient (in terms of CPU time). Nonetheless, the original proposal still looks good to me, because it is a requirement of the function itself, and that function is used in several parts of the package and might be reused in future developments.

To summarize, the two modifications are not mutually exclusive.

WDYT?

@ugGit commented

Good point regarding the exclusivity!

I will go ahead and prepare a PR for the check in _create_inactivity_mask().

Regarding the second option, where continuity is checked during the initialization of the BaseRaw object: it should only be implemented if all analyses are supposed to work on continuous data.
If you think that is the case, I will implement the check there as well.

Regarding the implementation detail, this should be more performant:

```python
# N epochs span N - 1 intervals of length freq, hence the + 1
expected_nbr_of_epochs = (data.index[-1] - data.index[0]) / freq + 1
present_nbr_of_epochs = len(data)
if present_nbr_of_epochs > 1 and present_nbr_of_epochs != expected_nbr_of_epochs:
    raise ValueError('smart message')
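```

Wrapped into a helper for illustration (the function name is mine, not from the code base, and `freq` is assumed to be a pandas Timedelta):

```python
import pandas as pd

def assert_continuous(data, freq):
    """Raise if `data` (a Series with a DatetimeIndex sampled at `freq`) has missing epochs."""
    expected_nbr_of_epochs = (data.index[-1] - data.index[0]) / freq + 1
    present_nbr_of_epochs = len(data)
    if present_nbr_of_epochs > 1 and present_nbr_of_epochs != expected_nbr_of_epochs:
        raise ValueError(
            'Non-continuous time series: expected {} epochs, found {}.'.format(
                int(expected_nbr_of_epochs), present_nbr_of_epochs)
        )

idx = pd.date_range('2023-08-03 10:00', periods=10, freq='min')
data = pd.Series(range(10), index=idx, dtype=float)
assert_continuous(data, pd.Timedelta('1min'))  # passes silently

try:
    assert_continuous(data.drop(data.index[4:6]), pd.Timedelta('1min'))
except ValueError as err:
    print(err)  # Non-continuous time series: expected 10 epochs, found 8.
```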