Aalen-Johansen fit() input - different cif table when I pass a numpy array vs pd.Series
ygivenx opened this issue · comments
import lifelines
def get_estimated_cif(durations, events, event_of_interest=1):
    ajf = lifelines.AalenJohansenFitter(calculate_variance=True)
    ajf.fit(durations, events, event_of_interest=event_of_interest)
    return ajf.cumulative_density_
get_estimated_cif(df["durations"].values, df["event"].values)
get_estimated_cif(df["durations"].values, df["event"]) # cif table is different from the table above
There are tied event times in my dataset, so _jitter is called.
The Aalen-Johansen estimator can't handle ties: when events of different types occur at the same time, the computation needs a clear ordering between them. When AalenJohansenFitter sees ties, it breaks them randomly. The difference you see between a pd.Series and an np.array is because the random number generator shifts the observations differently between the two calls.
If you want multiple calls to AalenJohansenFitter to produce the same result, you should manually break any tied event times in the data set. That way _jitter is not called, so the same event table should be produced each time.
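For illustration, here is a minimal sketch of one way to break ties by hand before fitting. The helper name break_ties and its epsilon parameter are hypothetical (they are not part of lifelines), and the approach is an assumption on my part: it shifts every repeated duration, including censored ones, which is broader than strictly necessary.

```python
from collections import Counter


def break_ties(durations, epsilon=None):
    """Return a list of floats in which repeated durations are shifted apart.

    The k-th repeat of a value (in input order) is moved forward by
    k * epsilon, so the result is the same on every call.
    """
    values = [float(d) for d in durations]

    if epsilon is None:
        # Pick a step small enough that shifted values never reach the
        # next distinct duration.
        distinct = sorted(set(values))
        gaps = [b - a for a, b in zip(distinct, distinct[1:])]
        min_gap = min(gaps) if gaps else 1.0
        largest_group = max(Counter(values).values(), default=1)
        epsilon = min_gap / (10 * largest_group)

    seen = Counter()
    result = []
    for v in values:
        result.append(v + seen[v] * epsilon)  # first occurrence is unchanged
        seen[v] += 1
    return result
```

You would then pass break_ties(df["durations"]) as the durations instead of the raw column. As far as I know, the AalenJohansenFitter constructor also takes a seed argument for reproducible jitter, but check that against your installed version.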