capeprivacy / cape-python

Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark

Home Page: https://docs.capeprivacy.com

`DatePerturbation` raises when the index doesn't contain the key 0

TomAugspurger opened this issue

Describe the bug

When a pandas Series with a non-default integer index (really, any index that doesn't contain the key 0) is passed to `DatePerturbation`, a `KeyError` is raised.

To Reproduce

In [2]: import pandas as pd
   ...: from cape_privacy.pandas import transformations as tfms
   ...:
   ...: perturb_application_date = tfms.DatePerturbation(frequency="DAY", min=-3, max=3)
   ...: s = pd.Series(pd.date_range('2000', periods=12), index=list(range(1, 13)))
   ...: s
Out[2]:
1    2000-01-01
2    2000-01-02
3    2000-01-03
4    2000-01-04
5    2000-01-05
6    2000-01-06
7    2000-01-07
8    2000-01-08
9    2000-01-09
10   2000-01-10
11   2000-01-11
12   2000-01-12
dtype: datetime64[ns]

In [3]: perturb_application_date(s)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-3-6365109791c4> in <module>
----> 1 perturb_application_date(s)

~/Envs/dask-dev/lib/python3.8/site-packages/cape_privacy/pandas/transformations/perturbation.py in __call__(self, x)
    111
    112         # Use equality instead of isinstance because of inheritance
--> 113         if type(x[0]) == datetime.date:
    114             x = pd.to_datetime(x)
    115             is_date_no_time = True

~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    880
    881         elif key_is_scalar:
--> 882             return self._get_value(key)
    883
    884         if (

~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    989
    990         # Similar to Index.get_value, but we do not fall back to positional
--> 991         loc = self.index.get_loc(label)
    992         return self.index._get_values_for_loc(self, loc, label)
    993

~/Envs/dask-dev/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892
   2893         if tolerance is not None:

KeyError: 0
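The traceback ends at `type(x[0]) == datetime.date` in `perturbation.py`. On a Series with an integer index, `x[0]` is a label lookup rather than a positional one, so it raises as soon as 0 is not one of the labels. The minimal plain-pandas sketch below shows the difference (no cape-privacy needed). It assumes pandas' usual integer-index semantics, where `[]` with an integer key is label-based and `.iloc` is always positional.

```python
import pandas as pd

# Same shape of data as in the report: an integer index starting at 1, not 0.
s = pd.Series(pd.date_range("2000", periods=12), index=list(range(1, 13)))

# With an integer index, s[0] is a *label* lookup, so it fails when 0 is not a label.
try:
    s[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc is always positional, so it returns the first row regardless of the labels.
first = s.iloc[0]

print(label_lookup_failed)
print(first)
```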

Expected behavior

The same behavior as if the index did have a key 0.
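Until a fixed release is available, one possible workaround is to give the transformation exactly what it expects: a 0..n-1 index. The sketch below resets the index, calls the transformation, and then puts the original labels back. `call_with_default_index` and `shift_one_day` are hypothetical names. `shift_one_day` is only a stand-in that mimics the failing `x[0]` check; it is not cape-privacy's perturbation logic. With cape-privacy 0.2.0 you would presumably pass the `DatePerturbation` instance itself as `fn`, since the traceback shows it is callable, but that is untested here.

```python
import datetime

import pandas as pd


def call_with_default_index(fn, x: pd.Series) -> pd.Series:
    """Hypothetical workaround: run fn on x with a 0..n-1 RangeIndex,
    then restore the original labels on the result.

    Assumes fn returns one value per input row, in the same order.
    """
    result = fn(x.reset_index(drop=True))
    return result.set_axis(x.index)


def shift_one_day(x: pd.Series) -> pd.Series:
    """Stand-in transformation that, like the code in the traceback,
    inspects x[0] (a label lookup); it shifts every date by one day."""
    if type(x[0]) == datetime.date:
        x = pd.to_datetime(x)
    return x + pd.Timedelta(days=1)


s = pd.Series(pd.date_range("2000", periods=3), index=[1, 2, 3])
shifted = call_with_default_index(shift_one_day, s)
print(shifted)  # index is still 1, 2, 3; each date is one day later
```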

Desktop:

  • OS: macOS
  • OS Version: 10.14.5
  • Python Version: 3.8
  • Installed pip packages:
      • cape-privacy 0.2.0
      • pandas 1.1.0

Additional context

I think cape wants something like `x.iloc[0]`, but even that will fail if the Series is empty. Perhaps something like: if `pd.api.types.is_object_dtype(x.dtype)`, then try `pd.to_datetime(x)`? I haven't looked closely at the code.
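One way to combine the two suggestions above is sketched below: check the dtype first, skip empty input, and only then look at the first element positionally with `.iloc`. This is not the fix that landed on master, which isn't shown in this thread. `coerce_dates` is a hypothetical helper name, and it assumes, as the original check does, that all elements of an object-dtype Series share the type of the first one.

```python
import datetime
from typing import Tuple

import pandas as pd


def coerce_dates(x: pd.Series) -> Tuple[pd.Series, bool]:
    """Hypothetical helper: return (series, is_date_no_time).

    Converts an object-dtype Series of datetime.date values to datetime64
    without relying on the index containing the label 0.
    """
    if len(x) == 0 or not pd.api.types.is_object_dtype(x.dtype):
        # Empty input, or already datetime64 (or another non-object dtype): leave as-is.
        return x, False
    first = x.iloc[0]  # positional lookup, so the index labels don't matter
    # Exact type check, as in the original: datetime.datetime subclasses datetime.date.
    if type(first) == datetime.date:
        return pd.to_datetime(x), True
    return x, False
```

Because `pd.to_datetime` preserves the index of a Series, the converted result keeps the caller's original labels.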

Hey @TomAugspurger, thanks for your interest and for the issue! I've tested this on master, and it looks like it has already been fixed. If you could test it against master too, that would be great. We should be doing a release with the latest updates soon. Thanks!

Great, thanks. I'll trust that it's fixed :)