leo-mazz / crowds

A collection of anonymization algorithms in Python

Creating multiple instances with different generalization rules fails with a "KeyError" exception

iflow opened this issue

Steps to reproduce

  1. Anonymize data
generalization_rules = {
    'sex': GenRule([])  # 1 level
}
adult_anonymised = anonymize(adult, generalization_rules=generalization_rules, k=2, max_sup=0.0, info_loss=entropy_loss)
  2. Anonymize data with a different rule set
generalization_rules = {
    'race': GenRule([])
}
adult_anonymised = anonymize(adult, generalization_rules=generalization_rules, k=2, max_sup=0.0, info_loss=entropy_loss)

Result:
KeyError: 'sex'

Expected:
Anonymized data with new rule set.

Solution

Find the function _k_min and replace its signature and first lines with the following, so that k_min_set defaults to None and a new set is created on each call:

def _k_min(b_node, t_node, k, max_sup, k_min_set=None):
    """ Core of OLA's operation: build k-minimal set with binary search in generalization
    strategies of lattice """

    if k_min_set is None:
        k_min_set = set()
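
The None default in this fix suggests the original signature used a mutable default (something like k_min_set=set()), although the thread does not show the old code. If so, the set would be created once when the function is defined and shared by every later call, which would explain why 'sex' from the first run is still looked up in the second. Here is a minimal sketch of that Python behaviour. The functions collect_shared and collect_fresh are hypothetical and not part of crowds:

```python
def collect_shared(item, seen=set()):
    # The default set is built once, when the function is defined,
    # so every call that omits `seen` mutates the same object.
    seen.add(item)
    return seen


def collect_fresh(item, seen=None):
    # A None sentinel lets each call start with its own new set.
    if seen is None:
        seen = set()
    seen.add(item)
    return seen


# Two independent "runs" with different attributes, as in the report.
first_shared = sorted(collect_shared('sex'))
second_shared = sorted(collect_shared('race'))  # state leaked from run one

first_fresh = sorted(collect_fresh('sex'))
second_fresh = sorted(collect_fresh('race'))    # no leftover state

print(second_shared)
print(second_fresh)
```

With the shared default, the second call still sees the entry added by the first. With the None sentinel, each call is independent.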

Hi @iflow, thank you for your report. Would you be open to contributing a PR?

Yes, I would have done it yesterday already :)
However, I could not reproduce the issue with your example.

I will try again today and commit the solution.

With the previous commits, the issue seems to be fixed. I extended your example to test the behaviour.