capeprivacy / cape-python

Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark

Tokenizer for email addresses

egbagirl opened this issue · comments

Is your feature request related to a problem? Please describe.
I am working with data collected from online form fills and need to mask email addresses as they are collected. However, once I do this with the tokenizer built for names, I can no longer parse or analyze the domain.

Describe the solution you'd like
The tokenizer should be applied to both parts of an email address while preserving the '@' sign. Alternatively, the user could choose which part of the email address to mask (the username or the domain).
This way, analysis can still be performed on either part without compromising the identity of the user.
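
To make the request concrete, here is a minimal sketch of the behavior described above, written in plain Python with only the standard library. It is not part of the cape-python API: the function names `tokenize_email` and `_token`, the `mask` parameter, and the choice of a truncated HMAC-SHA256 digest as the token are all assumptions made for illustration.

```python
import hashlib
import hmac


def _token(value: str, key: bytes, length: int = 16) -> str:
    """Return a deterministic keyed token (hypothetical helper, not a cape-python call)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def tokenize_email(email: str, key: bytes, mask: str = "both") -> str:
    """Tokenize the username, the domain, or both, keeping the '@' separator.

    `mask` is an illustrative parameter: "username", "domain", or "both".
    """
    if mask not in ("username", "domain", "both"):
        raise ValueError("mask must be 'username', 'domain', or 'both'")

    # Split on the last '@' so the domain is always the final part.
    username, sep, domain = email.rpartition("@")
    if not sep or not username or not domain:
        raise ValueError(f"not a valid email address: {email!r}")

    if mask in ("username", "both"):
        username = _token(username, key)
    if mask in ("domain", "both"):
        domain = _token(domain, key)

    return f"{username}@{domain}"


if __name__ == "__main__":
    key = b"example-secret-key"  # assumption: a key managed outside the dataset
    print(tokenize_email("alice@example.com", key, mask="username"))
    print(tokenize_email("alice@example.com", key, mask="both"))
```

Because each part is tokenized separately and deterministically, two addresses with the same domain still share the same domain token when `mask="both"`, so grouping by domain would remain possible even with the domain masked. Applied to a Pandas column, the sketch could be used as `df["email"].map(lambda e: tokenize_email(e, key, mask="username"))`.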