sjoerdk / pimsclient

Multiple columns

brambozz opened this issue · comments

Hey Sjoerd,

Really nice client! I have been trying to get it to work with our downloading scripts.

Description

Currently, the automatic download (e.g. for COVID CT) stores two values per patient ID: a pseudonym and a second random number, the anonymization salt. The salt is used to perturb data randomly, for example by shifting dates.
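To illustrate what such a salt could be used for, here is a minimal sketch of a salt-driven date shift. The function names, the hashing scheme, and the 30-day window are assumptions made for this example, not how the download scripts or PIMS actually work.

```python
import hashlib
from datetime import date, timedelta


def date_offset_days(salt: str, max_days: int = 30) -> int:
    """Derive a fixed offset in [-max_days, max_days] from a salt (hypothetical scheme)."""
    digest = hashlib.sha256(salt.encode("utf-8")).digest()
    span = 2 * max_days + 1
    return int.from_bytes(digest[:8], "big") % span - max_days


def shift_date(original: date, salt: str, max_days: int = 30) -> date:
    """Shift a date by the offset derived from the patient's salt."""
    return original + timedelta(days=date_offset_days(salt, max_days))


if __name__ == "__main__":
    # The same salt always gives the same shift, so intervals between
    # dates of one patient are preserved.
    print(shift_date(date(2020, 3, 1), "example-salt"))
    print(shift_date(date(2020, 3, 15), "example-salt"))
```

Because the offset depends only on the salt, the salt has to be stored next to the pseudonym to keep the perturbation reproducible.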

I would therefore like PIMS to hold not only "PatientID" | "Pseudonym" rows, but rows like "PatientID" | "Pseudonym" | "Salt".

What I Did

I tried hacking around the KeyFiles.deidentify method and also tried fiddling around in the Swagger UI of PIMS. I think /api/Keyfiles/{KeyfileKey}/Pseudonyms/{PseudonymId}/Data should be suitable, but I get an insufficient rights error.

Do you know if this is already possible with pimsclient, or am I missing something?

I think I understand what you want. This is not possible in the code as is, but it can be added.
It is not easy to add a 'salt' element to a PatientID. I think the easiest alternative is to add a 'salt' value type here
https://github.com/sjoerdk/pimsclient/blob/master/pimsclient/client.py#L290
And then have a SaltIdentifier(TypedIdentifier) like here https://github.com/sjoerdk/pimsclient/blob/master/pimsclient/client.py#L386 and a SaltPseudonym.
You can then store the salt as a separate entry and use, for example, the value of the pseudo patient ID as its key so you can look it up.
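A rough sketch of that idea, using self-contained stand-in classes. Only the names TypedIdentifier and SaltIdentifier come from the suggestion above; InMemoryKeyFile, pseudonymize, pseudonym_and_salt, and the value type strings are hypothetical and are not the pimsclient API. The stand-in key file hands out a random value for salt entries, which is an assumption about how the salt would be generated.

```python
import secrets
from dataclasses import dataclass
from typing import ClassVar, Dict, Tuple


@dataclass(frozen=True)
class TypedIdentifier:
    """Stand-in for an identifier that carries a value type (not the real class)."""
    value: str
    value_type: ClassVar[str] = "Unknown"


@dataclass(frozen=True)
class PatientID(TypedIdentifier):
    value_type: ClassVar[str] = "PatientID"


@dataclass(frozen=True)
class SaltIdentifier(TypedIdentifier):
    # New value type; the string "Salt" is an assumption for this sketch.
    value_type: ClassVar[str] = "Salt"


class InMemoryKeyFile:
    """Hypothetical stand-in for a PIMS key file, kept in a dict."""

    def __init__(self) -> None:
        self._entries: Dict[TypedIdentifier, str] = {}
        self._counter = 0

    def pseudonymize(self, identifier: TypedIdentifier) -> str:
        """Return the stored pseudonym for an identifier, creating one if needed."""
        if identifier not in self._entries:
            if identifier.value_type == SaltIdentifier.value_type:
                pseudonym = secrets.token_hex(8)  # random salt value
            else:
                self._counter += 1
                pseudonym = f"{self._counter:06d}"
            self._entries[identifier] = pseudonym
        return self._entries[identifier]


def pseudonym_and_salt(keyfile: InMemoryKeyFile, patient_id: str) -> Tuple[str, str]:
    """Store the salt as a separate entry keyed by the pseudo patient ID value."""
    pseudo_patient_id = keyfile.pseudonymize(PatientID(patient_id))
    salt = keyfile.pseudonymize(SaltIdentifier(pseudo_patient_id))
    return pseudo_patient_id, salt


if __name__ == "__main__":
    keyfile = InMemoryKeyFile()
    print(pseudonym_and_salt(keyfile, "1234567"))
```

Keying the salt entry on the pseudo patient ID means the salt can be looked up later from the pseudonym alone, without needing the original patient ID.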

If you make a PR I can look at it Wednesday

Submitted PR #93. I noticed that I now add two entries per patient: PatientID and Salt. With sequential patient IDs, each new patient skips a number: the first patient I add gets pseudonym 000001, and the next gets 000003. I suppose this is something internal to PIMS?

Yes, the skipping of numbers is a bug/feature of the underlying PIMS implementation made by IM.
If you work with larger batches, the skipping gets larger.
This has been identified as a problem, but it is currently hard/costly for them to fix. I think it should be fixed, but that will take sufficient pressure on IM. For now I will leave it as is.

@brambozz could you close this issue when working with salt works for you?

OK, it's not a big problem, but just looks a bit weird :)
It's already working for me, so I will close. Could you push a new version to PyPI?

Yes, I think it looks very weird and should be changed. The good thing is that in the worst case, if PIMS does not get updated at all, the pimsclient code can be configured to use a different backend.

I will bump the software version. That will make CI push to PyPI.