sjoerdk / idis

Image DeIdentification Service, a wrapper around RSNA CTP for anonymization of medical images

Sending a study again within days will raise exception

sjoerdk opened this issue

Describe the bug
See title

To Reproduce

  • Send a study to a stream. It goes through the pipeline and ends up in the finished folder.
  • Send the same study again.
    This raises:
raise StudyPushException(e)
idissend.core.StudyPushException: Destination path '/root/stages/stage/stream/studyfolder' already exists
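
The wording of this message matches what Python's `shutil.move` raises when the target folder already contains an entry with the same name. Whether idissend actually uses `shutil.move` for the push is an assumption, and `push_study` below is a hypothetical stand-in, not the real idissend function. A minimal sketch of the failure mode:

```python
import shutil
import tempfile
from pathlib import Path


def push_study(study_dir: Path, stream_dir: Path) -> Path:
    """Hypothetical stand-in for the push step: move a study folder into a stream folder."""
    # Moving into an existing directory places the source inside it, and
    # raises shutil.Error if an entry with the same name is already there.
    return Path(shutil.move(str(study_dir), str(stream_dir)))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    stream = root / "stage" / "stream"
    stream.mkdir(parents=True)

    # First send: succeeds.
    first = root / "incoming1" / "studyfolder"
    first.mkdir(parents=True)
    (first / "file1.dcm").write_text("dummy")
    push_study(first, stream)

    # Second send of the same study: same folder name, so the move fails.
    second = root / "incoming2" / "studyfolder"
    second.mkdir(parents=True)
    (second / "file2.dcm").write_text("dummy")
    try:
        push_study(second, stream)
        error_message = None
    except shutil.Error as e:
        error_message = str(e)
```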

Expected behavior
Not sure. Possibly merging the two folders? Definitely not a hard crash.
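
For illustration, merging could look roughly like the sketch below. The function name `merge_into` and the policy for clashing files (keep the file that is already there, report the incoming one as skipped) are assumptions, not existing idissend behaviour:

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def merge_into(src: Path, dst: Path) -> List[Path]:
    """Move all files under src into dst, keeping relative paths.

    Files that already exist in dst are left untouched; the incoming
    duplicates stay in src and are returned as skipped.
    """
    skipped = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        if target.exists():
            skipped.append(path)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
    return skipped


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "stream" / "studyfolder"
    incoming = Path(tmp) / "incoming" / "studyfolder"
    existing.mkdir(parents=True)
    incoming.mkdir(parents=True)
    (existing / "a.dcm").write_text("first send")
    (incoming / "a.dcm").write_text("re-send")   # clashes with an existing file
    (incoming / "b.dcm").write_text("new file")  # new data for the same study

    skipped = merge_into(incoming, existing)
    merged_names = sorted(p.name for p in existing.iterdir())
    kept_content = (existing / "a.dcm").read_text()
```

Merging only helps while the earlier folder is still in a stage where new files can be added; once it has moved on in the pipeline, a different approach is needed.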

This bug also arises when sending a finished study again.

The issue with this bug is that a fundamental design assumption turns out not to hold, namely that a study will only be sent to the pipeline once. It turns out that data with the same StudyInstanceUID is sometimes sent several times within the same day, or on the following day.
For pipeline processing, it does not even matter much whether this data is a re-send of something sent earlier or consists of different parts of the same study.

This sheds light on a number of potential issues.

  • The assumption that data sent to the pipeline belongs to a single study is in no way guaranteed. It is just often the case.
  • From a user perspective, they just send 'a bunch of files' to the pipeline and expect these to come out sorted in some reasonable way.
  • If each file could be anonymized individually, there would be no issue. The issue only exists because the anonymization servers sort files into 'jobs' that are traditionally studies. Having one job per file would flood the server.

The best solution is probably this:

  • Do not use 'study' to refer to the collections of files that are passed around the pipeline. They are not (necessarily) studies. Rename this in code to something like 'collection'? Think of a good name.
  • The name of a 'collection' can be the study name or UID if that makes things easier. It just means that new data for the same study will have to get a different name.
  • Internally, idissend should keep a record of all 'collection' objects to prevent clashes.
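
A rough sketch of such a record, staying with a file-based approach. The class `CollectionRegistry`, the JSON file and the numbered-suffix naming are all hypothetical, not part of idissend:

```python
import json
import tempfile
from pathlib import Path


class CollectionRegistry:
    """Hypothetical file-backed record of collection names already handed out."""

    def __init__(self, path: Path):
        self.path = path
        # Load names recorded by earlier runs, if the file exists
        self.names = set(json.loads(path.read_text())) if path.exists() else set()

    def register(self, base_name: str) -> str:
        """Return a name not used before, derived from base_name, and record it."""
        name, counter = base_name, 1
        while name in self.names:
            counter += 1
            name = f"{base_name}_{counter}"
        self.names.add(name)
        self.path.write_text(json.dumps(sorted(self.names)))
        return name


with tempfile.TemporaryDirectory() as tmp:
    registry_file = Path(tmp) / "collections.json"
    registry = CollectionRegistry(registry_file)
    first = registry.register("1.2.3.4")   # first data for this study UID
    second = registry.register("1.2.3.4")  # more data for the same UID
    # A fresh instance reads the same file, so names survive a restart
    third = CollectionRegistry(registry_file).register("1.2.3.4")
```

This sketch does no locking, so it assumes a single process hands out names. With several writers, a database would be the safer choice.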

One question is whether to go full-on database now, or keep the file-based approach.

The only place where the actual DICOM data in the files is linked to the pipeline is the C-Store node at the start, which saves the files in a StudyInstanceUID folder. Perhaps this link is already fundamentally wrong, since what is being sent is not guaranteed to be a study.

A very simple solution: when importing a collection into the pipeline, just add a few random characters to its name. This will make the collection unique throughout its lifecycle in the pipeline.
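
As a minimal sketch of that idea, the hypothetical helper below (name, separator and suffix length are assumptions) appends a short random hex suffix to a base name such as the StudyInstanceUID:

```python
import secrets


def unique_collection_name(base_name: str, n_chars: int = 6) -> str:
    """Append n_chars random hex characters to base_name."""
    # token_hex(n) returns 2 * n hex characters; slice to support odd lengths
    suffix = secrets.token_hex((n_chars + 1) // 2)[:n_chars]
    return f"{base_name}_{suffix}"


name = unique_collection_name("1.2.3.4")
short_name = unique_collection_name("1.2.3.4", n_chars=3)
```

A random suffix makes a clash very unlikely rather than impossible, so checking whether the destination already exists would still be needed to rule it out completely.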