Sending a study again within days will raise exception

Question

sjoerdk opened this issue 4 years ago

Describe the bug
See title

To Reproduce

Send a study to a stream. It goes through the pipeline and ends up in the finished folder.
Send the same study again.
This raises:

raise StudyPushException(e)
idissend.core.StudyPushException: Destination path '/root/stages/stage/stream/studyfolder' already exists

Expected behavior
Not sure. Possibly merging the two folders? Definitely not a hard crash.
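
The wording of the error matches the message that Python's shutil.move raises when the target folder already contains an entry with the same name. Assuming the push step is built on such a move (the issue does not confirm this), the clash can be reproduced in isolation with a minimal sketch. The helper names push and send_twice and the folder layout are hypothetical, for illustration only:

```python
import shutil
import tempfile
from pathlib import Path


def push(source: Path, stage: Path) -> Path:
    """Move a study folder into a stage folder (hypothetical helper)."""
    return Path(shutil.move(str(source), str(stage)))


def send_twice() -> str:
    """Send a folder with the same name twice and return the error text.

    Returns an empty string if no error occurred.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stage = root / "stage"
        stage.mkdir()
        for attempt in ("first", "second"):
            # Both attempts use the same folder name, like a re-sent study
            incoming = root / attempt / "studyfolder"
            incoming.mkdir(parents=True)
            (incoming / "file.dcm").write_text("dummy")
            try:
                push(incoming, stage)
            except shutil.Error as e:
                return str(e)
    return ""


if __name__ == "__main__":
    print(send_twice())
```

The first move succeeds; the second fails because stage/studyfolder is already present, which is the situation described above.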

Sjoerd Kerkstra · Answer 1 · Wed Oct 14 2020 16:32:36 GMT+0800 (China Standard Time)

This bug also arises when sending a finished study again.

Sjoerd Kerkstra · Answer 2 · Mon Nov 09 2020 23:03:37 GMT+0800 (China Standard Time)

The issue with this bug is that a fundamental design assumption turns out not to hold, namely that a study will only be sent to the pipeline once. It turns out that data with the same studyinstanceUID will sometimes be sent several times within the same day, or on the following day.
For pipeline processing, it does not even matter much whether this data is a re-send of something already sent or different parts of the same study.

This sheds light on a number of potential issues.

The assumption that data sent to the pipeline belongs to a single study is in no way assured. It is just often the case.
From a user's perspective, they just send 'a bunch of files' to the pipeline and expect these to be output in a sort-of sorted way.
If each file could be anonymized individually, there would be no issue. The issue only exists because the anonymization servers sort files into 'jobs' that are traditionally studies. Having one job per file would flood the server.

Sjoerd Kerkstra · Answer 3 · Mon Nov 09 2020 23:14:30 GMT+0800 (China Standard Time)

The best solution is probably this:

Do not use 'study' to refer to the collections of files that are passed around the pipeline. They are not (necessarily) studies. Rename this in code to something like collection? Think of a good name.
The name of a 'collection' can be the study name or UID if that makes things easier. It just means that new data for the same study will have to get a different name.
Internally, idissend should keep a record of all 'collection' objects to prevent clashes.
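
As a rough illustration of the record-keeping idea, here is a minimal in-memory sketch. The class CollectionRegistry and its numbering scheme are hypothetical and not part of idissend; a real version would need to persist its state, in files or a database, to survive restarts:

```python
class CollectionRegistry:
    """Tracks collection names that are in use (hypothetical sketch)."""

    def __init__(self) -> None:
        self._names: set[str] = set()

    def register(self, study_uid: str) -> str:
        """Return a name based on study_uid that is not yet in use."""
        name = study_uid
        counter = 1
        # Append an increasing counter until the name is free
        while name in self._names:
            counter += 1
            name = f"{study_uid}_{counter}"
        self._names.add(name)
        return name


if __name__ == "__main__":
    registry = CollectionRegistry()
    print(registry.register("1.2.3"))  # first send keeps the plain uid
    print(registry.register("1.2.3"))  # re-send gets a distinct name
```

With this approach the first collection for a study keeps the plain UID as its name, and later data for the same study gets a suffixed name.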

One question is whether to go full-on database now, or keep the file-based approach.

Sjoerd Kerkstra · Answer 4 · Mon Nov 09 2020 23:16:30 GMT+0800 (China Standard Time)

The only place where the actual DICOM data in the files is linked to the pipeline is at the C-Store node at the start: this saves the files in a studyinstanceUID folder. Perhaps this linking is already fundamentally wrong, since what is being sent is not guaranteed to be a study.

Sjoerd Kerkstra · Answer 5 · Mon Nov 09 2020 23:23:17 GMT+0800 (China Standard Time)

A very simple solution: when importing a collection into the pipeline, just add a few random characters to its name. This will make the collection name unique throughout its lifecycle in the pipeline.
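
A minimal sketch of this idea, assuming the collection name is derived from the study UID. The function name unique_collection_name and the underscore separator are hypothetical choices, not taken from the actual fix:

```python
import secrets


def unique_collection_name(study_uid: str, n_chars: int = 6) -> str:
    """Append a few random hex characters to a study uid (hypothetical helper)."""
    # token_hex(n) yields 2 * n hex characters; round up, then trim to n_chars
    suffix = secrets.token_hex((n_chars + 1) // 2)[:n_chars]
    return f"{study_uid}_{suffix}"


if __name__ == "__main__":
    # Two sends of the same study end up with different folder names
    print(unique_collection_name("1.2.840.113619.2.55"))
    print(unique_collection_name("1.2.840.113619.2.55"))
```

A random suffix makes a clash very unlikely rather than impossible, so checking whether the destination already exists may still be worthwhile.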

Sjoerd Kerkstra · Answer 6 · Fri Nov 13 2020 20:20:03 GMT+0800 (China Standard Time)

closed by ceb572d