sjoerdk / dicomtrolley

Retrieve medical images via WADO, MINT, RAD69 and DICOM-QR

Performance issue for small-chunk streaming

sjoerdk opened this issue

  • dicomtrolley version: 2.1.4
  • Python version: 3.8
  • Operating System: Ubuntu 22.04

Description

dicomtrolley shows an asymptotically declining download speed when the following conditions are met:

  • Download using rad69
  • A large study (1GB+)
  • The study contains few, large slices, for example enhanced DICOM or a large encapsulated PDF

The issue is not with the external server; curl performs fine.
The issue is not with the requests library; a stripped-down streaming loop performs fine.

So the problem is somewhere in dicomtrolley.

Profiling the download shows the problem.

Way too much time is being spent in the split_off_first_part function. This part of the code scans the incoming chunks for the boundaries that delimit each part.

Rad69 always responds with a multi-part response where each part is either an xml document or a DICOM object. The response itself is streamed in chunks.

The incoming chunks are not aligned with part boundaries; the start of a boundary could arrive in chunk1 and the rest of it in chunk2. Therefore the code simply calls bytes.find() on all data received so far.
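To illustrate the alignment problem, here is a minimal sketch. The boundary string, the chunk contents and both function names are made up for this example; none of them are taken from dicomtrolley or from a real RAD69 response.

```python
from typing import List

# Hypothetical boundary and chunks; the boundary straddles the two chunks
BOUNDARY = b"--boundary123"
CHUNKS = [b"part one data--bound", b"ary123part two data"]


def found_in_any_single_chunk(chunks: List[bytes], boundary: bytes) -> bool:
    """Look for the boundary in each chunk separately."""
    return any(boundary in chunk for chunk in chunks)


def find_in_accumulated(chunks: List[bytes], boundary: bytes) -> int:
    """Append each chunk to a buffer and search the whole buffer every time."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        index = buffer.find(boundary)  # rescans everything received so far
        if index != -1:
            return index
    return -1
```

Searching per chunk misses the split boundary, while searching the accumulated buffer finds it. That is why scanning everything received so far is the simple, correct choice, even though it rescans old data on every chunk.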

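If chunks are small, say 32KB, and the incoming object is large, say 1GB, this means that bytes.find() is called 32000+ times on an ever larger buffer. As long as no boundary is found, each call rescans the whole buffer, so the total work grows quadratically with the size of the part: roughly 32768 calls over an average of 0.5GB comes to about 16TB scanned for a 1GB object.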
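One possible way to avoid the repeated rescans is to remember how much of the buffer has already been searched and resume from there, backing up by len(boundary) - 1 bytes so a boundary split across two chunks is still found. The sketch below is only an illustration under that assumption; BoundaryScanner and its attributes are hypothetical names, not dicomtrolley's API or its actual fix.

```python
class BoundaryScanner:
    """Hypothetical helper: finds a boundary in a growing buffer without
    rescanning bytes that are already known not to start a boundary."""

    def __init__(self, boundary: bytes):
        self.boundary = boundary
        self.buffer = bytearray()
        self.scanned = 0  # length of the buffer at the last unsuccessful search

    def feed(self, chunk: bytes) -> int:
        """Append a chunk; return the boundary index in the buffer, or -1."""
        self.buffer += chunk
        # A boundary may have started in the last len(boundary) - 1 bytes of
        # the previously scanned data, so back up by that much.
        start = max(0, self.scanned - (len(self.boundary) - 1))
        index = self.buffer.find(self.boundary, start)
        if index == -1:
            self.scanned = len(self.buffer)
        return index


def first_boundary_index(data: bytes, boundary: bytes, chunk_size: int) -> int:
    """Feed data in fixed-size chunks; return the first boundary index or -1."""
    scanner = BoundaryScanner(boundary)
    for offset in range(0, len(data), chunk_size):
        index = scanner.feed(data[offset:offset + chunk_size])
        if index != -1:
            return index
    return -1
```

With this approach each call scans only the new chunk plus a small overlap, so the total search work is roughly linear in the amount of data received. Splitting off the part and resetting the scanner once a boundary is found is left out of the sketch.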