pydicom / contrib-pydicom

contributions to the core pydicom base, including tutorials, extra plugins, etc.

Faster numeric string. Useful for structure sets

colonelfazackerley opened this issue · comments

(First submitted as pydicom/pydicom#623, which is now closed.)

This module has a couple of functions for speeding things up. The first converts IS (integer string) and DS (decimal string) values to numpy arrays. The second uses it to pull the contour coordinates out of structure sets specifically.

"""Faster handling of numeric strings.

Known to be useful for structure sets, which have a lot of DS.
"""
import pydicom
import numpy as np


def fast_num_string(parent_dataset, child_tag):
    """Returns a numpy array for decimal string or integer string values.

    parent_dataset: a pydicom Dataset
    child_tag:      a tag or keyword for the numeric string lookup
    """
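    if isinstance(child_tag, str):
        child_tag = pydicom.datadict.tag_for_keyword(child_tag)
    # Call the parent class's __getitem__ so the stored element is read
    # without going through this class's own lookup.
    val = super(parent_dataset.__class__,
                parent_dataset).__getitem__(child_tag).value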
    vr = pydicom.datadict.dictionary_VR(child_tag)
    if vr == 'IS':
        np_dtype = 'i8'
    elif vr == 'DS':
        np_dtype = 'f8'
    else:
        raise ValueError("Must be IS or DS: {} is {}.".format(child_tag, vr))
    try:
        num_string = val.decode(encoding='utf-8')
        # chr(92) is the backslash, the separator between multiple values
        return np.fromstring(num_string, dtype=np_dtype, sep=chr(92))
    except AttributeError:  # 'MultiValue' has no 'decode' (bytes does)
        # It's already been converted to numbers and cached
        return np.array(val, dtype=np_dtype)


def fast_structure_coordinates(contour):
    """Returns a list of numpy arrays. Each element in the list is a loop
    from the structure. Each loop is given as a 3 x N numpy array in which
    each column holds the x, y, z coordinates of a point on the loop.

    contour: input an item from a structure set ROIContourSequence."""
    return [np.reshape(fast_num_string(loop, 'ContourData'),
                       (3, -1), order='F')
            for loop in contour.ContourSequence]
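
To show what the parse and reshape steps above produce without needing a DICOM file, here is a minimal sketch on a made-up value. The `raw` bytes are an assumption for illustration (two points, stored as backslash-separated decimal strings); they are not taken from a real structure set.

```python
import numpy as np

# Made-up raw value: two points, each with x, y, z, separated by backslashes.
raw = b"1.0\\2.0\\3.0\\4.0\\5.0\\6.0"

# Same parsing step as fast_num_string: decode, then split and convert at once.
flat = np.fromstring(raw.decode("utf-8"), dtype="f8", sep="\\")

# Same reshape as fast_structure_coordinates: column-major, so every
# column of the 3 x N result is one (x, y, z) point.
points = np.reshape(flat, (3, -1), order="F")

print(points.shape)   # (3, 2)
print(points[:, 0])   # first point: x=1, y=2, z=3
print(points[:, 1])   # second point: x=4, y=5, z=6
```

With `order='F'` the flat sequence fills the array column by column, which is why consecutive triples end up as columns rather than rows.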

A separate module with example usage.

import timeit
import pydicom
import numpy as np
from fast_num_string import (fast_num_string, fast_structure_coordinates)

def normal_structure_coordinates(contour):
    return [np.reshape(loop.ContourData, (3, -1), order='F')
            for loop in contour.ContourSequence]

# read structure set file
rs = pydicom.dcmread('RS.gyn1.dcm')

# extract a decimal string
loop1 = fast_num_string(rs.ROIContourSequence[0].ContourSequence[0], 'ContourData')
# note this becomes a numpy float64
print(loop1.dtype)

# extract an integer string
rgb = fast_num_string(rs.ROIContourSequence[0], 'ROIDisplayColor')
# note this becomes a numpy int64
print("{}: {}".format(rgb.dtype, rgb))


# not a numeric string: get error
try:
    patname = fast_num_string(rs, pydicom.tag.Tag(0x10, 0x10))
except ValueError as err:
    print("{}: {}".format(err.__class__.__name__, err))

# extract all coordinates out and reshape into a convenient form
# using numpy.fromstring under the bonnet
tick = timeit.default_timer()
structs_np_fast = [fast_structure_coordinates(contour)
                   for contour in rs.ROIContourSequence
                   if hasattr(contour, 'ContourSequence')]  # filter out empty structures
print('fast_structure_coordinates: {:.1f}s'.format(timeit.default_timer() - tick))

# extract all coordinates out and reshape into a convenient form
# the obvious way
tick = timeit.default_timer()
structs_np_norm1 = [normal_structure_coordinates(contour)
                    for contour in rs.ROIContourSequence
                    if hasattr(contour, 'ContourSequence')]  # filter out empty structures
print('normal_structure_coordinates: {:.1f}s'.format(timeit.default_timer() - tick))

# extract all coordinates out and reshape into a convenient form
# the obvious way, the second time (when it's cached).
tick = timeit.default_timer()
structs_np_norm2 = [normal_structure_coordinates(contour)
                    for contour in rs.ROIContourSequence
                    if hasattr(contour, 'ContourSequence')]  # filter out empty structures
print('2nd time normal_structure_coordinates: {:.1f}s'.format(timeit.default_timer() - tick))

# doing it the fast way after the normal way, exercises an except block
tick = timeit.default_timer()
structs_np_fast2 = [fast_structure_coordinates(contour)
                    for contour in rs.ROIContourSequence
                    if hasattr(contour, 'ContourSequence')]  # filter out empty structures
print('fast_structure_coordinates after normal: {:.1f}s'.format(timeit.default_timer() - tick))

# check the answer is the same. converting to string is slow here.
assert str(structs_np_fast) == str(structs_np_norm1)
assert str(structs_np_fast2) == str(structs_np_norm1)

Output

float64
int64: [  0 255   0]
ValueError: Must be IS or DS: (0010, 0010) is PN.
fast_structure_coordinates: 1.9s
normal_structure_coordinates: 6.6s
2nd time normal_structure_coordinates: 0.7s
fast_structure_coordinates after normal: 0.5s

Happy to prepare a pull request, but I don't understand how this should work. Would the 2 handy functions be in a contrib-pydicom module to be imported? Would they be copy-pasted into something else? Where does the example usage script go? Where do I put unit tests?

Hi, I missed responding to this in the pydicom issue, but this is quite intriguing. I think it might be able to go into pydicom proper, but we'd have to think through the consequences some more, so working it out in contrib is probably a good idea. IMO the question would be whether it can be integrated seamlessly into the DS/IS and MultiValue types, including tolerating numpy not being installed and falling back to the old behavior. Perhaps it would be available as a configuration option.
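
As a rough illustration of the fallback idea, the sketch below parses a raw decimal string with numpy when it is available and enabled, and with plain Python otherwise. The names `parse_ds` and `USE_NUMPY` are hypothetical and only stand in for whatever a real configuration option would look like; this is not pydicom's API.

```python
try:
    import numpy as np
except ImportError:  # numpy is treated as optional in this sketch
    np = None

# Hypothetical switch standing in for a configuration option.
USE_NUMPY = True


def parse_ds(raw, use_numpy=USE_NUMPY):
    """Parse backslash-separated decimal strings given as bytes."""
    text = raw.decode("utf-8")
    if use_numpy and np is not None:
        # Fast path: one call converts the whole string.
        return np.fromstring(text, dtype="f8", sep="\\")
    # Fallback: plain Python floats in a list.
    return [float(item) for item in text.split("\\")]


print(parse_ds(b"1.5\\2.5\\-3.0", use_numpy=False))  # [1.5, 2.5, -3.0]
```

Either branch returns something that can be indexed and iterated, but the return types differ, which is exactly the integration question raised above.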

I'm wondering -- if the conversion from raw data element was replaced with this conversion (assuming numpy available) would it still pass all standard pydicom tests? I think it might pass most, because one should still be able to index or iterate through these arrays. But there might be some type checks (isinstance) in parts of the code that would fail. We'd have to explore this some more.
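
A small sketch of where such type checks could diverge, using plain lists and numpy arrays only (no pydicom types, so it says nothing about which pydicom tests would actually fail):

```python
import numpy as np

values_list = [1.0, 2.0, 3.0]
values_np = np.array(values_list, dtype="f8")
ints_np = np.array([0, 255, 0], dtype="i8")

# Indexing and iteration behave alike for both containers.
assert values_np[1] == values_list[1]
assert [float(v) for v in values_np] == values_list

# Container type checks are where the two diverge.
print(isinstance(values_list, list))   # True
print(isinstance(values_np, list))     # False

# Element checks depend on the dtype: np.float64 subclasses Python float,
# but np.int64 does not subclass Python int on Python 3.
print(isinstance(values_np[0], float))  # True
print(isinstance(ints_np[0], int))      # False
```

So code that only indexes or loops would likely be unaffected, while code that tests for `list` or `int` would need attention.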

I have also been wondering for some time whether the natural type for all numeric arrays (not just pixel data) in pydicom could be numpy arrays (or even python arrays if 1-D). In the pydicom issue, @bastula also commented about that.

Looks like this has been dealt with in pydicom; closing the issue.