Private tag sequence reads as 'UN' array
srodney opened this issue · comments
Thank you for your work on this excellent package!
After saving a DICOM file that contains a private tag sequence, reading it back in as part of a Dataset gives that sequence as VR='UN' with an array of bytes as its value, rather than as a Sequence of Datasets with DataElements.
This appears identical to the issue described in #1336.
Expected behavior
A private sequence written to a .dcm file should be read in as a Sequence.
Steps To Reproduce
See the attached code. I used the example code from the 2.4.2 documentation for creating a minimum dataset from scratch.
pydicom_private_sequence_reads_as_UN_array.ipynb.zip
Your environment
module | version |
---|---|
platform | macOS-14.0-x86_64-i386-64bit |
Python | 3.10.6 (v3.10.6:9c7b4bd164, Aug 1 2022, 17:13:48) [Clang 13.0.0 (clang-1300.0.29.30)] |
pydicom | 2.4.2 |
gdcm | module not found |
jpeg_ls | module not found |
numpy | 1.23.2 |
PIL | 10.0.0 |
pylibjpeg | module not found |
openjpeg | module not found |
libjpeg | module not found |
Here's an excerpt from the Jupyter notebook attached above, showing the essential steps to reproduce on any valid DICOM dataset:
import pydicom

# given any valid DICOM Dataset ds and an output path filename_little_endian:
# Add a private sequence with a single item of valid DICOM data
test_seq_item = pydicom.dataset.Dataset()
test_seq_item.BlockType = "APERTURE"
test_seq_item.BlockName = "Block1"
test_seq = pydicom.sequence.Sequence([test_seq_item])
ds.add_new( 0x37770010, 'LO', 'TEST_CREATOR') # Private Creator
ds.add_new( 0x37771000, 'SQ', test_seq )
# Write it out to disk in the same location as before
ds.save_as( filename_little_endian, write_like_original=False )
# read it in again. See the private sequence as 'UN'
ds_after_read = pydicom.dcmread( filename_little_endian )
print(ds_after_read)
Ah, wait: I just discovered that this problem disappears if I write the dataset as BIG_ENDIAN and EXPLICIT_VR,
so I'm perhaps stumbling into something that was addressed in #1067 or #1305 and #1323.
I think this is very likely invalid, and I just need to read those issues and their resolutions more carefully.
Just as an aside: private tags should be set using private blocks.
Also, don't use the big endian transfer syntax for writing: it has long been retired. But of course, if you write the data as implicit VR little endian, the VR information gets lost, so the behavior is somewhat expected, and can be fixed by writing as explicit VR little endian instead.
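To illustrate why the VR gets lost, here is a minimal sketch of how a single data element header is laid out in explicit VR versus implicit VR little endian. It uses only the standard library and is not pydicom's writer; the helper names `encode_explicit` and `encode_implicit` are made up for this example.

```python
import struct

# VRs that use the long explicit form: 2 reserved bytes plus a 4-byte length.
# All other VRs use a 2-byte length directly after the VR.
LONG_FORM_VRS = {
    "OB", "OD", "OF", "OL", "OV", "OW", "SQ",
    "SV", "UC", "UN", "UR", "UT", "UV",
}


def encode_explicit(group: int, element: int, vr: str, value: bytes) -> bytes:
    """Hypothetical helper: explicit VR little endian, VR stored in the header."""
    header = struct.pack("<HH", group, element) + vr.encode("ascii")
    if vr in LONG_FORM_VRS:
        header += struct.pack("<HI", 0, len(value))
    else:
        header += struct.pack("<H", len(value))
    return header + value


def encode_implicit(group: int, element: int, value: bytes) -> bytes:
    """Hypothetical helper: implicit VR little endian, no VR in the header."""
    return struct.pack("<HHI", group, element, len(value)) + value


# An empty sequence at the private tag (3777,1000) from the report
explicit = encode_explicit(0x3777, 0x1000, "SQ", b"")
implicit = encode_implicit(0x3777, 0x1000, b"")

print(explicit.hex(" "))  # 77 37 00 10 53 51 00 00 00 00 00 00
print(implicit.hex(" "))  # 77 37 00 10 00 00 00 00
```

In the implicit form the bytes `53 51` ("SQ") are simply absent, so a reader has to look the VR up in a dictionary, and an unregistered private tag has no entry to look up.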
As you mentioned, #1323 should have fixed this for sequences of unknown VR with undefined length. There is no way to know the VR of a private tag saved as UN with known length if it is not registered in the private dictionary. You could register it yourself in your case if you need to write the data as implicit VR.
Thanks very much. At least in preliminary testing, it looks like writing as explicit VR and using private blocks will solve the issue. Closing this, with thanks!