Serialized XML objects in small molecule pipeline have the wrong extension
ijpulidos opened this issue
When we serialize objects in perses/perses/app/setup_relative_calculation.py (lines 733 to 735 in 4e36a6b), the output is supposed to be gzipped, but inspecting these files shows that they are plain XML files:
❯ file *
complex-hybrid-system.gz: XML 1.0 document, ASCII text
complex-new-system.gz: XML 1.0 document, ASCII text
complex-old-system.gz: XML 1.0 document, ASCII text
solvent-hybrid-system.gz: XML 1.0 document, ASCII text
solvent-new-system.gz: XML 1.0 document, ASCII text
solvent-old-system.gz: XML 1.0 document, ASCII text
instead of the expected "gzip compressed data".
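The same check that file performs can be scripted by looking at the first two bytes of each file: every gzip stream begins with the magic number 0x1f 0x8b (RFC 1952). The sketch below is only an illustration; is_gzip_file and find_mislabeled are hypothetical helpers, not part of perses.

```python
from pathlib import Path

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic number (hypothetical helper)."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def find_mislabeled(directory):
    """Return sorted names of *.gz files in `directory` that are not actually gzip-compressed."""
    return sorted(p.name for p in Path(directory).glob("*.gz") if not is_gzip_file(p))
```

Run against the output directory described above, find_mislabeled would list all six *-system.gz files.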
Do we want them gzipped, or do we want to keep them uncompressed?
We do want to compress them.
Okay, this was a fun one: https://github.com/choderalab/perses/blob/main/perses/utils/data.py#L114-L127
We do save the XML with gzip, but because there is an if instead of an elif when checking for bz2, we overwrite the file with an uncompressed version when we hit the else block. I've got a PR incoming with some extra debug output that will help troubleshoot issues like this.
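The if/elif mistake can be reproduced with a simplified, self-contained sketch. This is not the actual code in perses/utils/data.py: the function names serialize_buggy and serialize_fixed are hypothetical, and a plain XML string stands in for the serialized OpenMM object.

```python
import bz2
import gzip


def serialize_buggy(xml, filename):
    """Hypothetical illustration of the bug: the bz2 check uses `if` instead of `elif`."""
    if filename.endswith(".gz"):
        with gzip.open(filename, "wt") as fh:
            fh.write(xml)
    if filename.endswith(".bz2"):  # BUG: starts a new if/else chain
        with bz2.open(filename, "wt") as fh:
            fh.write(xml)
    else:  # also runs for .gz files, truncating them and writing plain XML
        with open(filename, "w") as fh:
            fh.write(xml)


def serialize_fixed(xml, filename):
    """Same logic with a single if/elif/else chain, so exactly one branch runs."""
    if filename.endswith(".gz"):
        with gzip.open(filename, "wt") as fh:
            fh.write(xml)
    elif filename.endswith(".bz2"):
        with bz2.open(filename, "wt") as fh:
            fh.write(xml)
    else:
        with open(filename, "w") as fh:
            fh.write(xml)
```

Calling serialize_buggy with a .gz path first writes a valid gzip file and then immediately overwrites it with uncompressed text, which matches the "XML 1.0 document, ASCII text" output from file in the report.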