microsoft / Simplify-Docx

Simplify DOCX files to JSON

include-paragraph-indent with float indent

yunake opened this issue

At least with Google Spreadsheets, the paragraph indent can be a float, which causes a crash:

Traceback (most recent call last):
  File "/Users/yunake/Documents/ТРО/audit_moves/./read_pryznachennya.py", line 8, in <module>
    my_doc_as_json = simplify(my_doc)
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/simplify_docx/__init__.py", line 33, in simplify
    out = document(doc.element).to_json(doc, _options)
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/simplify_docx/elements/base.py", line 106, in to_json
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/simplify_docx/elements/base.py", line 106, in <listcomp>
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/simplify_docx/elements/body.py", line 25, in to_json
    JSON = elt.to_json(doc, options, iter_me)
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/simplify_docx/elements/paragraph.py", line 181, in to_json
    out["style"] = {"indent": indentation(_indent).to_json(doc, options)}
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/simplify_docx/elements/base.py", line 36, in __init__
    self.props[prop] = getattr(x, prop)
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/docx/oxml/xmlchemy.py", line 164, in get_attr_value
    return self._simple_type.from_xml(attr_str_value)
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/docx/oxml/simpletypes.py", line 21, in from_xml
    return cls.convert_from_xml(str_value)
  File "/Users/yunake/Documents/ТРО/audit_moves/.venv/lib/python3.10/site-packages/docx/oxml/simpletypes.py", line 335, in convert_from_xml
    return Twips(int(str_value))
ValueError: invalid literal for int() with base 10: '4960.6299212598415'

I think that's technically out of spec. Note that the error is raised inside the python-docx library, not in this one. I suspect python-docx won't accept a fix, since the Google Sheets output is out of spec, but you could raise an issue with them if you like. (I'm not associated with the python-docx library or the MS Office team.)
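For reference, the failure can be reproduced without python-docx at all. This is a minimal sketch using only the attribute value from the traceback above; the variable names are illustrative:

```python
# The indent value reported in the traceback.
raw = "4960.6299212598415"

# int() refuses strings that contain a decimal point.
try:
    int(raw)
    int_failed = False
except ValueError:
    int_failed = True

# float() accepts the same string; int() of the float then truncates.
as_float = float(raw)
truncated = int(as_float)

print(int_failed, as_float, truncated)
```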

If you want to fix this in your local copy, you can install python-docx locally in editable mode and make the change yourself:

git clone https://github.com/python-openxml/python-docx
pip install -e ./python-docx

Then, in python-docx/docx/oxml/simpletypes.py, change the line "return Twips(int(str_value))" to "return Twips(float(str_value))".
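The idea behind that change can be sketched as a standalone function. Note that parse_twips is a hypothetical helper written for illustration, not part of python-docx, and it rounds to the nearest whole twip, which is an assumption rather than what the suggested one-line edit does:

```python
def parse_twips(str_value: str) -> int:
    """Hypothetical helper: parse a twips attribute value.

    Accepts the integer strings the spec calls for and also tolerates
    fractional values such as the one in the traceback.
    """
    try:
        # Normal, in-spec case: a plain integer string.
        return int(str_value)
    except ValueError:
        # Out-of-spec case: fall back to float and round to whole twips.
        return round(float(str_value))


print(parse_twips("720"))
print(parse_twips("4960.6299212598415"))
```

Trying int() first leaves in-spec values untouched, so only documents with fractional indents take the fallback path.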