microsoft / Simplify-Docx

Simplify DOCX files to JSON

lxml error

Palisand opened this issue:

Using lxml version 4.6.3, I get the following error when trying to simplify a document with a bulleted or numbered list:

.../lib/python3.9/site-packages/simplify_docx/__init__.py in simplify(doc, options)
     31     __set_options__(_options)
     32
---> 33     out = document(doc.element).to_json(doc, _options)
     34
     35     if _options.get("friendly-name", True):

.../lib/python3.9/site-packages/simplify_docx/elements/base.py in to_json(self, doc, options, super_iter)
    104         out.update({
    105                 "TYPE": self.__type__,
--> 106                 "VALUE": [ elt.to_json(doc, options) for elt in self],
    107                 })
    108         return out

.../lib/python3.9/site-packages/simplify_docx/elements/base.py in <listcomp>(.0)
    104         out.update({
    105                 "TYPE": self.__type__,
--> 106                 "VALUE": [ elt.to_json(doc, options) for elt in self],
    107                 })
    108         return out

.../lib/python3.9/site-packages/simplify_docx/elements/body.py in to_json(self, doc, options, super_iter)
     23         iter_me = peekable(self)
     24         for elt in iter_me:
---> 25             JSON = elt.to_json(doc, options, iter_me)
     26
     27             if (

.../lib/python3.9/site-packages/simplify_docx/elements/paragraph.py in to_json(self, doc, options, super_iter)
    165
    166         if options.get("include-paragraph-indent", True):
--> 167             _indent = get_paragraph_ind(self.fragment, doc)
    168             if _indent is not None:
    169                 out["style"] = {"indent": indentation(_indent).to_json(doc, options)}

.../lib/python3.9/site-packages/simplify_docx/utils/paragrapy_style.py in get_paragraph_ind(p, doc)
     54     num_style = get_num_style(p, doc)
     55     if num_style is not None and \
---> 56             num_style.pPr is not None and \
     57             num_style.pPr.ind is not None:
     58         return num_style.pPr.ind

AttributeError: 'lxml.etree._Element' object has no attribute 'pPr'
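
For anyone reading the trace: the check at line 56 of paragrapy_style.py assumes that whatever get_num_style returns exposes a pPr attribute, and a bare lxml.etree._Element does not. Below is a minimal sketch of a more defensive lookup. It uses stand-in classes rather than real lxml or python-docx objects, and safe_paragraph_ind is a hypothetical name, not a function in simplify-docx and not necessarily how the library should fix this.

```python
class PlainElement:
    """Stand-in for a bare element that has no pPr attribute (hypothetical)."""


class Ind:
    """Stand-in for an indentation element (hypothetical)."""

    def __init__(self, left):
        self.left = left


class PPr:
    """Stand-in for paragraph properties (hypothetical)."""

    def __init__(self, ind=None):
        self.ind = ind


class NumStyle:
    """Stand-in for a numbering style object that does expose pPr (hypothetical)."""

    def __init__(self, pPr=None):
        self.pPr = pPr


def safe_paragraph_ind(num_style):
    """Return num_style.pPr.ind, or None if any link in the chain is missing."""
    # getattr with a default avoids the AttributeError raised by plain elements.
    pPr = getattr(num_style, "pPr", None)
    if pPr is None:
        return None
    return getattr(pPr, "ind", None)
```

With this pattern, an object without pPr yields None instead of raising, so the caller's existing "if _indent is not None" check would simply skip the indent.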

Can you share a minimal example? It's hard to reproduce an error without the inputs.

  1. Create a .docx file with the following contents:
       • foo
       • bar
       1. foo
       2. bar
  2. In your Python environment, install lxml 4.6.3, simplify-docx, and the other dependencies specified in setup.py::setup.install_requires
  3. Run simplify(docx.Document("your-file.docx")) and observe that the error is raised

Your stack trace will be different, but you should see the same error:

AttributeError: 'lxml.etree._Element' object has no attribute 'pPr'
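
The reproduction steps above could also be scripted, roughly as follows. This is a sketch under assumptions: it builds the file with python-docx's built-in "List Bullet" and "List Number" styles instead of a word processor, so the resulting XML may differ from the reporter's file and may not trigger the same error. The names LIST_ITEMS, build_list_document, and try_simplify are hypothetical helpers for illustration.

```python
# Paragraph text and the python-docx built-in style used for each list item.
LIST_ITEMS = [
    ("foo", "List Bullet"),
    ("bar", "List Bullet"),
    ("foo", "List Number"),
    ("bar", "List Number"),
]


def build_list_document(path):
    """Write a .docx containing one bulleted and one numbered list."""
    import docx  # python-docx; imported lazily so the module loads without it

    document = docx.Document()
    for text, style in LIST_ITEMS:
        document.add_paragraph(text, style=style)
    document.save(path)
    return path


def try_simplify(path):
    """Run simplify on the file and report the AttributeError if one is raised."""
    import docx
    from simplify_docx import simplify

    try:
        return simplify(docx.Document(path))
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

Calling try_simplify(build_list_document("list-repro.docx")) in an environment with lxml 4.6.3 would then either return the simplified JSON or the error message, which makes it easy to compare behavior before and after a fix.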

Here is a partial stack trace. The input .docx is not simple, and it was generated by Microsoft Word.

File "C:\Users\zoo42\AppData\Roaming\Python\Python39\site-packages\simplify_docx\elements\body.py", line 25, in to_json
    JSON = elt.to_json(doc, options, iter_me)
File "C:\Users\zoo42\AppData\Roaming\Python\Python39\site-packages\simplify_docx\elements\paragraph.py", line 167, in to_json
    _indent = get_paragraph_ind(self.fragment, doc)
File "C:\Users\zoo42\AppData\Roaming\Python\Python39\site-packages\simplify_docx\utils\paragrapy_style.py", line 56, in get_paragraph_ind
    num_style.pPr is not None and \
AttributeError: 'lxml.etree._Element' object has no attribute 'pPr'

I should do a bit of debugging.

@rleir I just accepted a PR that should deal with this. Can you let me know if this fix helped you? If so I'll close this issue.

(Sorry I don't have time to investigate myself. I'm not at all associated with the office team and I just maintain this in my free time, which is hard to come by lately...)