microsoft / Simplify-Docx

Simplify DOCX files to JSON

something wrong with big docx file

wuyangjian opened this issue

Hey, I've run into an issue and I can't tell what the problem is.
My code works well when the docx file is small, but when I switch to a big file I get the error below:

Traceback (most recent call last):
  File "/home/wuyangjian/demo.py", line 123, in <module>
    extract_docx_to_excel(path)
  File "/home/wuyangjian/demo.py", line 32, in extract_docx_to_excel
    for i in extract_docx(path):
  File "/home/wuyangjian/demo.py", line 9, in extract_docx
    my_doc_as_json = simplify(my_doc)
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/__init__.py", line 33, in simplify
    out = document(doc.element).to_json(doc, _options)
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/base.py", line 106, in to_json
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/base.py", line 106, in <listcomp>
    "VALUE": [ elt.to_json(doc, options) for elt in self],
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/body.py", line 25, in to_json
    JSON = elt.to_json(doc, options, iter_me)
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/paragraph.py", line 142, in to_json
    out: Dict[str, Any] = super(paragraph, self).to_json(doc, options, super_iter)
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/paragraph.py", line 27, in to_json
    for elt in run_iterator:
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/base.py", line 61, in __iter__
    for elt in xml_iter(node,
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/iterators/generic.py", line 167, in xml_iter
    for elt in xml_iter(current, handlers.TAGS_TO_NEST[current.tag], _msg):
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/iterators/generic.py", line 156, in xml_iter
    yield handlers.TAGS_TO_YIELD[current.tag](current)
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/form.py", line 106, in __init__
    super(fldChar, self).__init__(x)
  File "/home/wuyangjian/miniconda3/lib/python3.9/site-packages/simplify_docx/elements/base.py", line 36, in __init__
    self.props[prop] = getattr(x, prop)
AttributeError: 'lxml.etree._Element' object has no attribute 'fldCharType'
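
The traceback fails while the library is building its fldChar wrapper, so file size may not be the real trigger: a plausible explanation (an assumption, not something confirmed in this thread) is that only the larger document contains Word field characters, such as those used for a table of contents or page numbers. A minimal standard-library sketch to check whether a given file contains any `w:fldChar` elements; the helper name `count_fld_chars` is made up for illustration and is not part of simplify_docx:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_fld_chars(docx_file):
    """Return the number of w:fldChar elements in word/document.xml.

    `docx_file` may be a path or a binary file-like object.
    Hypothetical helper for diagnosis only; it does not use simplify_docx.
    """
    # A .docx file is a zip archive; the body lives in word/document.xml.
    with zipfile.ZipFile(docx_file) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    # Element.iter() walks the whole tree and matches the fully qualified tag.
    return sum(1 for _ in root.iter(f"{{{W_NS}}}fldChar"))
```

Running `count_fld_chars` on both the small and the big file would show whether the failing file is the only one that contains fields. It only inspects the main document part, not headers or footers.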

This was fixed a while ago here on GitHub, but I didn't push the fix to PyPI (where you probably downloaded it via pip) until just a moment ago. The fix should be available in version 0.1.2.

I'm going to close this, assuming that it was fixed by the code pushed to PyPI yesterday, but feel free to reopen if the new version doesn't work for you.
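
Before retesting, it can help to confirm which version is actually installed in the environment that runs the script. A small sketch using only the standard library; the distribution name `simplify-docx` is an assumption about how the package is published on PyPI, and the helper names are made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(v):
    """Naive parser: keeps only purely numeric dot-separated parts."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())


def has_fix(installed, required="0.1.2"):
    """True if `installed` is at least `required` (0.1.2 per this thread)."""
    return installed is not None and version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    # "simplify-docx" is assumed to be the PyPI distribution name.
    found = installed_version("simplify-docx")
    print("installed:", found, "| has fix:", has_fix(found))
```

If this reports a version below 0.1.2, upgrading with pip in the same interpreter (here the miniconda3 Python 3.9 shown in the traceback) should pick up the release mentioned above.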

Hello @jdthorpe, I'm using version 0.1.2 but still facing the same error. Are there any updates?