microsoft / Simplify-Docx

Simplify DOCX files to JSON

'lxml.etree._Element' object has no attribute 'pPr'

lavishsaluja opened this issue

I'm trying to read this file, diary.docx, and I'm getting an AttributeError on line 56 of paragraph style.

I tried printing different variables. Inside get_paragraph_ind, num_style is being returned as None by get_num_style even when the p.pPr value is not null, so most probably it isn't able to find any sub-element for abstractNumbering.

It would be great if someone could help me resolve this 🙏

I've attached the docx file I'm using and a screenshot of the Error message below:

Screenshot 2020-08-31 at 1 10 00 AM

I forgot to add my code snippet, so I'm adding it here:

import docx
from simplify_docx import simplify

doc = docx.Document("/Users/lavishsaluja/Downloads/diary.docx")

json_doc = simplify(doc)

I can confirm this issue is happening to me as well.

OS: macOS Catalina 10.15.5
Word Version: 16.16.25

Steps to replicate:

  1. Create a new document in Word
  2. Add numbering to any paragraph
  3. Save document
    (example attached)
    Sample Document.docx

Issue seems to start here:

num_style = get_num_style(p, doc)

In particular, it seems that although num_style is not None, the element doesn't have a pPr attribute.
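To illustrate why a generic XML element can be non-None and still lack a pPr attribute, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml (the find() call behaves the same way in both), and the XML fragment and the find_indent helper are hypothetical, written only for this illustration. It is not how Simplify-Docx or python-docx look up the indent.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml / numbering.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written stand-in for one numbering level; not taken from the attached files.
LVL_XML = f"""
<w:lvl xmlns:w="{W_NS}" w:ilvl="0">
  <w:pPr>
    <w:ind w:left="720" w:hanging="360"/>
  </w:pPr>
</w:lvl>
""".strip()

lvl = ET.fromstring(LVL_XML)

# A generic element does not expose child elements as Python attributes,
# so lvl.pPr would raise AttributeError even though <w:pPr> exists in the XML.
has_ppr_attribute = hasattr(lvl, "pPr")


def find_indent(element):
    """Return the w:ind child of element's w:pPr, or None if either is missing."""
    ppr = element.find(f"{{{W_NS}}}pPr")
    if ppr is None:
        return None
    return ppr.find(f"{{{W_NS}}}ind")


ind = find_indent(lvl)
left = ind.get(f"{{{W_NS}}}left") if ind is not None else None
```

Here has_ppr_attribute is False while find_indent(lvl) still locates the indent element, which matches the symptom described above: the element exists, but attribute-style access to pPr is not available on it.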

If you bypass this check:

    if num_style is not None and \
            num_style.pPr is not None and \
            num_style.pPr.ind is not None:
        return num_style.pPr.ind

(e.g., by catching the AttributeError)

    if num_style is not None:
        try:
            if num_style.pPr is not None and \
                    num_style.pPr.ind is not None:
                return num_style.pPr.ind
        except AttributeError:
            return None

Then the script runs without error, although style-related information doesn't get pulled in...
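The same bypass can be written without try/except by using getattr with a default. This is only a sketch of the workaround above, not a fix from the library maintainers. The helper name get_ind_or_none is hypothetical, and the SimpleNamespace objects merely stand in for real style elements.

```python
from types import SimpleNamespace


def get_ind_or_none(num_style):
    """Return num_style.pPr.ind, or None if any link in the chain is missing."""
    # getattr with a default covers both "attribute absent" and "object is None".
    ppr = getattr(num_style, "pPr", None)
    return getattr(ppr, "ind", None)


# Stand-ins for the cases discussed in this thread.
no_ppr = object()                                        # element without a pPr attribute
with_ind = SimpleNamespace(pPr=SimpleNamespace(ind="indent"))

result_missing = get_ind_or_none(no_ppr)
result_present = get_ind_or_none(with_ind)
```

Like the try/except version, this only suppresses the error. When pPr is missing, the numbering indent is still not picked up.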

Hi, this is still happening to me. Is there something I'm doing wrong?