XMLBIFReader fails parsing moderately large networks
systemssoperfect opened this issue
Subject of the issue
XMLBIFReader fails to parse moderately large networks whose tables have roughly 1M entries.
Your environment
- pgmpy 0.1.22
- Python 3.8.10
- Ubuntu 20.04
Steps to reproduce
Create a network in which one node's TABLE has more than 1 million entries, then read the file with XMLBIFReader.
Expected behaviour
No error.
Actual behaviour
lxml throws the exception below:
Traceback (most recent call last):
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "./Inferências/Modelos/Modelo5_BIF_2022_2t.xml", line 5765
lxml.etree.XMLSyntaxError: xmlSAX2Characters: huge text node, line 5765, column 10000791
@systemssoperfect Thanks for reporting this. Would it by any chance be possible to share the model file so that I can reproduce the issue?
Here we go.
@systemssoperfect Thanks for sharing the file. I hadn't realized that lxml's parser is much slower than xml.etree. I have changed everything to work with xml.etree now.
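For anyone hitting the same error on an older release, here is a minimal sketch of why the switch helps. The "huge text node" message comes from libxml2's default cap of 10,000,000 characters on a single text node, which matches the column reported in the traceback. The standard-library xml.etree parser does not apply that cap. The XML below is a synthetic, stripped-down fragment rather than a valid XMLBIF file, and `build_fragment` and `table_stats` are hypothetical helper names used only for illustration; this is not pgmpy's reader code.

```python
import xml.etree.ElementTree as ET


def build_fragment(n_entries):
    """Build a synthetic XMLBIF-like document with one large TABLE (illustrative only)."""
    table_text = " ".join(["0.5"] * n_entries)
    return (
        "<BIF><NETWORK><DEFINITION><FOR>X</FOR>"
        "<TABLE>" + table_text + "</TABLE>"
        "</DEFINITION></NETWORK></BIF>"
    )


def table_stats(xml_text):
    """Parse with the stdlib parser; return (text length, number of entries) of the TABLE."""
    root = ET.fromstring(xml_text)
    table = root.find("NETWORK/DEFINITION/TABLE")
    text = table.text
    # Entries are separated by single spaces in this synthetic fragment.
    return len(text), text.count(" ") + 1


# 3M entries of "0.5" plus separators is just under 12M characters,
# which is above the 10,000,000-character default limit in libxml2.
n = 3_000_000
text_len, n_values = table_stats(build_fragment(n))
print(text_len, n_values)
```

If staying on lxml is preferable, its `etree.XMLParser(huge_tree=True)` option lifts the same limit, at the cost of disabling libxml2's protections against very large or deeply nested input, so it is best reserved for trusted files.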