pgmpy / pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Home Page: https://pgmpy.org/

XMLBIFReader fails parsing moderately large networks

systemssoperfect opened this issue

Subject of the issue

XMLBIFReader fails to parse moderately large networks whose tables have around 1 million entries.

Your environment

  • pgmpy 0.1.22
  • Python 3.8.10
  • Ubuntu 20.04

Steps to reproduce

Create a network in which one node's TABLE has more than 1 million entries, then read the file with XMLBIFReader.

Expected behaviour

No error.

Actual behaviour

lxml throws the exception below:

Traceback (most recent call last):
File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
File "./Inferências/Modelos/Modelo5_BIF_2022_2t.xml", line 5765
lxml.etree.XMLSyntaxError: xmlSAX2Characters: huge text node, line 5765, column 10000791

@systemssoperfect Thanks for reporting this. Would it be by any chance, possible to share the model file so that I can reproduce the issue?

@systemssoperfect Thanks for sharing the file. I hadn't realized that lxml's parser is much slower than xml.etree. I have changed everything to work with xml.etree now.
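For anyone who wants to check the behaviour independently, here is a minimal sketch using only the standard library. It assumes the failure comes from a single text node longer than roughly 10 million characters, as the column number in the traceback suggests. The document it builds is a simplified, XMLBIF-shaped stand-in (the helper `build_xmlbif_like` and the value `0.25` are made up for illustration), not the reporter's model and not pgmpy's actual reader code.

```python
import io
import xml.etree.ElementTree as ET


def build_xmlbif_like(n_entries: int) -> bytes:
    """Return a simplified XMLBIF-shaped document with one large TABLE.

    Hypothetical helper for illustration only; a real XMLBIF file also
    carries VARIABLE, OUTCOME and GIVEN elements.
    """
    table = " ".join(["0.25"] * n_entries)
    return (
        "<BIF><NETWORK><DEFINITION><FOR>x</FOR>"
        f"<TABLE>{table}</TABLE>"
        "</DEFINITION></NETWORK></BIF>"
    ).encode("utf-8")


# 2.1 million four-character values plus separators give a text node
# of a little over 10 million characters.
n_entries = 2_100_000
doc = build_xmlbif_like(n_entries)

# Parse with the standard-library parser and pull out the TABLE text.
root = ET.parse(io.BytesIO(doc)).getroot()
table_text = root.find("NETWORK/DEFINITION/TABLE").text

# Count values via separators to avoid materialising millions of strings.
n_values = table_text.count(" ") + 1
print(len(table_text), n_values)
```

If staying on lxml is preferred, its `etree.XMLParser` accepts a `huge_tree=True` option that lifts libxml2's size limits; whether that suits a given deployment is a separate judgment call.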