pgmpy / pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Home Page: https://pgmpy.org/

Belief Propagation consumes exponential memory on efficient Factor Graphs (Trees)

tomsch420 opened this issue · comments

Subject of the issue

Belief Propagation tries to allocate an exponential amount of memory on factor trees. This shouldn't happen: trees are cycle-free, so exact marginalization on them is tractable via message passing, with time and memory polynomial in the number of variables.

Your environment

  • pgmpy version 0.1.19
  • Python version 3.8.10
  • Operating System Ubuntu 20.04

Steps to reproduce

import pgmpy.models
import numpy as np
import pgmpy.factors.discrete
import networkx
import matplotlib.pyplot as plt
from networkx.drawing.nx_agraph import graphviz_layout
import pgmpy.inference


# parameters for the example. Adjust this to see when it breaks
number_of_timesteps = 20
cardinality_of_timesteps = 5

# initialize random transition model
transition_model = np.random.uniform(low=0., high=1., size=(pow(cardinality_of_timesteps,2),))
transition_model /= sum(transition_model)


# create factorgraph
factor_graph = pgmpy.models.FactorGraph()

# add variable nodes for timesteps
timesteps = ["t%s" % t for t in range(number_of_timesteps)]
factor_graph.add_nodes_from(timesteps)

# create transition factors
factors = []

# for each transition
for idx in range(len(timesteps)-1):

    # get the variable names
    state_names = {"t%s" % idx: list(range(cardinality_of_timesteps)),
                    "t%s" % (idx+1): list(range(cardinality_of_timesteps))}

    # create factor with values from transition model
    factor = pgmpy.factors.discrete.DiscreteFactor(list(state_names.keys()),
                                                    [cardinality_of_timesteps,
                                                    cardinality_of_timesteps], 
                                                    transition_model, 
                                                    state_names)
    factors.append(factor)

# add factors
factor_graph.add_factors(*factors)

# add edges for state variables and transition variables
for idx, factor in enumerate(factors):
    factor_graph.add_edges_from([("t%s" % idx, factor),
                                    (factor, "t%s" % (idx+1))])

# create prior factors
for timestep in timesteps:

    # create values of current variable
    state_names = {timestep: list(range(cardinality_of_timesteps))}

    # create random evidence
    evidence = np.random.uniform(low=0.1, high=1., size=(cardinality_of_timesteps,))
    evidence /= sum(evidence)

    # create a factor from it
    factor = pgmpy.factors.discrete.DiscreteFactor([timestep], [cardinality_of_timesteps],
                                                    evidence, state_names)

    # add factor and edge from variable to prior
    factor_graph.add_factors(factor)
    factor_graph.add_edge(timestep, factor)


# plot the structure
g = networkx.Graph(factor_graph.edges())
pos = graphviz_layout(g, prog='dot')
networkx.draw(g, pos, with_labels=False, arrows=False)
# plt.show()

bp = pgmpy.inference.BeliefPropagation(factor_graph)
bp.calibrate()
independent_marginals = bp.query(factor_graph.get_variable_nodes(), joint=False)
print(independent_marginals)

Expected behaviour

The independent marginals are computed with polynomial time and memory consumption, since the factor graph is a tree.
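For reference, sum-product on the same chain structure can be sketched in plain NumPy (independent of pgmpy): forward/backward message passing computes all T marginals in O(T·k²) time and O(T·k) memory. Variable names and the random factors are illustrative, not pgmpy API.

```python
import numpy as np

T, k = 20, 5                        # number_of_timesteps, cardinality_of_timesteps
rng = np.random.default_rng(0)

trans = rng.uniform(size=(k, k))    # pairwise factor phi(t_i, t_{i+1})
priors = rng.uniform(0.1, 1.0, size=(T, k))  # one unary (prior) factor per timestep

# forward messages: alpha[i] = product of everything left of variable i,
# summed out, times the prior of i
alpha = np.empty((T, k))
alpha[0] = priors[0]
for i in range(1, T):
    alpha[i] = priors[i] * (alpha[i - 1] @ trans)

# backward messages: beta[i] = everything right of variable i, summed out
beta = np.empty((T, k))
beta[-1] = 1.0
for i in range(T - 2, -1, -1):
    beta[i] = trans @ (priors[i + 1] * beta[i + 1])

# marginal of each variable = normalized product of incoming messages
marginals = alpha * beta
marginals /= marginals.sum(axis=1, keepdims=True)
```

The largest object ever held is a k×k transition matrix, so memory stays linear in T, which is the behaviour expected from Belief Propagation on this graph.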

Actual behaviour

Belief Propagation tries to allocate an exponentially large array and crashes:
numpy.core._exceptions.MemoryError: Unable to allocate 45.5 GiB for an array with shape (5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5) and data type float64
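The reported allocation matches a single factor over 14 of the 20 variables, each of cardinality 5 (presumably a large clique formed internally rather than the tree's small factors). A quick sanity check of the numbers in the traceback:

```python
# Sanity check of the reported allocation: an array of shape (5,)*14
# holds 5**14 float64 entries at 8 bytes each.
entries = 5 ** 14
gib = entries * 8 / 2 ** 30
print(round(gib, 1))  # 45.5 -- matching the 45.5 GiB in the MemoryError
```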