pgmpy / pgmpy

Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

Home page: https://pgmpy.org/

Change in column names changes the outcome of HillClimbSearch

wakidal opened this issue

Subject of the issue

HillClimbSearch() returns different results for the same dataset when only the column names differ: one copy of the dataset has English column names and the other has Japanese ones.
Why does this happen, and what is the logic behind it?
Is there any way to control it?

Environment

  • Python 3.10.2
  • pgmpy 0.1.19
  • Windows
import numpy as np
import pandas as pd
from pgmpy.estimators import (
    HillClimbSearch,
    BicScore)

# create test data in which X1 = X2 + X3 and X4 = X5 + X6
data = pd.DataFrame(np.random.randint(0, 4, size=(5000, 6)), columns=['X1', 'X2', 'X3', 'X4', 'X5', 'X6'])
data['X1'] = data['X2'] + data['X3']
data['X4'] = data['X5'] + data['X6']

# rename the columns using Japanese characters (the data itself is unchanged)
data2 = data.rename(columns={'X1': 'いち_X1', 'X2': 'に_X2', 'X3': 'さん_X3', 'X4': 'よん_X4', 'X5': 'ご_X5', 'X6': 'ろく_X6'})

def do_HCS(x):
    # learn a DAG with hill climbing, scored by BIC, and return its edges
    HC1 = HillClimbSearch(x)
    network = HC1.estimate(scoring_method=BicScore(x))
    return network.edges()

do_HCS(data)
# [('X2', 'X1'), ('X3', 'X1'), ('X5', 'X4'), ('X6', 'X4')]

do_HCS(data2)
# OutEdgeView([('に_X2', 'いち_X1'), ('さん_X3', 'いち_X1'), ('よん_X4', 'ご_X5'), ('ろく_X6', 'ご_X5'), ('ろく_X6', 'よん_X4')])
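To see exactly where the two reported structures disagree, it can help to map the Japanese names back to the English ones and diff the edge sets. The helper below, `compare_edges`, is a hypothetical sketch (not part of pgmpy); it only assumes that edges are (parent, child) tuples, as returned by `network.edges()`, and it uses the edge lists copied from the outputs reported above.

```python
# Hypothetical helper (not a pgmpy API): diff two learned edge sets
# after undoing a column rename.
def compare_edges(edges_a, edges_b, rename):
    """Return (common, only_in_a, only_in_b) as sets of (parent, child) tuples.

    edges_a: edges learned on the original column names.
    edges_b: edges learned on the renamed columns.
    rename:  dict mapping original name -> renamed name.
    """
    inverse = {new: old for old, new in rename.items()}
    a = set(edges_a)
    # Translate every renamed node back to its original name
    b = {(inverse[u], inverse[v]) for u, v in edges_b}
    return a & b, a - b, b - a


rename = {'X1': 'いち_X1', 'X2': 'に_X2', 'X3': 'さん_X3',
          'X4': 'よん_X4', 'X5': 'ご_X5', 'X6': 'ろく_X6'}

# Edge lists copied from the outputs reported in the issue
english = [('X2', 'X1'), ('X3', 'X1'), ('X5', 'X4'), ('X6', 'X4')]
japanese = [('に_X2', 'いち_X1'), ('さん_X3', 'いち_X1'), ('よん_X4', 'ご_X5'),
            ('ろく_X6', 'ご_X5'), ('ろく_X6', 'よん_X4')]

common, only_english, only_japanese = compare_edges(english, japanese, rename)
print(sorted(only_english))   # [('X5', 'X4')]
print(sorted(only_japanese))  # [('X4', 'X5'), ('X6', 'X5')]
```

On the reported outputs, the X1 structure is identical in both runs and X6 → X4 is shared; the disagreement is confined to the X4/X5/X6 block.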

@wakidal Thanks for reporting this, but I am not able to reproduce the issue. On my machine, both datasets give the same result.

In [16]: do_HCS(data)
    ...: 
  0%|          | 4/1000000 [00:00<7:16:25, 38.19it/s]
Out[16]: OutEdgeView([('X2', 'X1'), ('X3', 'X1'), ('X5', 'X4'), ('X6', 'X4')])

In [17]: do_HCS(data2)
    ...: 
  0%|          | 4/1000000 [00:00<7:31:00, 36.95it/s]
Out[17]: OutEdgeView([('に_X2', 'いち_X1'), ('さん_X3', 'いち_X1'), ('ご_X5', 'よん_X4'), ('ろく_X6', 'よん_X4')])

I can confirm; I am also unable to reproduce the issue.
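One thing worth keeping in mind when comparing results across machines: the test data in the report is drawn with `np.random.randint` and no seed, so each person running the script learns from different rows. Whether that accounts for the discrepancy is not established in this thread. A minimal sketch of a seeded version, assuming NumPy 1.17 or later for `np.random.default_rng`; the function name `make_test_data` and the seed value are hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd


def make_test_data(seed, n_rows=5000):
    """Hypothetical helper: rebuild the issue's test data from a fixed seed."""
    rng = np.random.default_rng(seed)  # seeded generator, identical on every machine
    data = pd.DataFrame(rng.integers(0, 4, size=(n_rows, 6)),
                        columns=['X1', 'X2', 'X3', 'X4', 'X5', 'X6'])
    # Same deterministic relationships as in the original report
    data['X1'] = data['X2'] + data['X3']
    data['X4'] = data['X5'] + data['X6']
    return data


# The same seed yields the same DataFrame on every run
print(make_test_data(0).equals(make_test_data(0)))  # True
```

The seeded rows will not match the reporter's original random draw; the point is only that everyone then runs `do_HCS` on identical data before comparing edge sets.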