Change in column names changes the outcome of HillClimbSearch

Question

wakidal opened this issue a year ago

Subject of the issue

HillClimbSearch() returns different outputs for the same dataset when only the column names differ: one copy uses English column names and the other uses column names containing Japanese characters.
Why does this happen, and what is the logic behind it?
Is there any way to control it?

Environment

Python 3.10.2
pgmpy 0.1.19
Windows

import numpy as np
import pandas as pd
from pgmpy.estimators import BicScore, HillClimbSearch

# Create test data: X1 and X4 are each the sum of two other columns
data = pd.DataFrame(np.random.randint(0, 4, size=(5000, 6)), columns=['X1', 'X2', 'X3', 'X4', 'X5', 'X6'])
data['X1'] = data['X2'] + data['X3']
data['X4'] = data['X5'] + data['X6']

# Rename the columns so that each name starts with Japanese characters
data2 = data.rename(columns={'X1': 'いち_X1', 'X2': 'に_X2', 'X3': 'さん_X3', 'X4': 'よん_X4', 'X5': 'ご_X5', 'X6': 'ろく_X6'})

def do_HCS(x):
    # Run hill-climb structure search scored with BIC and return the learned edges
    hc = HillClimbSearch(x)
    network = hc.estimate(scoring_method=BicScore(x))
    return network.edges()

do_HCS(data)
#[('X2', 'X1'), ('X3', 'X1'), ('X5', 'X4'), ('X6', 'X4')]

do_HCS(data2)
#OutEdgeView([('に_X2', 'いち_X1'), ('さん_X3', 'いち_X1'), ('よん_X4', 'ご_X5'), ('ろく_X6', 'ご_X5'), ('ろく_X6', 'よん_X4')])
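
To see exactly where the two reported results diverge, the renamed edges can be mapped back to the original column names and compared as sets. The following is only a minimal sketch: restore_names is a hypothetical helper written for this comparison (it is not part of pgmpy), and the edge lists are copied from the two outputs above rather than recomputed.

```python
# Mapping used in the question to rename the columns.
rename_map = {
    "X1": "いち_X1", "X2": "に_X2", "X3": "さん_X3",
    "X4": "よん_X4", "X5": "ご_X5", "X6": "ろく_X6",
}
# Inverse mapping: renamed column -> original column.
inverse_map = {new: old for old, new in rename_map.items()}


def restore_names(edges, inverse):
    """Hypothetical helper (not a pgmpy function): return the edges as a
    set of (parent, child) pairs expressed in the original column names."""
    return {(inverse.get(u, u), inverse.get(v, v)) for u, v in edges}


# Edge lists copied from the two outputs reported in the question.
edges_original = {("X2", "X1"), ("X3", "X1"), ("X5", "X4"), ("X6", "X4")}
edges_renamed = restore_names(
    [
        ("に_X2", "いち_X1"),
        ("さん_X3", "いち_X1"),
        ("よん_X4", "ご_X5"),
        ("ろく_X6", "ご_X5"),
        ("ろく_X6", "よん_X4"),
    ],
    inverse_map,
)

only_original = edges_original - edges_renamed
only_renamed = edges_renamed - edges_original

print(sorted(only_original))  # [('X5', 'X4')]
print(sorted(only_renamed))   # [('X4', 'X5'), ('X6', 'X5')]
```

With the reported outputs, the X1 part of the structure is identical in both runs. The difference is confined to X4, X5 and X6: the X5/X4 edge is reversed and an extra X6 -> X5 edge appears.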

Ankur Ankan · Answer 1 · Sat Feb 11 2023 00:18:37 GMT+0800 (China Standard Time)

@wakidal Thanks for reporting this, but I am not able to reproduce the issue. On my machine, both datasets give the same structure.

In [16]: do_HCS(data)
    ...: 
  0%|          | 4/1000000 [00:00<7:16:25, 38.19it/s]
Out[16]: OutEdgeView([('X2', 'X1'), ('X3', 'X1'), ('X5', 'X4'), ('X6', 'X4')])

In [17]: do_HCS(data2)
    ...: 
  0%|          | 4/1000000 [00:00<7:31:00, 36.95it/s]
Out[17]: OutEdgeView([('に_X2', 'いち_X1'), ('さん_X3', 'いち_X1'), ('ご_X5', 'よん_X4'), ('ろく_X6', 'よん_X4')])

Sahil Patki · Answer 2 · Thu Feb 16 2023 22:05:08 GMT+0800 (China Standard Time)

I can confirm: I am not able to reproduce the issue either.