huawei-noah / trustworthyAI

Hi, I would like to know the data requirements for causal discovery in the web GUI. data1 and data3 below show the format of my local datasets; data2 shows the format produced by the data generation task.
Causal discovery runs successfully when I select data1 as an external training dataset, or data2 as a built-in training dataset.
When I select data3 as an external training dataset, the web GUI always shows "校验结果为false" ("validation result: false"), and the terminal logs: "POST /task/check_dataset HTTP/1.1" 200 -. I get the same failure in both the web GUI and the terminal when I download the data2 file to my local machine and then select it as an external training dataset.

data1:[0 1 0 0 0 0 0 1 0 0 0 0 0 -1]
data2:[-0.516906642 0.498168803 -0.228214563 0.752834357 0.592922701]
data3:[0.363636364 0 0.090909091 0 0 0 0.090909091 0 0.090909091 0.363636364 0 0 0 -3]

Hello,

I believe this could be a bug in how the data format is checked here:

trustworthyAI/gcastle/web/models/base_class.py

Lines 102 to 104 in 4e46417

    data_type = str(data_df.dtypes.unique()[0])
    if len(data_df.dtypes.unique()) == 1 and ('float' in data_type or 'int' in data_type):
        return data_df.shape[1]

The current version of the code accepts a dataframe only if all of its columns share a single dtype (all float or all int); a mix of int and float columns, as in data3, fails the check.
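A minimal sketch of why data3 is rejected while data1 passes. It assumes the files are comma-separated with no header row and are loaded with pandas.read_csv (the web GUI's actual loading code is not quoted here). passes_single_dtype_check is a hypothetical helper that mirrors the quoted condition but returns a boolean instead of the column count:

```python
import io

import pandas as pd


def passes_single_dtype_check(data_df: pd.DataFrame) -> bool:
    """Hypothetical mirror of the condition quoted from base_class.py."""
    unique_dtypes = data_df.dtypes.unique()
    data_type = str(unique_dtypes[0])
    return len(unique_dtypes) == 1 and ('float' in data_type or 'int' in data_type)


# Assumed CSV rows modelled on data1 and data3 (no header row).
data1_csv = "0,1,0,0,0,0,0,1,0,0,0,0,0,-1\n"
data3_csv = "0.363636364,0,0.090909091,0,0,0,0.090909091,0,0.090909091,0.363636364,0,0,0,-3\n"

df1 = pd.read_csv(io.StringIO(data1_csv), header=None)
df3 = pd.read_csv(io.StringIO(data3_csv), header=None)

# Columns holding only integers are inferred as int64, the others as float64.
print(sorted(set(map(str, df1.dtypes))))  # ['int64']
print(sorted(set(map(str, df3.dtypes))))  # ['float64', 'int64']
print(passes_single_dtype_check(df1))     # True
print(passes_single_dtype_check(df3))     # False
```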
For now, you can work around this by converting all of the data to float. For example, append .0 to the first value in every column that is read as int (pandas parses a whole column as float as soon as one of its values has a decimal point).
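The same workaround can be applied with pandas instead of editing the file by hand. This sketch assumes a comma-separated file with no header row; the in-memory buffers stand in for real files:

```python
import io

import pandas as pd

# Assumed file contents modelled on data3 (no header row).
raw_csv = "0.363636364,0,0.090909091,0,0,0,0.090909091,0,0.090909091,0.363636364,0,0,0,-3\n"
df = pd.read_csv(io.StringIO(raw_csv), header=None)

# Cast every column to float64 so the frame has a single dtype.
df_float = df.astype(float)

# to_csv writes integer-valued floats as e.g. "0.0", so re-reading keeps float64.
out = io.StringIO()
df_float.to_csv(out, header=False, index=False)

reread = pd.read_csv(io.StringIO(out.getvalue()), header=None)
print(set(map(str, reread.dtypes)))  # {'float64'}
```

For a real file, pd.read_csv("data3.csv", header=None).astype(float).to_csv("data3_float.csv", header=False, index=False) should have the same effect; the file names are placeholders, and whether to keep a header row depends on how your original file is laid out.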

We will update the code for the next version of the package to allow a mix of int and float columns.

A fix for this issue has now been added. Using a mix of int and float columns should now work as expected.
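The fix itself is not quoted in this thread. One way a relaxed check could look is to test each column's dtype on its own instead of requiring a single dtype for the whole frame. numeric_column_count is a hypothetical name, and this is not the code that was merged into gcastle:

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def numeric_column_count(data_df: pd.DataFrame):
    """Return the number of columns if every column is int or float, else None.

    Hypothetical relaxed check: each column is tested separately, so a mix of
    int64 and float64 columns is accepted.
    """
    if data_df.shape[1] > 0 and all(
        is_integer_dtype(dt) or is_float_dtype(dt) for dt in data_df.dtypes
    ):
        return data_df.shape[1]
    return None


mixed = pd.DataFrame({"a": [0, 1], "b": [0.5, 0.25]})  # int64 + float64
text = pd.DataFrame({"a": [0, 1], "b": ["x", "y"]})    # int64 + object
print(numeric_column_count(mixed))  # 2
print(numeric_column_count(text))   # None
```

Using is_integer_dtype and is_float_dtype rather than substring matching on the dtype name also rejects bool columns and avoids accidental matches such as an interval dtype, whose name contains "int".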


the data of causal discovery