According to Keane & Smyth (ICCBR 2020), a counterfactual of an instance i is an instance j that is highly similar to i, differs from i in at least two attribute values, and has a different output.
- The input file is `data/scratch/preprocessed_case_base_2_da.csv`, which contains 288 cases.
- To generate the CSV file, run `analyze.py`.
- For each case (each row in the CSV file), the five most similar cases are extracted. Values that differ for a specific attribute are shown in the output file. The variations are highlighted in the following columns: `Diff_Attributes: Most Similar Variation 1`, `Diff_Attributes: Most Similar Variation 2`, ..., `Diff_Attributes: Most Similar Variation 5`. From these five columns, the attribute values that differ can be found. The output file is `analyze_new_287_pg.csv`, located in the project root directory.
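As a rough illustration of the comparison described above, the following sketch lists the attributes whose values differ between a case and one of its similar cases. The function name `diff_attributes`, the dictionary representation, and the `Denial` and `Timeurg` values are assumptions made for illustration; they are not taken from `analyze.py`.

```python
def diff_attributes(case, similar):
    """Return {attribute: (case_value, similar_value)} for attributes that differ."""
    return {
        name: (value, similar[name])
        for name, value in case.items()
        if name in similar and similar[name] != value
    }


# Hypothetical rows: only the Mission and Risktol values follow the example
# output for cases 50 and 59; the Denial and Timeurg values are made up.
case_50 = {"Mission": 2.0, "Denial": 3.0, "Risktol": 2.0, "Timeurg": 5.0}
case_59 = {"Mission": 7.0, "Denial": 3.0, "Risktol": 3.0, "Timeurg": 5.0}

print(diff_attributes(case_50, case_59))
```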
- Run `generate_acf.py`, which takes as input the file from the previous step, `analyze_new_287_pg.csv`.
- To generate the rules, two attributes that have different values are considered together.
- In the current version of the code, you need to set the attributes manually, for example `str1=Mission` and `str2=Risktol`.
- The command `python3 generate_acf.py` outputs the following:

      50 2.0 7.0 2.0 3.0
      59 7.0 2.0 3.0 2.0
      74 7.0 8.0 8.0 6.0
      75 0.0 3.0 4.0 2.0
      77 3.0 0.0 2.0 4.0
      79 8.0 7.0 6.0 8.0
      178 4.0 2.0 6.0 7.0
      191 2.0 4.0 7.0 6.0
- Meaning of Columns:
  The first column is the case number, the second column is the actual value of the 'Mission' attribute, the third column is the value of 'Mission' in the similar case, the fourth column is the actual value of the 'Risktol' attribute, and the fifth column is the value of 'Risktol' in the similar case.
- Copying Output and Creating Excel:
  Currently, the output of the command is copied manually into an Excel file. Repeat this for the other pairs: {'Mission', 'Denial'}, {'Mission', 'Timeurg'}, {'Denial', 'Risktol'}, etc. The resulting file is `Cf_Demo.xlsx`. The file is then formatted to show the differences, as below:

  | Case | CF | Mission (Case) | Mission (CF) | Risktol (Case) | Risktol (CF) | Mission (Diff) | Risktol (Diff) | SIM(C,CF) | Decision (Original) | Decision (CF) |
  |---|---|---|---|---|---|---|---|---|---|---|
  | 50 | 59 | 2 | 7 | 2 | 3 | 5 | 1 | 0.96 | 11 | 10 |
  | 191 | 178 | 2 | 4 | 7 | 6 | 2 | -1 | 0.97 | 6 | 0 |
  | 75 | 77 | 0 | 3 | 4 | 2 | 3 | -2 | 0.96 | 10 | 9 |
  | 74 | 79 | 7 | 8 | 8 | 6 | 1 | -2 | 0.98 | 11 | 10 |
  | 59 | 50 | 7 | 2 | 3 | 2 | -5 | -1 | 0.96 | 10 | 11 |
  | 77 | 75 | 3 | 0 | 2 | 4 | -3 | 2 | 0.96 | 9 | 10 |
  | 79 | 74 | 8 | 7 | 6 | 8 | -1 | 2 | 0.98 | 10 | 11 |
  | 178 | 191 | 4 | 2 | 6 | 7 | -2 | 1 | 0.97 | 0 | 6 |

- Final Rules Creation:
  From this file, the rules are generated manually. The rules, with all the final formatting, are located in `app/learn/rules.csv`:

  | attr_1 | attr_2 | diff_1 | diff_2 |
  |---|---|---|---|
  | mission | risktol | 5 | 1 |
  | mission | risktol | 2 | -1 |
  | mission | risktol | 3 | -2 |
  | mission | risktol | 1 | -2 |
  | mission | risktol | -5 | -1 |
  | mission | risktol | -3 | 2 |
  | mission | risktol | -1 | 2 |
  | mission | risktol | -2 | 1 |
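The rules are created by hand, but the arithmetic behind them is simple: each difference is the similar case's value minus the original case's value. The sketch below reproduces it from the `generate_acf.py` output rows shown earlier. The helper name `to_rule` and the use of an in-memory buffer are assumptions for illustration, not part of the project code.

```python
import csv
import io

# Rows as printed by generate_acf.py for the ('Mission', 'Risktol') pair:
# case, Mission (case), Mission (similar), Risktol (case), Risktol (similar)
rows = [
    (50, 2.0, 7.0, 2.0, 3.0),
    (59, 7.0, 2.0, 3.0, 2.0),
    (74, 7.0, 8.0, 8.0, 6.0),
    (75, 0.0, 3.0, 4.0, 2.0),
    (77, 3.0, 0.0, 2.0, 4.0),
    (79, 8.0, 7.0, 6.0, 8.0),
    (178, 4.0, 2.0, 6.0, 7.0),
    (191, 2.0, 4.0, 7.0, 6.0),
]


def to_rule(row, attr_1="mission", attr_2="risktol"):
    """Turn one output row into a rule: difference = similar value - case value."""
    _, a_case, a_similar, b_case, b_similar = row
    return {
        "attr_1": attr_1,
        "attr_2": attr_2,
        "diff_1": int(a_similar - a_case),
        "diff_2": int(b_similar - b_case),
    }


rules = [to_rule(row) for row in rows]

# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["attr_1", "attr_2", "diff_1", "diff_2"])
writer.writeheader()
writer.writerows(rules)
print(buffer.getvalue())
```

The resulting differences are the same eight (diff_1, diff_2) pairs listed in `app/learn/rules.csv`, in a different row order.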
- Non-Counterfactual Cases Generation:
  Similarly, running `generate_acf.py` can generate a list of non-counterfactual cases. The following line does this:

      if ("Non-Counterfactual" in variation1 and "Non-Counterfactual" in variation2
              and "Non-Counterfactual" in variation3 and "Non-Counterfactual" in variation4
              and "Non-Counterfactual" in variation5):
          print(case)

  In the current version, you have to copy the list into a CSV file manually. It is currently named `non_cf.csv` and is located in the root directory.
- Two Separate Files Generation:
  Then run `generate_new_cases.py`, which does the following:

  (a) Reads three files: (i) the non-counterfactuals (`non_cf.csv`), (ii) the rules (`app/learn/rules.csv`), and (iii) the original cases (`app/learn/casebase2_without_da.csv`).

  (b) Loops over each original case.

  (c) If the case is in the list of non-counterfactuals, then:
  - reads the KDMA values (mission, denial, risktol, timeurg) of the case
  - applies each rule to the case by adding the rule's difference values to the respective attributes of the original case; each resulting row is a candidate for the case base
  - if the new attribute values of the candidate are between 0 and 10 (inclusive), adds the new case to the original cases.
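The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of `generate_new_cases.py`: the function names, the `case` identifier key, the in-memory dictionaries, and the sample values are all hypothetical, and file reading is left out.

```python
KDMA_MIN, KDMA_MAX = 0, 10  # inclusive range stated in the README


def apply_rule(case, rule):
    """Add the rule's differences to the case; return None if a value leaves the range."""
    candidate = dict(case)
    for attr_key, diff_key in (("attr_1", "diff_1"), ("attr_2", "diff_2")):
        attr = rule[attr_key]
        candidate[attr] = case[attr] + rule[diff_key]
        if not KDMA_MIN <= candidate[attr] <= KDMA_MAX:
            return None
    return candidate


def generate_new_cases(cases, rules, non_cf_ids):
    """Apply every rule to every non-counterfactual case and keep in-range candidates."""
    new_cases = []
    for case in cases:
        if case["case"] not in non_cf_ids:  # "case" id key is an assumption
            continue
        for rule in rules:
            candidate = apply_rule(case, rule)
            if candidate is not None:
                new_cases.append(candidate)
    return new_cases


# Hypothetical data for illustration only.
cases = [
    {"case": 1, "mission": 2, "denial": 5, "risktol": 9, "timeurg": 4},
    {"case": 2, "mission": 6, "denial": 1, "risktol": 3, "timeurg": 7},
]
rules = [
    {"attr_1": "mission", "attr_2": "risktol", "diff_1": 5, "diff_2": 1},
    {"attr_1": "mission", "attr_2": "risktol", "diff_1": -5, "diff_2": -1},
]

new_cases = generate_new_cases(cases, rules, non_cf_ids={1})
print(new_cases)
```

In this example only case 1 is treated as a non-counterfactual. The first rule gives mission 7 and risktol 10, which is kept; the second gives mission -3, which is discarded.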
- Combined Casebase and Casebase Separation:
  The three new output files are `new_cases_within_range.csv`, `cases_actual_cf.csv`, and `cases_no_actual_cf.csv`, located in the root directory. In `new_cases_within_range.csv`, the first 288 rows are the cases from the original casebase. The new rows generated by applying the rules are appended after these 288 rows.