ascuet / test_web

Attribute Exploration

Counterfactual (CF)

According to Keane & Smyth (ICCBR 2020), a counterfactual of an instance i is an instance j that is highly similar to i, differs from i in the values of at least two attributes, and has a different output.
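The definition above can be expressed as a small check. This is a minimal sketch, not code from this repository: it assumes cases are plain dicts, that "decision" is the name of the output field (hypothetical), and that the "highly similar" condition is enforced separately, for example by only comparing a case with its nearest neighbours. The example cases are invented.

```python
# Minimal sketch of the counterfactual test (assumed layout: cases as dicts).
# The "highly similar" condition is assumed to be handled elsewhere, e.g. by
# only comparing a case with its most similar neighbours.

def diff_attributes(case_i, case_j, attrs):
    """Return the attributes whose values differ between two cases."""
    return [a for a in attrs if case_i[a] != case_j[a]]


def is_counterfactual(case_i, case_j, attrs, output="decision", min_diffs=2):
    """True if at least `min_diffs` attributes differ and the outputs differ."""
    return (len(diff_attributes(case_i, case_j, attrs)) >= min_diffs
            and case_i[output] != case_j[output])


attrs = ["mission", "denial", "risktol", "timeurg"]
# Invented example cases, for illustration only
c1 = {"mission": 2, "denial": 5, "risktol": 2, "timeurg": 4, "decision": 11}
c2 = {"mission": 7, "denial": 5, "risktol": 3, "timeurg": 4, "decision": 10}
c3 = {"mission": 2, "denial": 6, "risktol": 2, "timeurg": 4, "decision": 11}

print(diff_attributes(c1, c2, attrs))    # ['mission', 'risktol']
print(is_counterfactual(c1, c2, attrs))  # True
print(is_counterfactual(c1, c3, attrs))  # False (one difference, same output)
```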

Creating a File of Counterfactual and Non-Counterfactual Cases with the Five Most Similar Variations

  1. The input file is data/scratch/preprocessed_case_base_2_da.csv, which contains 288 cases.
  2. Run analyze.py to generate the output CSV file.
  3. For each case (each row in the CSV file), the five most similar cases are extracted, and the attributes whose values differ are listed in the columns Diff_Attributes: Most Similar Variation 1, Diff_Attributes: Most Similar Variation 2, ..., Diff_Attributes: Most Similar Variation 5. These five columns show which attribute values differ for each variation. The output file is analyze_new_287_pg.csv, located in the project root directory.
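The per-case analysis described in the list above can be sketched roughly as follows. This is a hypothetical reconstruction, not analyze.py itself: the similarity measure (1 minus the mean absolute difference over the four KDMA attributes, scaled by the 0 to 10 value range), the "decision" field name, and the toy cases are all assumptions. The counterfactual label follows the Keane & Smyth definition given earlier.

```python
# Hypothetical sketch of the analysis behind analyze_new_287_pg.csv.
# Assumptions: cases are dicts keyed by case number, the four KDMA attributes
# are compared, "decision" is the output field, and similarity is
# 1 - mean absolute difference / 10 (the real analyze.py may differ).

ATTRS = ["mission", "denial", "risktol", "timeurg"]
OUTPUT = "decision"  # hypothetical name of the decision field


def similarity(a, b, attrs=ATTRS, value_range=10.0):
    """Assumed similarity: 1 minus the mean absolute difference scaled to [0, 1]."""
    return 1.0 - sum(abs(a[k] - b[k]) for k in attrs) / (len(attrs) * value_range)


def describe_variations(cases, k=5, attrs=ATTRS):
    """For each case, return its k most similar cases with differing attributes and a CF label."""
    report = {}
    for i, case in cases.items():
        scored = [(j, similarity(case, other, attrs))
                  for j, other in cases.items() if j != i]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        rows = []
        for j, sim in scored[:k]:
            diffs = [a for a in attrs if case[a] != cases[j][a]]
            # Counterfactual: at least two differing attributes and a different output
            is_cf = len(diffs) >= 2 and case[OUTPUT] != cases[j][OUTPUT]
            label = "Counterfactual" if is_cf else "Non-Counterfactual"
            rows.append((j, round(sim, 2), diffs, label))
        report[i] = rows
    return report


# Toy cases with invented values, for illustration only
toy = {
    1: {"mission": 2, "denial": 5, "risktol": 2, "timeurg": 4, "decision": 11},
    2: {"mission": 7, "denial": 5, "risktol": 3, "timeurg": 4, "decision": 10},
    3: {"mission": 2, "denial": 7, "risktol": 2, "timeurg": 4, "decision": 11},
}
for j, sim, diffs, label in describe_variations(toy, k=2)[1]:
    print(j, sim, diffs, label)
```

With only three toy cases, k is set to 2 here; the README's analysis uses the five most similar cases.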

Creating a File for Rule Generation Using Counterfactuals

  1. Run generate_acf.py, which takes as input analyze_new_287_pg.csv, the file produced in the previous section.

  2. To generate the rules, two attributes with differing values are considered together as a pair.

  3. In the current version, the two attributes must be set manually in the code, for example str1=Mission and str2=Risktol.

  4. For the command python3 generate_acf.py, the code outputs the following:

    50   2.0  7.0  2.0  3.0 
    59   7.0  2.0  3.0  2.0 
    74   7.0  8.0  8.0  6.0 
    75   0.0  3.0  4.0  2.0 
    77   3.0  0.0  2.0  4.0 
    79   8.0  7.0  6.0  8.0 
    178  4.0  2.0  6.0  7.0 
    191  2.0  4.0  7.0  6.0
  5. Meaning of Columns:

    Here, the first column is the case number, the second is the actual value of the 'Mission' attribute, the third is the value of 'Mission' in the similar case, the fourth is the actual value of 'Risktol', and the fifth is the value of 'Risktol' in the similar case.

  6. Copying Output and Creating Excel:

    Currently, the output of the command is copied manually into an Excel file. The same is done for the other attribute pairs, e.g. {'Mission', 'Denial'}, {'Mission', 'Timeurg'}, {'Denial', 'Risktol'}. The resulting file is Cf_Demo.xlsx. The file is then formatted to include the differences (CF value minus original value), as below:

    Case  CF   Mission  Mission  Risktol  Risktol  Mission  Risktol  SIM(C,CF)  Decision    Decision
               (Case)   (CF)     (Case)   (CF)     (Diff)   (Diff)              (Original)  (CF)
    50    59   2        7        2        3        5        1        0.96       11          10
    191   178  2        4        7        6        2        -1       0.97       6           0
    75    77   0        3        4        2        3        -2       0.96       10          9
    74    79   7        8        8        6        1        -2       0.98       11          10
    59    50   7        2        3        2        -5       -1       0.96       10          11
    77    75   3        0        2        4        -3       2        0.96       9           10
    79    74   8        7        6        8        -1       2        0.98       10          11
    178   191  4        2        6        7        -2       1        0.97       0           6
  7. Final Rules Creation:

    From this file, the rules are created manually. The final, formatted rules are located in app/learn/rules.csv:

    attr_1 attr_2 diff_1 diff_2
    mission risktol 5 1
    mission risktol 2 -1
    mission risktol 3 -2
    mission risktol 1 -2
    mission risktol -5 -1
    mission risktol -3 2
    mission risktol -1 2
    mission risktol -2 1
    There are 52 rules in total.
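The rules above are created by hand from Cf_Demo.xlsx. As an illustration only, the diff computation can be sketched in Python, starting from the rows printed by generate_acf.py for str1=Mission and str2=Risktol. The function names rules_from_pairs and to_csv are hypothetical, and skipping exact duplicate rules is an assumption rather than a documented step.

```python
# Hypothetical sketch: automates the manual step of turning generate_acf.py
# output rows into rule rows in the attr_1, attr_2, diff_1, diff_2 format.
import csv
import io


def rules_from_pairs(rows, attr_1, attr_2):
    """rows: (case_no, a1_case, a1_cf, a2_case, a2_cf) tuples as printed by generate_acf.py."""
    rules = []
    for _case_no, a1_case, a1_cf, a2_case, a2_cf in rows:
        # Diff is the CF value minus the original value, as in the (Diff) columns
        rule = (attr_1, attr_2, int(a1_cf - a1_case), int(a2_cf - a2_case))
        if rule not in rules:  # assumption: exact duplicates are kept only once
            rules.append(rule)
    return rules


def to_csv(rules):
    """Serialize rules with the same header as app/learn/rules.csv."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["attr_1", "attr_2", "diff_1", "diff_2"])
    writer.writerows(rules)
    return buf.getvalue()


# Values copied from the generate_acf.py output for str1=Mission, str2=Risktol
output_rows = [
    (50, 2.0, 7.0, 2.0, 3.0),
    (59, 7.0, 2.0, 3.0, 2.0),
    (74, 7.0, 8.0, 8.0, 6.0),
    (75, 0.0, 3.0, 4.0, 2.0),
    (77, 3.0, 0.0, 2.0, 4.0),
    (79, 8.0, 7.0, 6.0, 8.0),
    (178, 4.0, 2.0, 6.0, 7.0),
    (191, 2.0, 4.0, 7.0, 6.0),
]
rules = rules_from_pairs(output_rows, "mission", "risktol")
print(to_csv(rules))
```

Applied to these eight rows, the sketch yields the same eight (diff_1, diff_2) pairs as the mission/risktol rules listed above, in a different order.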

Generating New Cases from Non-Counterfactual Cases

  1. Non-Counterfactual Cases Generation:

    Similarly, running generate_acf.py also produces a list of non-counterfactual cases, that is, cases for which none of the five most similar variations is a counterfactual. The following lines do this:

      # inside the loop over cases in generate_acf.py
      variations = [variation1, variation2, variation3, variation4, variation5]
      if all("Non-Counterfactual" in v for v in variations):
          print(case)

    In the current version, this list has to be copied manually into a CSV file, currently named non_cf.csv and located in the root directory.

  2. Two Separate Files Generation:

    Then run generate_new_cases.py, which does the following:

    (a) Reads three files: (i) the non-counterfactuals (non_cf.csv), (ii) the rules (app/learn/rules.csv), and (iii) the original cases (app/learn/casebase2_without_da.csv).
    (b) Loops over each original case.
    (c) If the case is in the list of non-counterfactuals, then it:

    • reads the KDMA values (mission, denial, risktol, timeurg) of the case;
    • applies each rule to the case by adding the rule's diff values to the corresponding attributes of the original case; each resulting row is a candidate for the case base;
    • if all attribute values of the new case lie between 0 and 10 (inclusive), adds the new case to the original cases.
  3. Combined Casebase and Casebase Separation:

    The three output files, new_cases_within_range.csv, cases_actual_cf.csv, and cases_no_actual_cf.csv, are located in the root directory. In new_cases_within_range.csv, the first 288 rows are the cases from the original case base; the new rows generated by applying the rules are appended after them.
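The steps performed by generate_new_cases.py can be sketched as follows. This is a hypothetical reconstruction from the description above, not the actual script: CSV loading is omitted, the function names and case data are invented, and the sketch copies all unchanged fields of a case as-is, which may not match how the real script handles the decision of a new case. The two rules used are taken from the rules.csv excerpt.

```python
# Hypothetical sketch of the generate_new_cases.py steps; CSV reading is omitted
# and the cases, rules and variation labels are passed in directly.

def non_counterfactual_ids(report):
    """Case ids whose five most similar variations are all labelled Non-Counterfactual."""
    return {i for i, variations in report.items()
            if all("Non-Counterfactual" in v for v in variations)}


def apply_rules(case, rules, low=0, high=10):
    """Apply each (attr_1, attr_2, diff_1, diff_2) rule; keep candidates within [low, high]."""
    new_cases = []
    for attr_1, attr_2, diff_1, diff_2 in rules:
        candidate = dict(case)  # copy so the original case is not modified
        candidate[attr_1] += diff_1
        candidate[attr_2] += diff_2
        # Only the two modified attributes can leave the range, so only they are checked
        if low <= candidate[attr_1] <= high and low <= candidate[attr_2] <= high:
            new_cases.append(candidate)
    return new_cases


def generate_new_cases(cases, rules, non_cf_ids):
    """Original cases first, followed by rule-generated cases for non-counterfactual ids."""
    combined = list(cases.values())
    for case_id, case in cases.items():
        if case_id in non_cf_ids:
            combined.extend(apply_rules(case, rules))
    return combined


# Invented example data; the two rules come from app/learn/rules.csv
cases = {
    1: {"mission": 2, "denial": 5, "risktol": 2, "timeurg": 4},
    2: {"mission": 9, "denial": 1, "risktol": 9, "timeurg": 8},
}
report = {
    1: ["Non-Counterfactual"] * 5,
    2: ["Counterfactual"] + ["Non-Counterfactual"] * 4,
}
rules = [("mission", "risktol", 5, 1), ("mission", "risktol", -5, -1)]

non_cf = non_counterfactual_ids(report)  # {1}
combined = generate_new_cases(cases, rules, non_cf)
print(len(combined))  # 3: two originals plus one in-range new case
```

The second rule would push mission to -3 for case 1, so that candidate is discarded, mirroring the 0 to 10 range check.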