johnHostetter / soft-computing-pyrenees-integration

An old and out-dated integration between my personal soft computing library and an intelligent tutoring system called Pyrenees.

Integration of the Soft Computing Library to Pyrenees

Repository Name: pyrenees_soft_integration

Setup procedure

This project integrates the Pyrenees code with the Soft Computing library developed by Hostetter. It acts as a liaison between the two repositories to simplify the use of soft computing algorithms in Pyrenees experiments.

To use this project's code, a few installation steps must be followed. These steps assume that the Pyrenees-python project contains no Git submodule or copy of this project or of this project's dependencies (e.g. the soft_computing submodule).

First, add this GitHub repository (pyrenees_soft_integration) as a Git submodule to the Pyrenees GitHub repository (Pyrenees-python at the time of writing). Specifically, it should be added at the following path: Pyrenees-python/app/

To add this GitHub repository as a Git submodule, while in a terminal at the above path, type: git submodule add [remote url] where [remote url] is this GitHub repository's HTTPS link.

There should now be a directory located at: Pyrenees-python/app/pyrenees_soft_integration. However, the soft computing algorithms are not yet ready to be used. Specifically, you will encounter import errors from the files contained within the soft_computing submodule (e.g. the files in the fuzzy directory).

To finish the setup and integration process, consult the setup procedure outlined in the soft_computing GitHub repository. The most relevant instructions are repeated here: create a Python virtual environment (or activate an existing one), then, from within the soft_computing directory, run only the following pip installation step:
pip install -e .

The code should now be ready to use. Any code from this GitHub repository (pyrenees_soft_integration) and its subdirectories can be imported into Pyrenees-python/app/routes.py (which is where pedagogical decision making occurs) by following the same convention that is used for libraries such as NumPy or pandas (e.g. import numpy as np).

Troubleshooting

After following the above setup procedure, some troubleshooting may still be needed. In particular, one issue I have encountered when adding this GitHub repository to the Pyrenees-python project is that the submodules contained within this project (pyrenees_soft_integration) have no files (i.e. the soft_computing folder is empty). This requires the following three-step fix:

  1. Open a terminal in this root directory (i.e. ls in the terminal will show the soft_computing folder).
  2. In the terminal, populate the .git config by typing:
    git submodule init
  3. Finally, to populate the submodules with the code, type:
    git submodule update --recursive

In the event you need to remove a submodule from this repository, follow these steps:

  1. Delete the relevant section from the .gitmodules file.
  2. Stage the .gitmodules changes:
    git add .gitmodules
  3. Delete the relevant section from .git/config
  4. Run (with no trailing slash):
    git rm --cached path_to_submodule
  5. Run (with no trailing slash):
    rm -rf .git/modules/path_to_submodule
  6. Commit:
    git commit -m "Removed submodule"
  7. Delete the now untracked submodule files:
    rm -rf path_to_submodule

If the submodule's origin repository has new code that you would like to receive, use the following Git command:
git submodule foreach git pull origin master

However, if you need to reset the Git submodules, there are two methods. Here is Method 1:
git submodule foreach --recursive git reset --hard
and if that doesn't work, here are the commands for Method 2:
git submodule deinit -f .
git submodule update --init --recursive

The GitHub blog post about working with submodules is a useful resource.

Explanation of scripts and files

The following scripts are of primary interest:

problem_level_policy_induction.py
step_level_policy_induction.py

The first script will induce a problem-level policy for Pyrenees. The second script will induce a separate step-level policy for each problem ID in Pyrenees.

The constant.py script file contains a few constants of interest. Specifically, only the following are used:

PROBLEM_LIST
PROBLEM_FEATURES
STEP_FEATURES

PROBLEM_LIST is a list that contains the problem IDs stored as strings. Some problem IDs have no pedagogical intervention, so there is no decision information for them, because every student receives the same intervention (e.g. the first "problem" is always a worked example for every student). For this reason, the code contains some try-except blocks that handle the case where the provided problem ID has no corresponding decision information.
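
The handling described above might look like the following minimal sketch. The problem IDs, the decision_data mapping, and the induce_step_policy helper are hypothetical stand-ins rather than names from this repository, and the actual scripts may catch a different exception.

```python
# Hypothetical problem IDs; the real PROBLEM_LIST lives in constant.py.
PROBLEM_LIST = ["ex0", "ex1", "ex2"]

# Hypothetical decision data keyed by problem ID. "ex0" is absent because
# every student receives the same intervention on it.
decision_data = {
    "ex1": [{"action": "problem"}, {"action": "example"}],
    "ex2": [{"action": "example"}],
}


def induce_step_policy(rows):
    """Placeholder for policy induction: it only counts the decision records."""
    return {"num_decisions": len(rows)}


policies = {}
skipped = []
for problem_id in PROBLEM_LIST:
    try:
        rows = decision_data[problem_id]
    except KeyError:
        # No decision information exists for this problem ID, so skip it.
        skipped.append(problem_id)
        continue
    policies[problem_id] = induce_step_policy(rows)
```

In this sketch, problem IDs without decision information end up in skipped instead of stopping the loop.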

PROBLEM_FEATURES is a list of the feature names, stored as strings, that are available for problem-level decisions. These names correspond directly to the column names of the provided .csv files containing the problem-level and step-level decision information.

Similarly, STEP_FEATURES is a list of the feature names, stored as strings, that are available for step-level decisions. PROBLEM_FEATURES and STEP_FEATURES are not the same list: PROBLEM_FEATURES has 130 features, whereas STEP_FEATURES has 142.
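
Because the feature names match the .csv column names, a feature list can be used directly to pick out the state columns. The sketch below illustrates the idea with the standard library only. The feature names, the CSV contents, and the select_features helper are all hypothetical, and the overlap between the two shortened lists is an assumption made for illustration.

```python
import csv
import io

# Hypothetical, heavily shortened feature lists
# (the real lists have 130 and 142 entries).
PROBLEM_FEATURES = ["f_a", "f_b"]
STEP_FEATURES = ["f_a", "f_b", "f_c"]

# Hypothetical in-memory stand-in for one of the provided .csv files.
CSV_TEXT = "student,f_a,f_b,f_c\ns1,0.1,0.2,0.3\ns2,0.4,0.5,0.6\n"


def select_features(csv_text, features):
    """Return one list of floats per row, keeping only the named columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[float(row[name]) for name in features] for row in reader]


problem_states = select_features(CSV_TEXT, PROBLEM_FEATURES)
step_states = select_features(CSV_TEXT, STEP_FEATURES)
```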

The other constants found in the script file such as MEDIAN_THRESHOLD_LTR or MEDIAN_THRESHOLD_STR_POSITIVE are legacy constants that are used by Song's Critical HRL. They are kept as a precautionary measure in case they are needed once more.

The preprocessing.py script file contains a few functions of interest. Specifically, the following:

undo_normalization()
encode_action()
policy_features()
inferred_reward_constant()
build_traces()

The undo_normalization() function will take the provided data, and reverse any normalization that has been applied to it. However, it does not check that the data has been normalized. This function exists since the original training data provided by Song was normalized, and in a prior study, the normalization had to be undone.
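
To illustrate why the missing check matters, here is a minimal sketch that assumes min-max normalization. The normalization scheme, the undo_min_max name, and its parameters are assumptions; the actual undo_normalization() function may work differently.

```python
def undo_min_max(values, minimum, maximum):
    """Map values from [0, 1] back to [minimum, maximum].

    Like the function described above, this does not check whether
    the input was actually normalized.
    """
    return [value * (maximum - minimum) + minimum for value in values]


normalized = [0.0, 0.5, 1.0]
restored = undo_min_max(normalized, minimum=10.0, maximum=30.0)

# Applying the function to data that is not normalized raises no error,
# but silently produces values outside the original range.
wrong = undo_min_max(restored, minimum=10.0, maximum=30.0)
```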

The encode_action() function takes the policy type being induced (e.g. problem-level or step-level) and converts the action, which was originally saved as a string, into an integer. Refer to the documentation for more details.
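
A string-to-integer encoding of this kind could be sketched as follows. The policy type names, action labels, integer codes, and the encode_action_sketch name are all hypothetical; the real mapping is defined in preprocessing.py.

```python
# Hypothetical action labels and integer codes for each policy type.
ACTION_CODES = {
    "problem": {"problem": 0, "step_decision": 1, "example": 2},
    "step": {"problem": 0, "example": 1},
}


def encode_action_sketch(policy_type, action):
    """Return the integer code for a string action under the given policy type."""
    try:
        return ACTION_CODES[policy_type][action]
    except KeyError as error:
        raise ValueError(
            f"Unknown policy type or action: {policy_type!r}, {action!r}"
        ) from error
```

Keeping one mapping per policy type lets the same action string receive a different code depending on which policy is being induced.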

The policy_features() function is a simple function that returns the features required for the provided policy type. It exists so that the build_traces() function can be independent of the policy type.

The inferred_reward_constant() function serves the same purpose: it returns the constant required for the provided policy type, which keeps build_traces() independent of the policy type. The returned constant is multiplied by the inferred immediate reward at each decision point.
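
The two helpers can be pictured as simple lookups on the policy type, as in this minimal sketch. The policy type names, the shortened feature lists, the constant values, and the function names with the _sketch suffix are hypothetical and are not taken from preprocessing.py.

```python
# Hypothetical, shortened feature lists and reward constants.
PROBLEM_FEATURES = ["f_a", "f_b"]
STEP_FEATURES = ["f_a", "f_b", "f_c"]
REWARD_CONSTANTS = {"problem": 100.0, "step": 10.0}


def policy_features_sketch(policy_type):
    """Return the feature names for the given policy type."""
    if policy_type == "problem":
        return PROBLEM_FEATURES
    if policy_type == "step":
        return STEP_FEATURES
    raise ValueError(f"Unknown policy type: {policy_type!r}")


def inferred_reward_constant_sketch(policy_type):
    """Return the constant that scales the inferred immediate reward."""
    try:
        return REWARD_CONSTANTS[policy_type]
    except KeyError as error:
        raise ValueError(f"Unknown policy type: {policy_type!r}") from error


# Scaling a hypothetical inferred immediate reward at one decision point.
inferred_reward = 0.5
scaled_reward = inferred_reward_constant_sketch("step") * inferred_reward
```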

The build_traces() function returns the provided data as a list where each element has the form:

(state, action, reward, next state, done)

This representation is required for the reinforcement learning algorithms to be applied.
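
As a rough illustration of this representation, the sketch below converts the chronologically ordered decision rows of a single student into such tuples. The "action" and "reward" keys, the feature names, the treatment of the terminal transition, and the build_traces_sketch name are assumptions; the real build_traces() function may differ.

```python
def build_traces_sketch(rows, features, reward_constant=1.0):
    """Convert ordered decision rows into (state, action, reward, next state, done)."""
    traces = []
    for index, row in enumerate(rows):
        state = tuple(row[name] for name in features)
        done = index == len(rows) - 1
        if done:
            # Assumption: the terminal transition repeats the final state.
            next_state = state
        else:
            next_state = tuple(rows[index + 1][name] for name in features)
        reward = reward_constant * row["reward"]
        traces.append((state, row["action"], reward, next_state, done))
    return traces


# Hypothetical rows for one student, already ordered by decision point.
rows = [
    {"f_a": 0.1, "f_b": 0.2, "action": 0, "reward": 0.0},
    {"f_a": 0.3, "f_b": 0.4, "action": 1, "reward": 2.0},
]
traces = build_traces_sketch(rows, ["f_a", "f_b"], reward_constant=10.0)
```

Each tuple pairs a decision point with the state that follows it, and the last tuple is flagged with done set to True.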

The soft_computing directory is a Git submodule. To add a Git submodule (at the time of writing), use the following command:

git submodule add [remote url]

where [remote url] is the URL to your remote GitHub repository.

A Git submodule is pinned to the commit that was recorded when it was added, so it will not be updated automatically when the remote repository receives updates. To pull in newer commits, update the submodule(s) in a testing branch with the following command (without --remote, git submodule update only checks out the commit already recorded in this repository):

git submodule update --remote

If nothing is negatively affected, the changes/fixes can then be merged to the main branch.

For the original post advocating for Git submodules, see here.

Languages

Language:Python 100.0%