An educational tool showing how an evolutionary algorithm can be use to learn the structure of a Bayesian network.
The approach is heavily inspired by the example provided in the documentation of Halerium, where the causal structure is defined for the California School data set.
Learn a Bayesian Network purely from observational data. The resulting model should do well on predicting mathscr
, readscr
and testscr
.
This task can be broken down into the subtasks of:
- learning the causal structure → use a Genetic Algorithm
- learning the parameters → train Bayesian Network on data using Halerium for proposed proposed structure
a cross-section from 1998-1999
number of observations: 420
observation: schools
country: United States
distcod
: district codecounty
: countydistrict
: districtgrspan
: grade span of districtenrltot
: total enrollmentteachers
: number of teacherscalwpct
: percent qualifying for CalWORKSmealpct
: percent qualifying for reduced-price lunchcomputer
: number of computerstestscr
: average test score (read.scr+math.scr)/2compstu
: computer per studentexpnstu
: expenditure per studentstr
: student teacher ratioavginc
: district average incomeelpct
: percent of English learnersreadscr
: average reading scoremathscr
: average math score
California Department of Education https://www.cde.ca.gov.
- DEAP as framework for genetic algorithms
- HALerium to evaluate fitness of causal structure
- Streamlit for the web UI
All dependencies are stored in the requirements.txt
and can be installed using pip install -r requirements.txt
.
To run the web UI run streamlit run src/main.py
from to root folder of this project from to root folder of this project.
The execution of the genetic algorithm can be a fairly long-running process. Because of this, the UI does not support interactive updates on the progress of the run.
Instead, it's recommended to run the optimization process from the terminal by executing python src/ga_library.py
This is an educational tool intended as showcase how a Genetic Algorithm can be applied to a real problem.
It does not try to challenge the state-of-the-art nor does it use any novel operations.