dzirbel / ka-infection

An implementation of the Infection challenge for Khan Academy's application process

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

An implentation of the Infection project for Khan Academy's application process. The official description of the parameters can be found here. In brief, the challenge is to implement a graph of students and teachers who can be "infected" classroom-by-classroom by a new version of a piece of software. Classrooms are infected as a whole to make sure classmates are using the same version, and in the Limited Infection case, the goal is to infect only a specified portion of the population.

Usage

Run total_infection.py to execute randomized correctness tests for the Total Infection scenario. (Use -h to see command-line options.) These tests simply create randomly-sized classrooms and then infect and/or disinfect them, checking along the way that the infection has spread properly.

Run limited_infection.py to execute randomized correctness tests for the Limited Infection scenario. (Use -h to see command-line options.) These tests construct randomly-sized classrooms and then use the close_enough algorithm (by default) to search for approximate solutions to infect a random percent of the userbase. Prints the percent of these scenarios for which an acceptable solution was found and an average accuracy, which is the discrepancy between the target and the solution, as a percentage of the target (i.e. 10% means the solutions are +/- 10% of the target on average, 0% accuracy means a perfect solution).

Implementation

I chose to complete this project in Python both because the majority of my portfolio is in a web-focused language (JavaScript, PHP, etc.) and so I wanted to highlight my ability in other environments, and because Python is high level enough to focus on the conceptual parts of the project.

The teaching graph is implemented via the classes Teacher and Student in infection.py. These classes include an infect method that toggles their infection state and (by default) immediately spreads the infection through the classroom. Although strictly for the purposes of this project, a smaller data structure (an array of boolean values representing students, for example) would easily suffice, I chose to use fully-fledged classes to mirror the real world where classrooms and students are associated with a lot more information.

Algorithm

The Limited Infection scenario is an example of the famous Subset Sum problem, where the goal is to choose a subset of a given set of numbers that sum to a certain target. If the goal is to infect k students and we have classrooms of sizes a_1, a_2, a_3, ..., a_n then we attempt to find a subset of these classrooms with size summing to k. The Subset Sum problem is known to be NP-complete, so a perfect polynomial time algorithm is beyond the scope of this project. Thus, we can either compromise on complexity (and implement a brute force, exponential time algorithm) and/or correctness (and implement an algorithm that finds a subset of classrooms with only approximately k total students). As a start, the bruteforce function in limited_infection.py provides a brute-force but correct algorithm.

However, even though this problem is very difficult in the general case, the fact that we know something about the sizes of classrooms allows for powerful hueristics. For most practical purposes, we will have a large number of classrooms of relatively small size (a large, distributed user base) and if we would like to infect 10% of the userbase, getting within a margin of one or two classroom sizes is very acceptable. In addition, we probably want the infected classrooms to be chosen randomly so that users are equally likely to try out new features (imagine if we had many classrooms of size one, and always infected those for every new feature). The close_enough function in limited_infection.py implements this kind of an algorithm.

About

An implementation of the Infection challenge for Khan Academy's application process


Languages

Language:Python 100.0%