In this exercise you will optimize a set of parameters in a mock information retrieval algorithm.
pytest -- tests.py
python solvemate_ml_exercise.py
To keep it simple, I've applied the same logic to our problem than the f-score: the fitness function is a weighted harmonic mean.
See fitness_func()
in solvemate_ml_exercise.py
See main()
in solvemate_ml_exercise.py
There are plenty of ways of tuning parameters. Commonly used are:
- grid search: we need to discretize the 3 parameters and then the complexity will be O(n3) so not efficient.
- randomized search: we can set the number of iterations but then it will be an approximation of the optimum.
- Stochastic Gradient Descent: we are not in a linear case. Some tricks are possible to avoid local maximum but it will anyway converge slower.
That's why I decided to come with a genetic algorithm approach.
How would you adjust your algorithm to the case that the variables were weights? For example, if a, b and c are models or algorithms which we wish to weight within a pipeline?
In this case the problem is different. We could think the problem linear and the loss of a linear function is convex, so we can use Stochastic Gradient Descent in an efficient way.
How does the complexity of the problem increase in higher dimensions i.e. with more parameters?
The problem to optimize a nonlinear function with continuous parameter is NP-hard. Adding parameters to the problems keeps an NP-hard problem.