Support for Stochastic Gradient Descent and Minibatches
tokahuke opened this issue
I see your code is running through the whole dataset for each training iteration. For many applications, it is quicker to split the data into smaller random batches and run gradient descent on each "mini batch" (see https://en.wikipedia.org/wiki/Stochastic_gradient_descent): it may take more iterations to converge, but each iteration becomes much quicker.
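For reference, here is a minimal, dependency-free sketch of that batching pattern on a toy 1-D linear model. The model, loss, and data below are made up purely for illustration and are not this project's API:

```rust
// Minibatch gradient descent on a toy model y = w * x.
// Everything here is hypothetical and only illustrates the batching pattern.
fn main() {
    // Toy dataset generated from y = 2x.
    let data: Vec<(f64, f64)> = (1..=16).map(|i| (i as f64, 2.0 * i as f64)).collect();

    let mut w = 0.0;    // single model parameter
    let lr = 0.001;     // learning rate
    let batch_size = 4; // minibatch size m

    for _epoch in 0..50 {
        // In practice the dataset would be shuffled each epoch
        // (e.g. with the `rand` crate); omitted here to stay dependency-free.
        for batch in data.chunks(batch_size) {
            // Average gradient of the squared error over the minibatch.
            let grad: f64 = batch
                .iter()
                .map(|&(x, y)| 2.0 * (w * x - y) * x)
                .sum::<f64>()
                / batch.len() as f64;
            w -= lr * grad; // one parameter update per minibatch
        }
    }
    println!("learned w ≈ {:.3}", w); // should approach 2.0
}
```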
How high is this feature on your priority list?
Hello,
Thanks for the feedback.
Actually, classic SGD with minibatch size m = 1 is already implemented. Let's look at the `train` method:
```rust
pub fn train<T>(&mut self, data: &T, iterations: i64) where T: Extractable {
    for _ in 0..iterations {
        let (x, y) = data.rand();
        self.fit(&x, &y);
    }
}
```
As you can see, it's just a wrapper around the `fit` method; we take one random example from the training set and update the network's parameters on each iteration.
Unfortunately, I'm too busy to work on the project right now. I think I will get back to it after December 25. For now, my plan for further work is:
- Create a new architecture (similar to what I projected here); it allows building more flexible networks;
- Use a linear algebra crate for computations;
- Add more learning algorithms (one of them is upgrading SGD to minibatch SGD; a possible shape is sketched after this list);
- Use GPU acceleration.
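One possible shape for that minibatch upgrade, shown as a fragment in the same style as the existing `train` method. The `rand_batch` and `fit_batch` methods are hypothetical and not part of the current API; they only mark where the batching would slot in:

```rust
// Hypothetical extension of the existing `train` wrapper to minibatch SGD.
pub fn train_minibatch<T>(&mut self, data: &T, iterations: i64, batch_size: usize)
where
    T: Extractable,
{
    for _ in 0..iterations {
        // Draw `batch_size` random examples instead of a single one
        // (assumed helper, not in the current trait).
        let (xs, ys) = data.rand_batch(batch_size);
        // Update parameters once per minibatch, e.g. by averaging the
        // per-example gradients inside this assumed `fit_batch` method.
        self.fit_batch(&xs, &ys);
    }
}
```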
Hmm... maybe I could open some PRs in the meantime, then.
That would be great, but please wait for me to finish the first task on the list.