Support for Stochastic Gradient Descent and Minibatches
tokahuke opened this issue
I see your code is running through the whole dataset for each training iteration. For many applications, it is quicker to split the data into smaller random batches and run gradient descent on each "mini batch" (see https://en.wikipedia.org/wiki/Stochastic_gradient_descent): it may take more iterations to converge, but each iteration becomes much quicker.
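For reference, here is a minimal, dependency-free sketch of that batching pattern on a toy 1-D linear model. The model, loss, and data below are made up purely for illustration and are not this project's API:

```rust
// Minibatch gradient descent on a toy model y = w * x.
// Everything here is hypothetical and only illustrates the batching pattern.
fn main() {
    // Toy dataset generated from y = 2x.
    let data: Vec<(f64, f64)> = (1..=16).map(|i| (i as f64, 2.0 * i as f64)).collect();

    let mut w = 0.0;    // single model parameter
    let lr = 0.001;     // learning rate
    let batch_size = 4; // minibatch size m

    for _epoch in 0..50 {
        // In practice the dataset would be shuffled each epoch
        // (e.g. with the `rand` crate); omitted here to stay dependency-free.
        for batch in data.chunks(batch_size) {
            // Average gradient of the squared error over the minibatch.
            let grad: f64 = batch
                .iter()
                .map(|&(x, y)| 2.0 * (w * x - y) * x)
                .sum::<f64>()
                / batch.len() as f64;
            w -= lr * grad; // one parameter update per minibatch
        }
    }
    println!("learned w ≈ {:.3}", w); // should approach 2.0
}
```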
How high is this feature on your priority list?
Hello,
Thanks for the feedback.
Actually, classic SGD with minibatch size m = 1 is already implemented. Let's look at the `train` method:
```rust
pub fn train<T>(&mut self, data: &T, iterations: i64) where T: Extractable {
    for _ in 0..iterations {
        let (x, y) = data.rand();
        self.fit(&x, &y);
    }
}
```
As you can see, it's just a wrapper around the `fit` method; we take one random example from the training set and update the network's parameters on each iteration.
Unfortunately, I'm too busy to work on the project right now. I think I will get back to it after December 25. For now, my plan for further work is:
- Create a new architecture (similar to what I projected here); it allows building more flexible networks;
- Use a linear algebra crate for computations;
- Add more learning algorithms (one of them is upgrading SGD to minibatch SGD; a possible shape is sketched after this list);
- Use GPU acceleration.
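One possible shape for that minibatch upgrade, shown as a fragment in the same style as the existing `train` method. The `rand_batch` and `fit_batch` methods are hypothetical and not part of the current API; they only mark where the batching would slot in:

```rust
// Hypothetical extension of the existing `train` wrapper to minibatch SGD.
pub fn train_minibatch<T>(&mut self, data: &T, iterations: i64, batch_size: usize)
where
    T: Extractable,
{
    for _ in 0..iterations {
        // Draw `batch_size` random examples instead of a single one
        // (assumed helper, not in the current trait).
        let (xs, ys) = data.rand_batch(batch_size);
        // Update parameters once per minibatch, e.g. by averaging the
        // per-example gradients inside this assumed `fit_batch` method.
        self.fit_batch(&xs, &ys);
    }
}
```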
Hmm... maybe I could open some PRs in the meantime, then.
That would be great, but please wait for me to finish the first task on the list.