mlpack / ensmallen

A header-only C++ library for numerical optimization

Home Page:http://ensmallen.org

Replacing if clause with matrix operation

jonpsy opened this issue

I'm currently working on the SBX crossover operator for SPEA-II. You can have a look at how the real algorithm works in Python or C, and check my implementation here.

As you may have noticed, there are far too many if blocks in the code. This could be solved with an element-wise product against a comparison mask, like this:

arma::uvec idxCondition = (a > b);
arma::vec vector = originalVec % idxCondition;

Below is very simplified pseudocode for the original C code for SBX. As in the original, we take a gene (variable) from each parent; if both genes are equal, the child simply gets the gene of either parent.

Established way:

SBX(parentA, parentB, childA, childB)
{
  childA = parentA; childB = parentB;
  ......
  loop(idx in numGenes)
  {
    if (arma::randu() < crossProb) // [0] Should I crossover?
    {
      // [1] Skip if the genes are equal.
      if (parentA[idx] == parentB[idx])
        continue;
      dominant = std::max(childA[idx], childB[idx]);
      recessive = std::min(childA[idx], childB[idx]);
      // Division by zero cannot happen here because of [1].
      double beta = 1 + 2 * (recessive - lowerBound[idx]) / (dominant - recessive);
      ....
    }
    else // [0] No crossover for this gene.
      continue;
  }
}
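
The skip logic of the loop version can be made concrete with a small, self-contained sketch. It uses plain `std::vector` rather than Armadillo types so that it compiles without the library, and it only covers the first `beta` term shown above; the elided steps stay elided. The names `betaWithSkips`, `BetaResult`, and the `draws` parameter (one pre-drawn uniform number per gene, so the run is deterministic) are assumptions made for illustration, not part of ensmallen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the per-gene beta values plus a count of how
// many genes were actually evaluated (i.e. not skipped).
struct BetaResult
{
  std::vector<double> beta;
  std::size_t evaluated;
};

// Computes the first SBX beta term gene by gene, skipping genes the same
// way the loop above does. The children start as copies of the parents,
// so the parents are read directly here.
BetaResult betaWithSkips(const std::vector<double>& parentA,
                         const std::vector<double>& parentB,
                         const std::vector<double>& lowerBound,
                         const std::vector<double>& draws,
                         double crossProb)
{
  const std::size_t numGenes = parentA.size();
  assert(parentB.size() == numGenes);
  assert(lowerBound.size() == numGenes);
  assert(draws.size() == numGenes);

  BetaResult result;
  result.beta.assign(numGenes, 0.0); // 0.0 marks "not evaluated" in this sketch.
  result.evaluated = 0;

  for (std::size_t idx = 0; idx < numGenes; ++idx)
  {
    if (draws[idx] >= crossProb) // [0] No crossover for this gene.
      continue;
    if (parentA[idx] == parentB[idx]) // [1] Equal genes: nothing to cross.
      continue;

    const double dominant = std::max(parentA[idx], parentB[idx]);
    const double recessive = std::min(parentA[idx], parentB[idx]);
    // dominant > recessive here, so the denominator is never zero.
    result.beta[idx] =
        1.0 + 2.0 * (recessive - lowerBound[idx]) / (dominant - recessive);
    ++result.evaluated;
  }
  return result;
}
```

For parents `{1, 2, 3, 4}` and `{2, 2, 5, 4}` with draws `{0.1, 0.1, 0.9, 0.1}` and `crossProb = 0.5`, only the first gene is evaluated: the second and fourth genes are equal, and the third fails the crossover draw.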

Proposed way:

SBX(parentA, parentB, childA, childB)
{
  childA = parentA; childB = parentB;
  arma::uvec idxEqual = (parentA == parentB); // Mask for "is the gene equal?"
  arma::uvec idxCrossover = arma::randu(numGenes, 1) < 0.5; // Mask for "should I crossover?"

  arma::vec dominant = arma::max(parentA, parentB);
  arma::vec recessive = arma::min(parentA, parentB);
  // Results in NaN where the genes are equal.
  arma::vec beta = 1 + 2 * (recessive - lowerBound) / (dominant - recessive);
  .........
  // Replace with the parent's gene where both parents are equal (meant to handle the NaN).
  childA = childA % (1 - idxEqual) + parentA % idxEqual;
  // Replace with the parent's gene where crossover wasn't desirable.
  childA = childA % idxCrossover + parentA % (1 - idxCrossover);
  ....
}

My question is: is this OK? Does it put too much stress on the user? The proposed version has far fewer lines and no if blocks, but the final replacement childA = childA % (1 - idxEqual) ... may be annoying. Let me know your thoughts.
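
One point worth checking before benchmarking: under IEEE 754 arithmetic, `NaN * 0` is still `NaN` (and `inf * 0` is `NaN` too), so masking by multiplication may not clear the invalid entries produced for equal genes. The sketch below, in plain C++ with hypothetical helper names (`blendByMultiply`, `blendBySelect`), contrasts the arithmetic blend with an explicit per-element selection.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic blend: out = child * (1 - mask) + parent * mask.
// A NaN in `child` survives even where mask == 1, because NaN * 0 is NaN.
std::vector<double> blendByMultiply(const std::vector<double>& child,
                                    const std::vector<double>& parent,
                                    const std::vector<unsigned>& mask)
{
  assert(child.size() == parent.size() && child.size() == mask.size());
  std::vector<double> out(child.size());
  for (std::size_t i = 0; i < child.size(); ++i)
    out[i] = child[i] * (1.0 - mask[i]) + parent[i] * mask[i];
  return out;
}

// Selection blend: copy the parent's gene wherever the mask is set.
// The child's value is never touched there, so a NaN cannot leak through.
std::vector<double> blendBySelect(const std::vector<double>& child,
                                  const std::vector<double>& parent,
                                  const std::vector<unsigned>& mask)
{
  assert(child.size() == parent.size() && child.size() == mask.size());
  std::vector<double> out(child.size());
  for (std::size_t i = 0; i < child.size(); ++i)
    out[i] = mask[i] ? parent[i] : child[i];
  return out;
}
```

In Armadillo, a selection-style replacement could probably be written with `find()` and `.elem()`, for example `childA.elem(arma::find(idxEqual)) = parentA.elem(arma::find(idxEqual));`, though that was not part of the proposal above and is not benchmarked here.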

Without the option to skip loop iterations early, the vectorised version could be much slower. (It could also be faster; that depends on the data and the complexity of the operations being done.) So that is probably worth checking.

In any case I think the code is readable either way. I would suggest using the technique that performs better.

Agreed. I would put together a simple benchmark, using Armadillo's tic/toc or something similar, and get some numbers. It's difficult to say at this point which method is faster, and I think that is the main factor here, since both are easy to read. I might lean slightly towards the if/else version, but if the timings are the same, I would say go with whatever you prefer.
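
A minimal timing harness along these lines might look like the following. It uses `std::chrono::steady_clock` from the standard library as a stand-in for Armadillo's `wall_clock` `tic()`/`toc()`, and the helper name `meanSeconds` and the `repeats` parameter are assumptions for illustration. Averaging over several calls helps when a single call takes only microseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs `fn` `repeats` times and returns the mean wall-clock time per call,
// in seconds. steady_clock is monotonic, so the result is never negative.
template <typename Fn>
double meanSeconds(Fn&& fn, std::size_t repeats)
{
  assert(repeats > 0);
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t r = 0; r < repeats; ++r)
    fn();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / static_cast<double>(repeats);
}
```

Each crossover variant would be wrapped in a lambda and timed on the same inputs, e.g. `meanSeconds([&] { /* call the loop version */ }, 1000)`, and likewise for the vectorised version.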

I've benchmarked both versions using tic/toc; here are the results.
Note: the input population was generated uniformly.
Ratio = slower timing / faster timing.

a) numVariables: 5, ratio: 1.0637 (vectorised is faster)

Timing for normal: 1.626e-05
Timing for vectorised: 1.5286e-05

b) numVariables: 100, ratio: 1.5724 (normal is faster)

Timing for normal: 4.9826e-05
Timing for vectorised: 7.835e-05

c) numVariables: 1000, ratio: 2.3296 (normal is faster)

Timing for normal: 0.000271489
Timing for vectorised: 0.000632452

d) numVariables: 1000000 (very large number), ratio: 2.5319 (normal is faster)

Timing for normal: 0.185799
Timing for vectorised: 0.470431
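
The ratios can be reproduced from the timings themselves. As a minimal sketch, the hypothetical helper `timeRatio` below divides the vectorised timing by the normal one, so values above 1 mean the vectorised version is slower. By that definition cases (b) to (d) match the listed ratios, while case (a) comes out at roughly 0.94, the reciprocal of the 1.0637 listed, since the vectorised version is slightly faster there.

```cpp
#include <cassert>

// Ratio of the vectorised timing to the normal (loop) timing.
// Both arguments must use the same time unit.
double timeRatio(double normalTime, double vectorisedTime)
{
  assert(normalTime > 0.0);
  return vectorisedTime / normalTime;
}
```

For example, `timeRatio(0.185799, 0.470431)` is approximately 2.53 for case (d).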

The vectorised version seems to underperform more and more compared to the usual method as the number of variables grows. I think the reason is, as @rcurtin mentioned, that the loop can skip iterations. As the number of variables increases, so do the chances of a variable being equal.

So I guess I'll stick with the usual approach then :). Still, I think it was important to explore this idea and benchmark it to assess its feasibility.