Replacing if clause with matrix operation
jonpsy opened this issue · comments
I'm currently working on SBX crossover operator for SPEA-II. You can have a look at how the real algorithm works in python or C and check my implementation here.
As you may have noticed, there are far too many if blocks in the code. This could be solved with element-wise matrix products, like this:
arma::uvec idxCondition = (a > b);
arma::vec vector = originalVec % idxCondition;
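The idea in the two lines above, written out as a minimal dependency-free sketch. It uses std::vector rather than Armadillo so that it compiles on its own; greaterMask and applyMask are hypothetical helper names standing in for `(a > b)` and `%`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise (a > b), stored as 0.0 / 1.0.
std::vector<double> greaterMask(const std::vector<double>& a,
                                const std::vector<double>& b)
{
  assert(a.size() == b.size());
  std::vector<double> mask(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    mask[i] = (a[i] > b[i]) ? 1.0 : 0.0;
  return mask;
}

// Hypothetical helper: element-wise product, the role `%` plays above.
std::vector<double> applyMask(const std::vector<double>& values,
                              const std::vector<double>& mask)
{
  assert(values.size() == mask.size());
  std::vector<double> out(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
    out[i] = values[i] * mask[i];
  return out;
}
```

Entries where the condition holds keep their value; every other entry becomes zero, with no branch in the calling code.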
Below is a very simplified pseudocode version of the original C code for SBX. As in the original code, we take a gene (variable) from each parent; if both genes are equal, we copy the gene from either parent.
Established way:
SBX(parentA, parentB, childA, childB)
{
  childA = parentA; childB = parentB;
  ......
  loop(idx in numGenes)
  {
    // [0] Should I crossover?
    if (arma::randu() < crossProb)
    {
      // [1] Skip if the genes are equal.
      if (parentA[idx] == parentB[idx])
        continue;
      dominant = std::max(childA[idx], childB[idx]);
      recessive = std::min(childA[idx], childB[idx]);
      // Division by zero is ruled out by [1].
      double beta = 1 + 2 * (recessive - lowerBound[idx]) / (dominant - recessive);
      ....
    }
    else // [0] No crossover for this gene.
      continue;
  }
}
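For a single gene, the steps shown in the loop above can be sketched as a small standalone function. This covers only what the pseudocode spells out (ordering the genes, the equality skip, and beta); the elided parts are left out. computeBeta is a hypothetical name, not part of the actual implementation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper covering only the steps shown above.
// Returns false when the gene is skipped; `beta` is then left untouched.
bool computeBeta(double geneA, double geneB, double lowerBound, double& beta)
{
  // [1] Equal genes would make the denominator zero, so skip them.
  if (geneA == geneB)
    return false;

  const double dominant = std::max(geneA, geneB);
  const double recessive = std::min(geneA, geneB);
  assert(dominant > recessive);  // Debug-only sanity check.

  beta = 1.0 + 2.0 * (recessive - lowerBound) / (dominant - recessive);
  return true;
}
```

With genes 1 and 3 and a lower bound of 0, this gives beta = 1 + 2 * 1 / 2 = 2.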
Proposed way:
SBX(parentA, parentB, childA, childB)
{
  childA = parentA; childB = parentB;
  arma::uvec idxEqual = (parentA == parentB); // Index for "is the gene equal?"
  arma::uvec idxCrossover = arma::randu(numGenes, 1) < 0.5; // Index for "should I crossover?"
  arma::vec dominant = arma::max(parentA, parentB);
  arma::vec recessive = arma::min(parentA, parentB);
  // Results in nan where the genes are equal.
  arma::vec beta = 1 + 2 * (recessive - lowerBound) / (dominant - recessive);
  .........
  // Replace with the parent's gene where both parents are equal (nan handled).
  childA = childA % (1 - idxEqual) + parentA % idxEqual;
  // Replace with the parent's gene where crossover wasn't desirable.
  childA = childA % (idxCrossover) + parentA % (1 - idxCrossover);
  ....
}
The question I pose is: is this OK? Does this put too much stress on the user? The proposed version has far fewer lines of code and no if blocks, but the final replacement childA = childA % (1 - idxEqual) ... may be annoying. Let me know your thoughts.
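One thing that may be worth checking before relying on the final replacement: under IEEE 754 arithmetic, nan * 0 and inf * 0 both give nan, so a multiplicative mask may not clear the values produced by the zero denominator. The sketch below contrasts the two kinds of blend on plain std::vector; blendByMask, blendBySelect and countNonFinite are hypothetical names. In Armadillo, the second behaviour would probably need an index-based assignment (for example through arma::find and .elem()), which is an assumption and not something tested here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Blend as in the proposed code: child * (1 - mask) + parent * mask.
std::vector<double> blendByMask(const std::vector<double>& child,
                                const std::vector<double>& parent,
                                const std::vector<double>& mask)
{
  assert(child.size() == parent.size() && child.size() == mask.size());
  std::vector<double> out(child.size());
  for (std::size_t i = 0; i < child.size(); ++i)
    out[i] = child[i] * (1.0 - mask[i]) + parent[i] * mask[i];
  return out;
}

// Blend by selection: copy the parent's gene wherever the mask is set.
std::vector<double> blendBySelect(const std::vector<double>& child,
                                  const std::vector<double>& parent,
                                  const std::vector<double>& mask)
{
  assert(child.size() == parent.size() && child.size() == mask.size());
  std::vector<double> out(child.size());
  for (std::size_t i = 0; i < child.size(); ++i)
    out[i] = (mask[i] != 0.0) ? parent[i] : child[i];
  return out;
}

// Count the entries that are nan or infinite.
std::size_t countNonFinite(const std::vector<double>& values)
{
  std::size_t count = 0;
  for (double v : values)
    if (!std::isfinite(v))
      ++count;
  return count;
}
```

If the child holds nan at a masked position, blendByMask keeps the nan while blendBySelect returns the parent's gene.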
It's possible that without the possibility of early loop termination this could be much slower. (It could also be faster; that depends on the data and the complexity of the operations being done.) So that is probably worth checking.
In any case I think the code is readable either way. I would suggest using the technique that performs better.
Agreed. I would put together a simple benchmark using Armadillo's tic/toc or something else and get some numbers. It is difficult to say at this point which method is faster, and I think that is the main factor here, because both are easy to read. I slightly lean towards the if/else version, but if both take the same time, I would say go with whichever you prefer.
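A minimal timing harness along those lines, using std::chrono as the "something else" so that it has no dependencies (with Armadillo, arma::wall_clock and its tic()/toc() play the same role). timeSeconds is a hypothetical helper; the callable and the repetition count are whatever the benchmark needs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `fn` `repeats` times and return the mean
// wall-clock time per call, in seconds.
template<typename Fn>
double timeSeconds(Fn&& fn, std::size_t repeats)
{
  assert(repeats > 0);
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < repeats; ++i)
    fn();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / static_cast<double>(repeats);
}
```

Calling it once with the loop-based crossover and once with the vectorised one, on the same parents, gives the two numbers to compare.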
I've benchmarked using tic/toc and here are the results.
Note: Input population was generated uniformly
Ratio (vectorised / normal)
a) numVariables: 5 Ratio: 0.9401
Timing for normal: 1.626e-05
Timing for vectorised: 1.5286e-05
b) numVariables: 100 Ratio: 1.5724
Timing for normal: 4.9826e-05
Timing for vectorised: 7.835e-05
c) numVariables: 1000 Ratio: 2.3295676804585086
Timing for normal: 0.000271489
Timing for vectorised: 0.000632452
d) numVariables: 1000000 (very large number) Ratio: 2.5319350480896023
Timing for normal: 0.185799
Timing for vectorised: 0.470431
The vectorised version seems to underperform more and more relative to the usual method as the number of variables grows. I think the reason is, as @rcurtin mentioned, that the loop can skip genes. As the number of variables increases, there is a higher chance of some variable being equal.
So I guess I'll stick with the usual approach then :) I think it was important to explore this idea and benchmark it to assess its feasibility.
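To make the skip argument concrete, here is a small sketch that counts how many genes reach the expensive part of the loop version; the vectorised version does the arithmetic for every gene regardless. genesDoingFullWork is a hypothetical helper, and draws stands in for the per-gene arma::randu() values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of genes for which the loop version
// actually performs the full crossover update.
std::size_t genesDoingFullWork(const std::vector<double>& parentA,
                               const std::vector<double>& parentB,
                               const std::vector<double>& draws,
                               double crossProb)
{
  assert(parentA.size() == parentB.size() && parentA.size() == draws.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < parentA.size(); ++i)
  {
    if (draws[i] >= crossProb)     // [0] No crossover for this gene.
      continue;
    if (parentA[i] == parentB[i])  // [1] Equal genes are skipped.
      continue;
    ++count;                       // Only these genes pay for the update.
  }
  return count;
}
```

With four genes, one equal pair and one draw above crossProb, only two genes do the full work, while the vectorised version would still process all four.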