Optimization for ManagedLinearAlgebraProvider MatrixMultiply

Question

Muchaszewski opened this issue 2 years ago

I'm trying to make PseudoInverse work without allocations. Most of my code is not usable in this repo without major refactoring, but I would like to propose a potential improvement to the method MatrixMultiply(double[] x, int rowsX, int columnsX, double[] y, int rowsY, int columnsY, double[] result).

Note that some implementation details were skipped for code readability.

Current implementation

Array.Clear(result, 0, result.Length);

// Extract column arrays
var columnDataB = new double[columnsY][];
for (int i = 0; i < columnDataB.Length; i++)
{
    var column = new double[rowsY];
    GetColumn(Transpose.DontTranspose, i, rowsY, columnsY, y, column);
    columnDataB[i] = column;
}
                
var row = new double[columnsX];
for (int i = 0; i < rowsX; i++)
{
    GetRow(Transpose.DontTranspose, i, rowsX, columnsX, x, row);
    for (int j = 0; j < columnsY; j++)
    {
        var col = columnDataB[j];
        double sum = 0;
        for (int ii = 0; ii < row.Length; ii++)
        {
            sum += row[ii] * col[ii];
        }

        result[j * rowsX + i] += 1.0 * sum;
    }
}

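Problem: This method allocates around 80 KB per multiplication in my test (2 * 100 double) and forces a GC event from time to time. The allocations come from the column copies (columnsY arrays of rowsY doubles each) and the row buffer (columnsX doubles), all of which are created on every call.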

Proposed solution

Array.Clear(result, 0, result.Length);

for (var i = 0; i < rowsX; i++)
{
    for (var j = 0; j < columnsY; j++)
    {
        double sum = 0;
        for (var ii = 0; ii < columnsX; ii++)
        {
            sum += GetAt(i, ii, rowsX, x) * GetAt(ii, j, rowsY, y);
        }
        result[j * rowsX + i] += 1.0 * sum;
    }
}

static T GetAt<T>(int rowIndex, int colIndex, int numRows, T[] matrix)
{
    return matrix[(colIndex * numRows) + rowIndex];
}
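
For reference, here is a minimal self-contained sketch of the same allocation-free idea, assuming column-major storage as in the snippet above. The class name ColumnMajorSketch and the argument checks are hypothetical and not part of the library. The loop order (j, k, i) is a swapped-in variation on the proposal: it walks x and result one column at a time, so every array access is contiguous instead of strided.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Math.NET Numerics.
public static class ColumnMajorSketch
{
    // Computes result = x * y for column-major arrays without allocating.
    // Assumes columnsX == rowsY and result.Length == rowsX * columnsY.
    public static void Multiply(double[] x, int rowsX, int columnsX,
                                double[] y, int rowsY, int columnsY,
                                double[] result)
    {
        if (columnsX != rowsY)
            throw new ArgumentException("Inner dimensions must match.");
        if (result.Length != rowsX * columnsY)
            throw new ArgumentException("Result has the wrong length.");

        Array.Clear(result, 0, result.Length);

        for (var j = 0; j < columnsY; j++)        // column of y and of result
        {
            var resultOffset = j * rowsX;         // start of column j in result
            for (var k = 0; k < columnsX; k++)    // column of x, row of y
            {
                var scale = y[j * rowsY + k];     // element y[k, j]
                var xOffset = k * rowsX;          // start of column k in x
                for (var i = 0; i < rowsX; i++)
                {
                    // result[:, j] += x[:, k] * y[k, j]
                    result[resultOffset + i] += x[xOffset + i] * scale;
                }
            }
        }
    }
}
```

As a quick check, multiplying the 2 x 3 matrix stored as { 1, 4, 2, 5, 3, 6 } by the 3 x 2 matrix stored as { 7, 9, 11, 8, 10, 12 } should fill result with { 58, 139, 64, 154 }, which is the column-major form of [[58, 64], [139, 154]]. I have not benchmarked this ordering against the GetAt version, so treat any speed difference as an open question.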

This is more of a proof of concept and a suggestion than a proper implementation. I have an issue with my own code, and it might be in this part, but I think it would be worth investigating allocation-free implementations for some methods.

This code allocates 0 bytes per multiplication on 2 * 100 double, and there are no GC events.
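
One way to check allocation numbers like these without a profiler is to read the per-thread allocation counter before and after a call. This is only a rough sketch: the AllocationProbe class is a hypothetical helper, and it assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (recent .NET versions do).

```csharp
using System;

// Hypothetical measurement helper; the name is illustrative only.
public static class AllocationProbe
{
    // Returns the number of bytes allocated on the current thread
    // while the given action runs.
    public static long Measure(Action action)
    {
        var before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Create the input and result arrays, and the delegate itself, before calling Measure so that only the work done inside the multiplication is counted. It is also worth running the action once beforehand as a warm-up, since first-call JIT work can show up in the number.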