patrickgeorge1/Matrix-multiplier

Name: Vitoga George Patrick
Group/Series: 332CA

Neopt:
    
    I tackled the "unoptimised" version of the algorithm by using
    additional matrices to store partial results such as
    A * B, A' * A and A * B * B'.

    The multiplication uses the basic "3-for loop way", but I took
    advantage of the fact that A is an upper triangular matrix, so the
    third for-loop visits only the elements of A that can be non-zero
    and skips the ones below the main diagonal.
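
    The triangular shortcut described above can be sketched as follows.
    This is only an illustration, not the code from the repository: the
    function name mul_upper_tri, the row-major layout and the exact loop
    bound (k starting at i) are assumptions.

```c
#include <stddef.h>

/* Hypothetical helper: C = A * B for row-major n x n matrices,
 * assuming A is upper triangular. A[i][k] is zero for k < i,
 * so the innermost loop can start at k = i instead of k = 0. */
void mul_upper_tri(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = i; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

    Starting k at i roughly halves the number of multiplications for
    this product, while the result stays the same as with the full loop.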

    When an operation had a transposed matrix as one of its
    operands, I saved some computational effort by replacing the
    explicit "transpose" operation with "iterating through columns".
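
    A minimal sketch of this idea, assuming row-major n x n matrices:
    the product A' * B is computed by walking down the columns of A, so
    no transposed copy is ever built. The name mul_transposed_left is
    hypothetical and the loop ignores the triangular shortcut to keep
    the example short.

```c
#include <stddef.h>

/* Hypothetical helper: C = A' * B without materialising A'.
 * (A')[i][k] equals A[k][i], so the inner loop reads column i of A
 * and column j of B. */
void mul_transposed_left(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[k * n + i] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

    Calling it with the same matrix for both inputs gives A' * A.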

    The final matrix is the sum of the partial matrices m_ABBt
    and m_AtA.


Opt:

    The "optimised" version is an upgrade of "neopt". The update
    consists of using pointers to iterate through the matrices and
    using "register" for loop-related constant variables.
    
    Also, in the last operation, I removed a for-loop: because I can
    iterate using pointers, a single loop runs up to N * N.
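
    The single-loop sum could look roughly like the sketch below. It is
    an assumption about the shape of the code, not a copy of it: the
    name add_flat and the pointer names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: out = x + y for n x n matrices stored
 * contiguously. The matrices are treated as flat arrays of n * n
 * elements, so one pointer-driven loop replaces the nested i/j loops. */
void add_flat(size_t n, const double *x, const double *y, double *out)
{
    register const double *px = x;
    register const double *py = y;
    register double *pout = out;
    register const double *end = x + n * n;  /* one past the last element */

    while (px < end)
        *pout++ = *px++ + *py++;
}
```

    Note that "register" is only a hint; a modern compiler is free to
    ignore it and usually makes the same choice on its own when
    optimisations are enabled.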

    Using all these tricks, I managed to beat the "neopt" version.


Blas:

    This version is the fastest of them all.
    The "Basic Linear Algebra Subprograms" library is amazing, because
    it does the basic operations roughly 26 times faster than "neopt"
    for N = 400 (see the table below).

    I used only cblas_dtrmm() and cblas_dgemm().
    With cblas_dtrmm() I was able to multiply a triangular matrix
    with another matrix, and with cblas_dgemm() I was able to multiply
    any 2 matrices. Moreover, I even summed 2 matrices by using the
    built-in accumulation of cblas_dgemm() into its output matrix.
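
    For reference, the operations the two routines perform can be
    written as below, where op(X) is either X or its transpose. The
    last line shows one possible way to get the final sum from a single
    cblas_dgemm() call; the choice of alpha = beta = 1 and of what the
    output matrix holds on entry is an assumption, since the actual
    call sequence is not listed here.

```latex
\begin{aligned}
\texttt{cblas\_dtrmm (left side)}: \quad
  & Y \leftarrow \alpha \, \mathrm{op}(T) \, Y, \qquad T \text{ triangular} \\
\texttt{cblas\_dgemm}: \quad
  & Z \leftarrow \alpha \, \mathrm{op}(X) \, \mathrm{op}(Y) + \beta \, Z \\
\text{with } \alpha = \beta = 1,\ Z = A^{T} A \text{ on entry}: \quad
  & Z \leftarrow (A B) \, B^{T} + A^{T} A
\end{aligned}
```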


Performance:

    
    An analysis on the input (N = 400) leads us to the following results:

|------------------------------------------------------------------------------|
|                                 NEOPT                OPT               BLAS  |
|------------------------------------------------------------------------------|
|Running time:                    1.3291              0.2922            0.0512 |
|                                                                              |
|Instruction cache misses:                                                     |
|  - I1                           1358                 1365              1457  |
|  - LLi                          1325                 1327              1382  |
|                                                                              |
|Data cache misses:                                                            |
|  - D1                          76 mil.              52 mil.            46000 |
|  - LLd                         100000                93000             40000 |
|                                                                              |
|Branch prediction:                                                            |
|  - Miss rate                    0.3%                 0.3%               0.9% |
|------------------------------------------------------------------------------|

Considering the input size and the results, it is clear that Blas and Opt are
better than Neopt: Opt runs about 4.5 times faster and Blas about 26 times
faster.
The difference becomes even clearer as the input size increases.