dirkschumacher / armacmp

🚀 Automatically compile linear algebra R code to C++ with Armadillo

Home Page: https://dirkschumacher.github.io/armacmp/

[error] argument "val" is missing, with no default

dselivanov opened this issue

As a less trivial exercise, I wanted to write a simple low-rank matrix factorization X ≈ A %*% B, where A and B have small rank k, by minimizing sum((X - A %*% B)^2).

Here is a function to do that:

solve_mf = function(
    X, A, B, 
    k = type_scalar_integer(), 
    learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }

  # ideally should return 
  # return(list(A, B))

  # for the sake of simplicity, let's just return a dummy value
  return(0.0)
}
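To see what the compiled code ultimately has to do, the same update loop can be sketched in plain C++ without Armadillo. This is a minimal sketch, assuming row-major std::vector storage with X of size U x J, A of size U x k and B of size k x J; the Mat struct and the row_col_dot, sgd_sweep and loss helpers are hypothetical names, not part of armacmp. Like solve_mf, it updates row u of A first and then updates column j of B using the already-updated row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense row-major matrix (hypothetical helper, not armacmp or Armadillo).
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c, double fill = 0.0)
        : rows(r), cols(c), data(r * c, fill) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// sum(A[u, ] * B[, j]) in R: dot product of row u of A with column j of B.
double row_col_dot(const Mat& A, std::size_t u, const Mat& B, std::size_t j) {
    double s = 0.0;
    for (std::size_t r = 0; r < A.cols; ++r) s += A(u, r) * B(r, j);
    return s;
}

// One pass over every entry of X, in the same order as the loops in solve_mf.
void sgd_sweep(const Mat& X, Mat& A, Mat& B, double learning_rate) {
    assert(A.rows == X.rows && B.cols == X.cols && A.cols == B.rows);
    for (std::size_t u = 0; u < X.rows; ++u) {
        for (std::size_t j = 0; j < X.cols; ++j) {
            // Residual with the current A and B, then update row u of A.
            double err = X(u, j) - row_col_dot(A, u, B, j);
            for (std::size_t r = 0; r < A.cols; ++r) A(u, r) += learning_rate * B(r, j) * err;
            // Residual recomputed with the updated row, then update column j of B.
            err = X(u, j) - row_col_dot(A, u, B, j);
            for (std::size_t r = 0; r < A.cols; ++r) B(r, j) += learning_rate * A(u, r) * err;
        }
    }
}

// Objective from the issue: sum((X - A %*% B)^2).
double loss(const Mat& X, const Mat& A, const Mat& B) {
    double s = 0.0;
    for (std::size_t u = 0; u < X.rows; ++u)
        for (std::size_t j = 0; j < X.cols; ++j) {
            double e = X(u, j) - row_col_dot(A, u, B, j);
            s += e * e;
        }
    return s;
}
```

On a small rank-1 matrix with k = 1 and a small learning rate, repeated calls to sgd_sweep should drive loss() down; which learning rates stay stable depends on the scale of X.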

When I try to compile it, I get:

 Error in self$get_tail_elements()[[3L]]$compile() : 
  argument "val" is missing, with no default 

followed by a long traceback.

Thanks. Will take a look

Indexing like A[u, ] does not work at the moment. See also #51.

Ah, sorry, I missed that in the function reference article. I thought full-featured indexing was supported.

It would be very useful, but it is a bit more complicated to implement, depending on the features of Armadillo.

could you elaborate?

Does Armadillo support updating whole matrix columns/rows? If so, it should not be too much work.

It does. E.g. A.col(1) = randu<mat>(5,1);

Yes, it does. See examples here. Now I'm going to check whether it also allows updating rows and columns together (something like X[c(1, 3), c(2, 5)] = 1).

According to the docs (http://arma.sourceforge.net/docs.html#submat), it should support these cases.

In a first version, we could support only subviews defined by a single scalar, e.g. A[5, ] but not A[1:3, ].

That would be great! It's powerful enough for many cases.

But the difference between the Armadillo code for A[5, ] and A[1:3, ] would just be row vs. rows, wouldn't it?

Seems to work fine:

library(Rcpp)
sourceCpp(code = 
"
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export]]
void armatest(arma::Mat<double> &X) {
  uvec i = {0, 2};
  uvec j = {1, 3};
  X(i, j) += 1;
}
"
)
m = matrix(0, 4, 4)
armatest(m)
m
>      [,1] [,2] [,3] [,4]
>[1,]    0    1    0    1
>[2,]    0    0    0    0
>[3,]    0    1    0    1
>[4,]    0    0    0    0
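The semantics of the non-contiguous view above can be spelled out without Armadillo: X(i, j) += 1 touches every (row, column) pair in the Cartesian product of i and j, not only the pairs (0, 1) and (2, 3). A minimal plain C++ sketch, where add_to_submatrix is a hypothetical helper and indices are 0-based as in Armadillo:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major: Matrix[row][col]

// Adds `value` to every element X(r, c) with r in `rows` and c in `cols`,
// i.e. what a non-contiguous view update X(i, j) += value does (0-based).
void add_to_submatrix(Matrix& X, const std::vector<std::size_t>& rows,
                      const std::vector<std::size_t>& cols, double value) {
    for (std::size_t r : rows) {
        assert(r < X.size());
        for (std::size_t c : cols) {
            assert(c < X[r].size());
            X[r][c] += value;
        }
    }
}
```

Applied to a 4 x 4 zero matrix with rows {0, 2} and columns {1, 3}, it reproduces the output printed above.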

It seems .rows accepts two integers: A.rows(p, q). So while A[1:3, ] could be supported, x <- 1:3; A[x, ] would not work, since (currently) I cannot tell that x is a vector, and therefore whether to use .row or .rows.
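The decision described here can be sketched as a tiny code generator. This is a hypothetical sketch, not armacmp's actual translator: IndexKind, RowIndex and translate_row_subset are illustrative names, and it assumes the translator can tell a scalar index from a literal a:b range. Anything else (such as a variable x) falls back to a non-contiguous .rows() view, under the assumption that x holds 1-based numeric indices, converted with arma::conv_to<arma::uvec>::from.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the row index in A[idx, ].
enum class IndexKind { Scalar, LiteralRange, Unknown };

struct RowIndex {
    IndexKind kind;
    std::string first;  // scalar expression, range start, or variable name
    std::string last;   // range end (LiteralRange only)
};

// Emits Armadillo code for A[idx, ], shifting R's 1-based indices to 0-based.
std::string translate_row_subset(const std::string& mat, const RowIndex& idx) {
    switch (idx.kind) {
        case IndexKind::Scalar:        // A[u, ]   -> single row
            return mat + ".row(" + idx.first + " - 1)";
        case IndexKind::LiteralRange:  // A[1:3, ] -> contiguous view
            return mat + ".rows(" + idx.first + " - 1, " + idx.last + " - 1)";
        case IndexKind::Unknown:       // A[x, ]   -> assume x is an index vector
        default:
            return mat + ".rows(arma::conv_to<arma::uvec>::from(" + idx.first + ") - 1)";
    }
}
```

With this fallback, a scalar stored in a variable would produce a one-row matrix rather than a row vector, which is exactly the typing ambiguity discussed in this thread.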

It seems there are two signatures, one for contiguous and one for non-contiguous views. While contiguous views are certainly faster, non-contiguous ones would be easier to implement.

contiguous:

X.cols( first_col, last_col )
X.rows( first_row, last_row )

non-contiguous views:

X.cols( vector_of_column_indices )
X.rows( vector_of_row_indices )

And the same when both row and column indices are used:

X.submat( vector_of_row_indices, vector_of_column_indices )
X( vector_of_row_indices, vector_of_column_indices )

Ah nice, so maybe we can always use rows/cols. A side effect would be that one could use logical vectors for subsetting as well.
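Logical subsetting maps naturally onto index vectors: Armadillo's find() returns the indices of non-zero elements, so A[mask, ] could plausibly become A.rows(arma::find(mask)) when mask arrives as a 0/1 vector. The sketch below only illustrates that conversion in plain C++; logical_to_indices is a hypothetical helper, and unlike R it does not recycle a mask that is shorter than the number of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts an R-style logical index vector into the 0-based positions of its
// TRUE entries (the role arma::find() plays for a 0/1 vector).
// Assumes the mask already has exactly one entry per row: no recycling.
std::vector<std::size_t> logical_to_indices(const std::vector<bool>& mask) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) idx.push_back(i);
    return idx;
}
```

For example, the R mask c(TRUE, FALSE, TRUE, FALSE) becomes the 0-based indices {0, 2}.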

Didn't have much time, but at least it compiles :)

library(armacmp)
solve_mf <- function(
  X, A, B, 
  k = type_scalar_integer(), 
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }
  
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#> 
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric()) 
#> {
#>     U = nrow(X)
#>     J = ncol(X)
#>     for (u in seq_len(U)) {
#>         for (j in seq_len(J)) {
#>             A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>             B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>         }
#>     }
#>     return(list(A, B))
#> }
#> 
#> C++ function translation
#> 
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * B.col(j - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) * B.col(j - 1)));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * A.row(u - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) * B.col(j - 1)));
#> }
#> 
#> 
#> }
#> 
#> 
#> return Rcpp::List::create(A, B);
#> }

Created on 2019-12-26 by the reprex package (v0.3.0)

A[u, ] * B[, j] gets translated to A.row(u - 1) * B.col(j - 1), which is a matrix product, but it should be the element-wise product A.row(u - 1) % B.col(j - 1).

Ok, fixed now:

library(armacmp)
solve_mf <- function(
  X, A, B, 
  k = type_scalar_integer(), 
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }
  
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#> 
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric()) 
#> {
#>     U = nrow(X)
#>     J = ncol(X)
#>     for (u in seq_len(U)) {
#>         for (j in seq_len(J)) {
#>             A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>             B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>         }
#>     }
#>     return(list(A, B))
#> }
#> 
#> C++ function translation
#> 
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * B.col(j - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % B.col(j - 1)));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * A.row(u - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % B.col(j - 1)));
#> }
#> 
#> 
#> }
#> 
#> 
#> return Rcpp::List::create(A, B);
#> }

Created on 2019-12-26 by the reprex package (v0.3.0)

I had to transpose some of the vectors, since you can only add or element-wise multiply rows with rows and columns with columns.
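The dimension rules that force these transposes can be captured in a tiny shape checker. This is a hypothetical sketch of how a translator might track shapes (Shape, elementwise_ok, matmul_ok, matmul_shape and trans are illustrative names): element-wise operations need identical dimensions, so a 1 x k row and a k x 1 column only combine element-wise after one of them is transposed, while their matrix product is valid and yields a 1 x 1 result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape record a translator could attach to each expression.
struct Shape {
    std::size_t rows, cols;
};

// Element-wise ops (+, - and % in Armadillo) require identical dimensions.
bool elementwise_ok(Shape a, Shape b) { return a.rows == b.rows && a.cols == b.cols; }

// Matrix product (* in Armadillo, %*% in R) requires a.cols == b.rows.
bool matmul_ok(Shape a, Shape b) { return a.cols == b.rows; }

// Result shape of a valid matrix product.
Shape matmul_shape(Shape a, Shape b) {
    assert(matmul_ok(a, b));
    return Shape{a.rows, b.cols};
}

// Shape after t() in R or arma::trans in Armadillo.
Shape trans(Shape a) { return Shape{a.cols, a.rows}; }
```

With the same pair reversed, a k x 1 column times a 1 x k row is a k x k outer product, which is why the orientation of each subview matters.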

library(armacmp)
solve_mf <- function(
  X, A, B, 
  k = type_scalar_integer(), 
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * t(B[, j]) * (X[u, j] - sum(A[u, ] * t(B[, j])))
      B[, j] = B[, j] + learning_rate * t(A[u, ]) * (X[u, j] - sum(A[u, ] * t(B[, j])))
    }
  }
  
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#> 
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric()) 
#> {
#>     U = nrow(X)
#>     J = ncol(X)
#>     for (u in seq_len(U)) {
#>         for (j in seq_len(J)) {
#>             A[u, ] = A[u, ] + learning_rate * t(B[, j]) * (X[u, 
#>                 j] - sum(A[u, ] * t(B[, j])))
#>             B[, j] = B[, j] + learning_rate * t(A[u, ]) * (X[u, 
#>                 j] - sum(A[u, ] * t(B[, j])))
#>         }
#>     }
#>     return(list(A, B))
#> }
#> 
#> C++ function translation
#> 
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * arma::trans(B.col(j - 1)) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % arma::trans(B.col(j - 1))));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * arma::trans(A.row(u - 1)) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % arma::trans(B.col(j - 1))));
#> }
#> 
#> 
#> }
#> 
#> 
#> return Rcpp::List::create(A, B);
#> }

Created on 2019-12-27 by the reprex package (v0.3.0)

There are several ways I would write the inner updates.

One is using the dot product A[u, ] %*% B[, j] instead of sum(A[u, ] * t(B[, j])). I used sum in the example because in Armadillo the dot product returns a 1 x 1 matrix instead of a scalar, and I felt that would be trickier to implement. But this might actually work fine: X[u, j] - A[u, ] %*% B[, j] gives a 1 x 1 matrix (let's call it Y), and then B[, j] %*% Y gives a k x 1 matrix (a column vector). However, if I were writing a plain R program I would use * here, not %*%, and that would not be translated properly to Armadillo code.
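The shape argument in this comment can be checked with a plain matrix product. In this sketch (plain C++, no Armadillo; Matrix and matmul are hypothetical names), a 1 x 3 row times a 3 x 1 column yields a 1 x 1 matrix holding the dot product, and a column times that 1 x 1 matrix yields a column vector, just as described for Y and B[, j] %*% Y.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major: Matrix[row][col]

// Plain matrix product, the analogue of R's %*% (and Armadillo's *).
Matrix matmul(const Matrix& a, const Matrix& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    Matrix out(a.size(), std::vector<double>(b[0].size(), 0.0));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b[0].size(); ++j)
            for (std::size_t r = 0; r < b.size(); ++r)
                out[i][j] += a[i][r] * b[r][j];
    return out;
}
```

With row (1, 2, 3) and column (4, 5, 6), the product is the 1 x 1 matrix holding 32; taking Y = 10 - 32 = -22 and multiplying the column by Y gives the 3 x 1 result (-88, -110, -132).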

As a solution, I believe it makes sense to always treat A[i, ] as A[i, , drop = FALSE] and A[, j] as A[, j, drop = FALSE]. This is stricter than base R but less ambitious: you can always reason about the type of such a subview, rowvec and colvec respectively. Then, when you multiply a rowvec by a colvec (a dot product), you can simplify the 1 x 1 result to a scalar with arma::as_scalar.
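The proposal can be sketched as a typing rule in the translator. This is a hypothetical sketch (VecType, TypedExpr, row_subset, col_subset and translate_matmul are illustrative names, not armacmp internals): A[i, ] is always typed as a rowvec and A[, j] as a colvec, and a rowvec %*% colvec product is wrapped in arma::as_scalar so it can be used wherever a double is expected.

```cpp
#include <cassert>
#include <string>

// Static vector type a translator could assign under the drop = FALSE rule.
enum class VecType { RowVec, ColVec };

// Hypothetical typed expression: generated C++ code plus its vector type.
struct TypedExpr {
    std::string code;
    VecType type;
};

// A[i, ] is treated as A[i, , drop = FALSE], i.e. always a rowvec.
TypedExpr row_subset(const std::string& m, const std::string& i) {
    return {m + ".row(" + i + " - 1)", VecType::RowVec};
}

// A[, j] is treated as A[, j, drop = FALSE], i.e. always a colvec.
TypedExpr col_subset(const std::string& m, const std::string& j) {
    return {m + ".col(" + j + " - 1)", VecType::ColVec};
}

// Translates lhs %*% rhs. rowvec %*% colvec is a 1 x 1 matrix (a dot product),
// so it is wrapped in arma::as_scalar to yield a plain double.
std::string translate_matmul(const TypedExpr& lhs, const TypedExpr& rhs) {
    const std::string prod = "(" + lhs.code + " * " + rhs.code + ")";
    if (lhs.type == VecType::RowVec && rhs.type == VecType::ColVec)
        return "arma::as_scalar" + prod;
    return prod;  // colvec %*% rowvec is an outer product, so it stays a matrix
}
```

Because the types are fixed by the drop = FALSE convention, the translator never has to guess whether a subview is a vector or a matrix.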

👍