[error] argument "val" is missing, with no default

Question

dselivanov opened this issue 4 years ago

As a less trivial exercise, I wanted to write a simple low-rank matrix decomposition X = A * B, where A and B have small rank k, by minimizing sum((X - A*B)**2).
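For reference, the objective and the per-entry updates used by the function below can be written out as follows. This is a sketch under stated assumptions: $a_u$ denotes row $u$ of $A$ (a $1 \times k$ row), $b_j$ denotes column $j$ of $B$ (a $k \times 1$ column), $\eta$ stands for `learning_rate`, and the factor 2 from differentiating the square is absorbed into $\eta$.

$$
\min_{A,\,B} \; \sum_{u=1}^{U} \sum_{j=1}^{J} \left( X_{uj} - a_u b_j \right)^2
$$

$$
a_u \leftarrow a_u + \eta \left( X_{uj} - a_u b_j \right) b_j^{\top},
\qquad
b_j \leftarrow b_j + \eta \left( X_{uj} - a_u b_j \right) a_u^{\top}
$$

The two updates are applied in sequence for each pair $(u, j)$, so the second one already sees the updated $a_u$.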

Here is a function to do that:

solve_mf = function(
    X, A, B, 
    k = type_scalar_integer(), 
    learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }

  # ideally should return 
  # return(list(A, B))

  # for the sake of simplicity let's return just dummy value
  return(0.0)
}

When I try to compile I'm getting:

 Error in self$get_tail_elements()[[3L]]$compile() : 
  argument "val" is missing, with no default

and long traceback...

Dirk Schumacher · Answer 1 · Thu Dec 26 2019 22:53:53 GMT+0800 (China Standard Time)

Thanks. Will take a look

Dirk Schumacher · Answer 2 · Thu Dec 26 2019 22:54:37 GMT+0800 (China Standard Time)

A[u, ] does not work at the moment. See also #51.

Dmitry Selivanov · Answer 3 · Thu Dec 26 2019 23:18:03 GMT+0800 (China Standard Time)

Ah, sorry, I missed that in the function reference article. I thought full-featured indexing was supported.

Dirk Schumacher · Answer 4 · Thu Dec 26 2019 23:31:24 GMT+0800 (China Standard Time)

It would be very useful, though a bit more complicated to implement, depending on the features of Armadillo.

Dmitry Selivanov · Answer 5 · Thu Dec 26 2019 23:32:15 GMT+0800 (China Standard Time)

> depending on the features of armadillo

Could you elaborate?

Dirk Schumacher · Answer 6 · Thu Dec 26 2019 23:35:39 GMT+0800 (China Standard Time)

Does armadillo support updating whole matrix columns/rows? If yes, then it should not be too much work.

Dirk Schumacher · Answer 7 · Thu Dec 26 2019 23:38:32 GMT+0800 (China Standard Time)

> Does armadillo support updating whole matrix columns/rows? If yes, then it should not be too much work.

It does. E.g. A.col(1) = randu<mat>(5,1);

Dmitry Selivanov · Answer 8 · Thu Dec 26 2019 23:41:45 GMT+0800 (China Standard Time)

Yes, it does. See examples here. Now I'm going to check whether it allows updating both columns and rows at once (something like X[c(1, 3), c(2, 5)] = 1).

Dmitry Selivanov · Answer 9 · Thu Dec 26 2019 23:43:21 GMT+0800 (China Standard Time)

[According to the docs](http://arma.sourceforge.net/docs.html#submat), it should support these cases...

Dirk Schumacher · Answer 10 · Thu Dec 26 2019 23:45:00 GMT+0800 (China Standard Time)

In a first version we could have subviews defined by a single scalar. E.g. A[5, ] but not A[1:3, ].

Dmitry Selivanov · Answer 11 · Thu Dec 26 2019 23:50:32 GMT+0800 (China Standard Time)

That would be great! It is powerful enough for many cases.

Dmitry Selivanov · Answer 12 · Thu Dec 26 2019 23:52:57 GMT+0800 (China Standard Time)

But the difference between arma code for A[5, ] and A[1:3, ] will be just row vs rows, won't it?

Dmitry Selivanov · Answer 13 · Thu Dec 26 2019 23:57:22 GMT+0800 (China Standard Time)

Seems to work fine.

library(Rcpp)
sourceCpp(code = 
"
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export]]
void armatest(arma::Mat<double> &X) {
  uvec i = {0, 2};
  uvec j = {1, 3};
  X(i, j) += 1;
}
"
)
m = matrix(0, 4, 4)
armatest(m)
m

>      [,1] [,2] [,3] [,4]
>[1,]    0    1    0    1
>[2,]    0    0    0    0
>[3,]    0    1    0    1
>[4,]    0    0    0    0

Dirk Schumacher · Answer 14 · Thu Dec 26 2019 23:58:16 GMT+0800 (China Standard Time)

It seems .rows accepts two integers: A.rows(p, q). While A[1:3, ] could be supported, x <- 1:3; A[x, ] would not work, since I do not (currently) know that x is a vector, and so cannot tell when to use .row or .rows.

Dmitry Selivanov · Answer 15 · Fri Dec 27 2019 00:02:25 GMT+0800 (China Standard Time)

It seems there are two signatures, one for contiguous and one for non-contiguous views. While contiguous views are surely faster, non-contiguous ones would be easier to implement.

contiguous:

X.cols( first_col, last_col )
X.rows( first_row, last_row )

non-contiguous views:

X.cols( vector_of_column_indices )
X.rows( vector_of_row_indices )

Dmitry Selivanov · Answer 16 · Fri Dec 27 2019 00:04:49 GMT+0800 (China Standard Time)

And the same applies when both row and column indices are used:

X.submat( vector_of_row_indices, vector_of_column_indices )
X( vector_of_row_indices, vector_of_column_indices )
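To make the difference between the two kinds of views concrete, here is a minimal sketch in plain C++ (standard library only, no Armadillo). The `Matrix` struct and the helpers `add_to_row_range` and `add_to_submat` are hypothetical names invented for this illustration; they only mimic the indexing semantics and are not Armadillo's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal column-major matrix, for illustration only (not arma::mat).
struct Matrix {
    std::size_t n_rows, n_cols;
    std::vector<double> data;  // column-major, like R and Armadillo

    Matrix(std::size_t r, std::size_t c) : n_rows(r), n_cols(c), data(r * c, 0.0) {}

    double& at(std::size_t i, std::size_t j) {
        assert(i < n_rows && j < n_cols);
        return data[j * n_rows + i];
    }
};

// Contiguous view: add `value` to rows first..last (inclusive, 0-based).
void add_to_row_range(Matrix& m, std::size_t first, std::size_t last, double value) {
    assert(first <= last && last < m.n_rows);
    for (std::size_t j = 0; j < m.n_cols; ++j)
        for (std::size_t i = first; i <= last; ++i)
            m.at(i, j) += value;
}

// Non-contiguous view: add `value` at every (row, col) combination
// taken from two index vectors (0-based).
void add_to_submat(Matrix& m, const std::vector<std::size_t>& rows,
                   const std::vector<std::size_t>& cols, double value) {
    for (std::size_t j : cols)
        for (std::size_t i : rows)
            m.at(i, j) += value;
}
```

Calling `add_to_submat(m, {0, 2}, {1, 3}, 1.0)` on a 4 x 4 zero matrix touches the same four entries as the `armatest` example above. The range version only needs two integers, while the index-vector version works for any selection, which is why it could serve as the general case.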

Dirk Schumacher · Answer 17 · Fri Dec 27 2019 00:47:37 GMT+0800 (China Standard Time)

Ah nice, so maybe we can always use rows/cols. A side effect would be that one could use logical vectors for subsetting as well.

Dirk Schumacher · Answer 18 · Fri Dec 27 2019 04:27:47 GMT+0800 (China Standard Time)

Didn't have much time, but at least it compiles :)

library(armacmp)
solve_mf <- function(
  X, A, B, 
  k = type_scalar_integer(), 
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }
  
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#> 
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric()) 
#> {
#>     U = nrow(X)
#>     J = ncol(X)
#>     for (u in seq_len(U)) {
#>         for (j in seq_len(J)) {
#>             A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>             B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>         }
#>     }
#>     return(list(A, B))
#> }
#> 
#> C++ function translation
#> 
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * B.col(j - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) * B.col(j - 1)));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * A.row(u - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) * B.col(j - 1)));
#> }
#> 
#> 
#> }
#> 
#> 
#> return Rcpp::List::create(A, B);
#> }

Created on 2019-12-26 by the reprex package (v0.3.0)

Dirk Schumacher · Answer 19 · Fri Dec 27 2019 04:29:33 GMT+0800 (China Standard Time)

A[u, ] * B[, j] gets translated to A.row(u - 1) * B.col(j - 1), but it should be A.row(u - 1) % B.col(j - 1).
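The distinction matters because in Armadillo `*` between matrices is matrix multiplication, while `%` is element-wise multiplication. The plain C++ sketch below (standard library only) illustrates the two operations and their shape requirements. The `Mat` struct and the helper names `matmul` and `schur` are assumptions made for this illustration, not Armadillo functions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Tiny dense row-major matrix, for illustration only.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> v;

    Mat(std::size_t r, std::size_t c, std::vector<double> values)
        : rows(r), cols(c), v(std::move(values)) {
        assert(v.size() == r * c);
    }

    double operator()(std::size_t i, std::size_t j) const { return v[i * cols + j]; }
};

// Matrix product (the meaning of `*` in Armadillo): inner dimensions must agree.
Mat matmul(const Mat& a, const Mat& b) {
    assert(a.cols == b.rows);
    Mat out(a.rows, b.cols, std::vector<double>(a.rows * b.cols, 0.0));
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < b.cols; ++j)
            for (std::size_t k = 0; k < a.cols; ++k)
                out.v[i * b.cols + j] += a(i, k) * b(k, j);
    return out;
}

// Element-wise product (the meaning of `%` in Armadillo): shapes must match.
Mat schur(const Mat& a, const Mat& b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    Mat out = a;
    for (std::size_t n = 0; n < out.v.size(); ++n) out.v[n] *= b.v[n];
    return out;
}
```

With a 1 x 3 row and a 3 x 1 column, `matmul` yields a 1 x 1 matrix, whereas `schur` rejects the pair because the shapes differ; the element-wise product only works once both operands have the same orientation.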

Dirk Schumacher · Answer 20 · Fri Dec 27 2019 04:36:44 GMT+0800 (China Standard Time)

Ok, fixed now:

library(armacmp)
solve_mf <- function(
  X, A, B, 
  k = type_scalar_integer(), 
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }
  
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#> 
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric()) 
#> {
#>     U = nrow(X)
#>     J = ncol(X)
#>     for (u in seq_len(U)) {
#>         for (j in seq_len(J)) {
#>             A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>             B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, 
#>                 j] - sum(A[u, ] * B[, j]))
#>         }
#>     }
#>     return(list(A, B))
#> }
#> 
#> C++ function translation
#> 
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * B.col(j - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % B.col(j - 1)));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * A.row(u - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % B.col(j - 1)));
#> }
#> 
#> 
#> }
#> 
#> 
#> return Rcpp::List::create(A, B);
#> }

Created on 2019-12-26 by the reprex package (v0.3.0)

Dirk Schumacher · Answer 21 · Fri Dec 27 2019 18:17:49 GMT+0800 (China Standard Time)

I had to transpose some of the vectors, as you can only add or element-wise multiply rows with rows, or cols with cols.

library(armacmp)
solve_mf <- function(
  X, A, B, 
  k = type_scalar_integer(), 
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * t(B[, j]) * (X[u, j] - sum(A[u, ] * t(B[, j])))
      B[, j] = B[, j] + learning_rate * t(A[u, ]) * (X[u, j] - sum(A[u, ] * t(B[, j])))
    }
  }
  
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#> 
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric()) 
#> {
#>     U = nrow(X)
#>     J = ncol(X)
#>     for (u in seq_len(U)) {
#>         for (j in seq_len(J)) {
#>             A[u, ] = A[u, ] + learning_rate * t(B[, j]) * (X[u, 
#>                 j] - sum(A[u, ] * t(B[, j])))
#>             B[, j] = B[, j] + learning_rate * t(A[u, ]) * (X[u, 
#>                 j] - sum(A[u, ] * t(B[, j])))
#>         }
#>     }
#>     return(list(A, B))
#> }
#> 
#> C++ function translation
#> 
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * arma::trans(B.col(j - 1)) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % arma::trans(B.col(j - 1))));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * arma::trans(A.row(u - 1)) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % arma::trans(B.col(j - 1))));
#> }
#> 
#> 
#> }
#> 
#> 
#> return Rcpp::List::create(A, B);
#> }

Created on 2019-12-27 by the reprex package (v0.3.0)
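For comparison, the same pair of updates can be written by hand in plain C++ without Armadillo. This is only a sketch, not what armacmp generates: the function names `dot`, `sgd_pass` and `squared_error`, the 0-based integer loops, and the storage layout (X row-major, A row by row, B column by column) are all assumptions chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two length-k arrays.
double dot(const double* a, const double* b, std::size_t k) {
    double s = 0.0;
    for (std::size_t f = 0; f < k; ++f) s += a[f] * b[f];
    return s;
}

// One pass over all entries of X (U x J, row-major).
// A is U x k stored row by row; B is k x J stored column by column,
// so row u of A and column j of B are each k contiguous values.
void sgd_pass(const std::vector<double>& X, std::vector<double>& A,
              std::vector<double>& B, std::size_t U, std::size_t J,
              std::size_t k, double learning_rate) {
    assert(k > 0);
    assert(X.size() == U * J && A.size() == U * k && B.size() == k * J);
    for (std::size_t u = 0; u < U; ++u) {
        for (std::size_t j = 0; j < J; ++j) {
            double* a = &A[u * k];  // row u of A
            double* b = &B[j * k];  // column j of B
            // Update row u of A using the current residual.
            double err = X[u * J + j] - dot(a, b, k);
            for (std::size_t f = 0; f < k; ++f) a[f] += learning_rate * b[f] * err;
            // Recompute the residual with the updated row, then update column j of B.
            err = X[u * J + j] - dot(a, b, k);
            for (std::size_t f = 0; f < k; ++f) b[f] += learning_rate * a[f] * err;
        }
    }
}

// Sum of squared residuals, sum((X - A*B)^2), useful for checking progress.
double squared_error(const std::vector<double>& X, const std::vector<double>& A,
                     const std::vector<double>& B, std::size_t U, std::size_t J,
                     std::size_t k) {
    double total = 0.0;
    for (std::size_t u = 0; u < U; ++u)
        for (std::size_t j = 0; j < J; ++j) {
            double e = X[u * J + j] - dot(&A[u * k], &B[j * k], k);
            total += e * e;
        }
    return total;
}
```

Because both operands are stored as k contiguous values here, no transposition is needed; the orientation issue only arises once rows and columns are distinct types, as they are in Armadillo.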

Dmitry Selivanov · Answer 22 · Fri Dec 27 2019 19:00:28 GMT+0800 (China Standard Time)

There are several ways I would write internal updates.

Using the dot product A[u, ] %*% B[, j] instead of sum(A[u, ] * t(B[, j])). I put sum in the example because the dot product would return a 1 x 1 matrix instead of a scalar (in arma), and I felt that would be trickier to implement. But this might actually work fine: X[u, j] - A[u, ] %*% B[, j] gives a 1 x 1 matrix (let's call it Y), and then B[, j] %*% Y gives a k x 1 matrix (a column vector). But if I were writing an R program I would use just * here, not %*%, and that would not be translated properly to arma code.

As a solution, I believe it makes sense to always treat A[i, ] as A[i, , drop = FALSE] and A[, j] as A[, j, drop = FALSE]. This is stricter than base R, but less ambiguous: you can always reason about the type of such a subview (rowvec and colvec respectively). Then, when you take the dot product of a rowvec and a colvec, you can simplify it to a scalar (and use arma::as_scalar)...
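The idea of giving row and column subviews fixed, distinct types can be sketched in plain C++ as below. This is a hypothetical illustration using only the standard library: `RowVec`, `ColVec`, `t` and the operators are invented names standing in for Armadillo's rowvec and colvec, not the library's own types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the two subview types.
struct RowVec { std::vector<double> v; };  // like A[i, , drop = FALSE]: 1 x k
struct ColVec { std::vector<double> v; };  // like A[, j, drop = FALSE]: k x 1

// rowvec times colvec is 1 x 1, so it can be returned as a scalar directly.
double operator*(const RowVec& a, const ColVec& b) {
    assert(a.v.size() == b.v.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.v.size(); ++i) s += a.v[i] * b.v[i];
    return s;
}

// Transposition switches between the two types, so the result type is always known.
ColVec t(const RowVec& a) { return ColVec{a.v}; }
RowVec t(const ColVec& b) { return RowVec{b.v}; }

// Scaling and addition keep the orientation.
RowVec operator*(double s, RowVec a) {
    for (double& x : a.v) x *= s;
    return a;
}

RowVec operator+(RowVec a, const RowVec& b) {
    assert(a.v.size() == b.v.size());
    for (std::size_t i = 0; i < a.v.size(); ++i) a.v[i] += b.v[i];
    return a;
}
```

With these types, an update such as `a + learning_rate * err * t(b)` type-checks, while adding a `RowVec` to a `ColVec` fails to compile because no such overload exists. That is the kind of reasoning about orientation that a fixed subview type makes possible.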

Dirk Schumacher · Answer 23 · Fri Dec 27 2019 22:36:59 GMT+0800 (China Standard Time)

> As a solution, I believe it makes sense to always treat A[i, ] as A[i, , drop = FALSE] and A[, j] as A[, j, drop = FALSE]. This is stricter than base R, but less ambiguous: you can always reason about the type of such a subview (rowvec and colvec respectively).

👍