[error] argument "val" is missing, with no default
dselivanov opened this issue · comments
As a less trivial exercise, I wanted to write a simple low-rank matrix decomposition X = A * B, where A and B have small rank k, minimizing sum((X - A*B)**2).
Here is a function to do that:
solve_mf = function(
  X, A, B,
  k = type_scalar_integer(),
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }
  # ideally should return
  # return(list(A, B))
  # for the sake of simplicity let's return just dummy value
  return(0.0)
}
When I try to compile I'm getting:
Error in self$get_tail_elements()[[3L]]$compile() :
argument "val" is missing, with no default
and long traceback...
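For reference, here is a minimal plain-R sketch of the same update that runs without armacmp, so the compiled result can later be compared against it. The name solve_mf_r, the test data, and the learning rate are assumptions made for illustration; the update order (B uses the already-updated row of A) follows the function above.

```r
# Plain R reference for the update; no armacmp involved.
# X is U x J, A is U x k, B is k x J.
solve_mf_r <- function(X, A, B, learning_rate) {
  for (u in seq_len(nrow(X))) {
    for (j in seq_len(ncol(X))) {
      # residual for one cell, using the current factors
      err <- X[u, j] - sum(A[u, ] * B[, j])
      A[u, ] <- A[u, ] + learning_rate * B[, j] * err
      # B is updated with the already-updated A[u, ], so recompute the residual
      err <- X[u, j] - sum(A[u, ] * B[, j])
      B[, j] <- B[, j] + learning_rate * A[u, ] * err
    }
  }
  list(A = A, B = B)
}

set.seed(1)
k <- 2
# Illustrative data: X is built as a product of two factors, so it has rank <= 2.
X <- matrix(runif(6 * k), 6, k) %*% matrix(runif(k * 5), k, 5)
A0 <- matrix(runif(6 * k), 6, k)
B0 <- matrix(runif(k * 5), k, 5)

loss <- function(A, B) sum((X - A %*% B)^2)
fit <- solve_mf_r(X, A0, B0, learning_rate = 0.05)
loss_before <- loss(A0, B0)
loss_after <- loss(fit$A, fit$B)
```

Comparing loss_before and loss_after gives a quick sanity check of a single pass over all cells.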
Thanks. Will take a look
A[u, ]: this does not work at the moment. See also #51
Ah, sorry, I missed that in the function reference article; I thought full-featured indexing was supported.
It would be very useful, though. It may be a bit more complicated to implement, depending on the features of armadillo.
depending on the features of armadillo
could you elaborate?
Does armadillo support updating whole matrix columns/rows? If yes, then it should not be too much work.
Does armadillo support updating whole matrix columns/rows? If yes, then it should not be too much work.
It does. E.g. A.col(1) = randu<mat>(5,1);
Yes, it does. See examples here. Now I'm going to check whether it allows updating both columns and rows at once (something like X[c(1, 3), c(2, 5)] = 1). [According to the docs](http://arma.sourceforge.net/docs.html#submat), it should support these cases...
In a first version we could have subviews defined by a single scalar, e.g. A[5, ] but not A[1:3, ].
That would be great! It is powerful enough for many cases.
But the difference between the arma code for A[5, ] and A[1:3, ] will be just row vs rows, won't it?
Seems to work fine.
library(Rcpp)
sourceCpp(code =
"
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::export]]
void armatest(arma::Mat<double> &X) {
  uvec i = {0, 2};
  uvec j = {1, 3};
  X(i, j) += 1;
}
"
)
m = matrix(0, 4, 4)
armatest(m)
m
>      [,1] [,2] [,3] [,4]
> [1,]    0    1    0    1
> [2,]    0    0    0    0
> [3,]    0    1    0    1
> [4,]    0    0    0    0
.rows accepts two integers, it seems: A.rows(p, q). While A[1:3, ] could be supported, x <- 1:3; A[x, ] would not work, since I do not know (currently) that x is a vector and when to use .row or .rows.
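To illustrate why the two cases differ, here is a small sketch that inspects the unevaluated index expression. index_kind is a hypothetical helper written for this comment, not part of armacmp, and it only looks at the row index. A literal 1:3 is visible in the expression itself, while x is just a symbol whose type is unknown without some form of type inference.

```r
# Inspect the row-index argument of an indexing call without evaluating it.
# index_kind() is a hypothetical helper, not part of armacmp.
index_kind <- function(expr) {
  idx <- expr[[3]]  # expr is `[`(A, <row index>, <col index>)
  if (is.call(idx) && identical(idx[[1]], as.name(":"))) {
    "literal range"
  } else if (is.numeric(idx) && length(idx) == 1) {
    "literal scalar"
  } else if (is.symbol(idx)) {
    "symbol (type unknown without inference)"
  } else {
    "other"
  }
}

kind_range  <- index_kind(quote(A[1:3, ]))  # "literal range"
kind_scalar <- index_kind(quote(A[5, ]))    # "literal scalar"
kind_symbol <- index_kind(quote(A[x, ]))    # "symbol (type unknown without inference)"
```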
Seems there are 2 signatures - for contiguous and non-contiguous views. While contiguous are for sure faster, non-contiguous would be easier to implement.
contiguous:
X.cols( first_col, last_col )
X.rows( first_row, last_row )
non-contiguous views:
X.cols( vector_of_column_indices )
X.rows( vector_of_row_indices )
And same when both row and column indices are used
X.submat( vector_of_row_indices, vector_of_column_indices )
X( vector_of_row_indices, vector_of_column_indices )
Ah nice, so maybe we can always use rows/cols. A side effect would be that one could use logical vectors for subsetting as well.
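As a reminder of the R semantics that a translation would need to reproduce, the sketch below shows that an integer index vector and a logical mask select the same rows, and that a mask can be reduced to zero-based positions. How armacmp would actually map a logical vector onto the index-vector form of rows/cols is an assumption here, not something confirmed in this thread.

```r
X <- matrix(1:20, nrow = 4)               # 4 x 5
rows_int <- c(1L, 3L)                     # integer index vector
rows_lgl <- c(TRUE, FALSE, TRUE, FALSE)   # logical mask of length nrow(X)

# Both forms select the same rows in R.
same <- identical(X[rows_int, ], X[rows_lgl, ])

# A logical mask can be reduced to integer positions with which();
# subtracting 1 gives zero-based positions of the kind an index vector
# on the C++ side would hold (assumed mapping, for illustration only).
zero_based <- which(rows_lgl) - 1L
```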
Didn't have much time, but at least it compiles :)
library(armacmp)
solve_mf <- function(
  X, A, B,
  k = type_scalar_integer(),
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#>
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric())
#> {
#> U = nrow(X)
#> J = ncol(X)
#> for (u in seq_len(U)) {
#> for (j in seq_len(J)) {
#> A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u,
#> j] - sum(A[u, ] * B[, j]))
#> B[, j] = B[, j] + learning_rate * A[u, ] * (X[u,
#> j] - sum(A[u, ] * B[, j]))
#> }
#> }
#> return(list(A, B))
#> }
#>
#> C++ function translation
#>
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * B.col(j - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) * B.col(j - 1)));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * A.row(u - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) * B.col(j - 1)));
#> }
#>
#>
#> }
#>
#>
#> return Rcpp::List::create(A, B);
#> }
Created on 2019-12-26 by the reprex package (v0.3.0)
A[u, ] * B[, j] gets translated to A.row(u - 1) * B.col(j - 1), but it should be A.row(u - 1) % B.col(j - 1).
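The mismatch comes from the two languages using the operators differently: in R, * is element-wise and %*% is the matrix product, while in Armadillo * is the matrix product and % is element-wise. A small sketch of the R side, with made-up values:

```r
M <- matrix(1:4, 2, 2)
elementwise <- M * M     # R `*`: element-wise, what Armadillo spells `%`
matprod <- M %*% M       # R `%*%`: matrix product, what Armadillo spells `*`

a <- c(1, 2, 3)
b <- c(4, 5, 6)
dot_via_sum <- sum(a * b)         # 1*4 + 2*5 + 3*6 = 32
dot_via_matmul <- drop(a %*% b)   # same value via the inner product
```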
Ok, fixed now:
library(armacmp)
solve_mf <- function(
  X, A, B,
  k = type_scalar_integer(),
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u, j] - sum(A[u, ] * B[, j]))
      B[, j] = B[, j] + learning_rate * A[u, ] * (X[u, j] - sum(A[u, ] * B[, j]))
    }
  }
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#>
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric())
#> {
#> U = nrow(X)
#> J = ncol(X)
#> for (u in seq_len(U)) {
#> for (j in seq_len(J)) {
#> A[u, ] = A[u, ] + learning_rate * B[, j] * (X[u,
#> j] - sum(A[u, ] * B[, j]))
#> B[, j] = B[, j] + learning_rate * A[u, ] * (X[u,
#> j] - sum(A[u, ] * B[, j]))
#> }
#> }
#> return(list(A, B))
#> }
#>
#> C++ function translation
#>
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * B.col(j - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % B.col(j - 1)));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * A.row(u - 1) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % B.col(j - 1)));
#> }
#>
#>
#> }
#>
#>
#> return Rcpp::List::create(A, B);
#> }
Created on 2019-12-26 by the reprex package (v0.3.0)
I had to transpose some of the vectors, as you can only add or element-wise multiply rows with rows or cols with cols.
library(armacmp)
solve_mf <- function(
  X, A, B,
  k = type_scalar_integer(),
  learning_rate = type_scalar_numeric()) {
  U = nrow(X)
  J = ncol(X)
  for (u in seq_len(U)) {
    for (j in seq_len(J)) {
      A[u, ] = A[u, ] + learning_rate * t(B[, j]) * (X[u, j] - sum(A[u, ] * t(B[, j])))
      B[, j] = B[, j] + learning_rate * t(A[u, ]) * (X[u, j] - sum(A[u, ] * t(B[, j])))
    }
  }
  return(list(A, B))
}
fun <- compile(solve_mf, verbose = TRUE)
#> R function
#>
#> function (X, A, B, k = type_scalar_integer(), learning_rate = type_scalar_numeric())
#> {
#> U = nrow(X)
#> J = ncol(X)
#> for (u in seq_len(U)) {
#> for (j in seq_len(J)) {
#> A[u, ] = A[u, ] + learning_rate * t(B[, j]) * (X[u,
#> j] - sum(A[u, ] * t(B[, j])))
#> B[, j] = B[, j] + learning_rate * t(A[u, ]) * (X[u,
#> j] - sum(A[u, ] * t(B[, j])))
#> }
#> }
#> return(list(A, B))
#> }
#>
#> C++ function translation
#>
#> Rcpp::List armacmp_fun(const arma::mat& X, arma::mat A, arma::mat B, int k, double learning_rate)
#> {
#> auto U = X.n_rows;
#> auto J = X.n_cols;
#> for (const auto& u : arma::linspace<arma::colvec>(1, U, U))
#> {
#> for (const auto& j : arma::linspace<arma::colvec>(1, J, J))
#> {
#> A.row(u - 1) = A.row(u - 1) + learning_rate * arma::trans(B.col(j - 1)) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % arma::trans(B.col(j - 1))));
#> B.col(j - 1) = B.col(j - 1) + learning_rate * arma::trans(A.row(u - 1)) * (X(u - 1, j - 1) - arma::accu(A.row(u - 1) % arma::trans(B.col(j - 1))));
#> }
#>
#>
#> }
#>
#>
#> return Rcpp::List::create(A, B);
#> }
Created on 2019-12-27 by the reprex package (v0.3.0)
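The need for the transposes can be reproduced in plain R once the subviews keep their matrix shape. With the default drop = TRUE, A[u, ] and B[, j] both collapse to plain vectors, which is why the untransposed code runs in R; with drop = FALSE they become a 1 x k and a k x 1 matrix, which is closer to what the row and col views are in the generated C++. The matrices below are made-up values for illustration.

```r
A <- matrix(1:6, nrow = 2)    # 2 x 3, plays the role of A (U x k)
B <- matrix(1:12, nrow = 3)   # 3 x 4, plays the role of B (k x J)

row_view <- A[1, , drop = FALSE]   # 1 x 3, like a rowvec
col_view <- B[, 2, drop = FALSE]   # 3 x 1, like a colvec

# Element-wise product of a 1 x 3 and a 3 x 1 matrix is an error in R.
mismatch <- tryCatch(row_view * col_view,
                     error = function(e) conditionMessage(e))

# After transposing the column, both operands are 1 x 3.
ok <- row_view * t(col_view)
```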
There are several ways I would write the internal updates.
Using the dot product A[u, ] %*% B[, j] instead of sum(A[u, ] * t(B[, j])). I've put sum here in the example because the dot product would return a 1 x 1 matrix instead of a scalar (in arma), and I felt it would be more tricky to implement. But this actually might work fine: X[u, j] - A[u, ] %*% B[, j] will give a 1 x 1 matrix (let's call it Y). And then B[, j] %*% Y will give a k x 1 matrix (or column vector). But if I were writing an R program, I would use just * here, not %*%, and this would not be properly translated to arma code.
As a solution I believe it makes sense to always treat A[i, ] as A[i, , drop = FALSE] and A[, j] as A[, j, drop = FALSE]. This is more strict than base R, but less ambitious - you can always reason about the type of such subview - rowvec and colvec respectively. Then when you do a dot product of a rowvec by a colvec you can simplify it to a scalar (and use arma::as_scalar)...
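A short R sketch of what this proposal would mean, using drop = FALSE explicitly and base R's drop() as a stand-in for the scalar simplification. The data are made up, and treating drop() as the R-side analogue of arma::as_scalar is my reading of the idea, not something armacmp does today.

```r
A <- matrix(1:6, nrow = 2)    # 2 x 3
B <- matrix(1:12, nrow = 3)   # 3 x 4

row_view <- A[1, , drop = FALSE]   # always 1 x 3 under the proposal
col_view <- B[, 2, drop = FALSE]   # always 3 x 1 under the proposal

inner <- row_view %*% col_view     # 1 x 1 matrix: 1*4 + 3*5 + 5*6 = 49
scalar <- drop(inner)              # plain number 49, no dim attribute

# With a 1 x 1 matrix the update needs %*%; with the scalar, plain * works.
via_matrix <- col_view %*% inner   # 3 x 1
via_scalar <- col_view * scalar    # 3 x 1, same values
```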
As a solution I believe it makes sense to always treat A[i, ] as A[i, , drop = FALSE] and A[, j] as A[, j, drop = FALSE]. This is more strict than base R, but less ambitious - you can always reason about the type of such subview - rowvec and colvec respectively.
👍