ThoughtWorksInc / Compute.scala

Scientific computing with N-dimensional arrays

Add Tensor#toSeq and Tensor#toArray methods

Atry opened this issue · comments

We need Tensor#toSeq and Tensor#toArray methods for creating n-dimensional scala.collection.Seq or scala.Array, as the reverse conversion of Tensor.apply.

These methods can be implemented on top of the existing Tensor#flatArray and Tensor#shape.

Hey! Taking a look at this as a first issue.

For the toArray method, we could do something like:

def f(flatArray: Array[A], shape: Array[Int]): Array[B] = {
  // if the desired shape is 1-dimensional, we're done
  if (shape.length == 1) {
    flatArray
  } else {
    // the desired shape must match the number of elements
    if (shape.product != flatArray.length) {
      throw new IllegalArgumentException
    }
    // pick off the last dimension, partitioning the array into slices
    val sliceSize = shape(shape.length - 1)
    val oneReduced = (0 until shape.product by sliceSize).map { i =>
      flatArray.slice(i, i + sliceSize)
    }
    f(oneReduced.toArray, shape.slice(0, shape.length - 1))
  }
}

(with the appropriate types A and B worked out; e.g. for flat data 1, 2, 3, 4, 5, 6 and shape (2, 3), one pass produces the slices (1, 2, 3) and (4, 5, 6), and the recursion then stops because one dimension remains)

But in https://github.com/ThoughtWorksInc/Compute.scala/blob/0.4.x/Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala#L1105
the output of flatArray is a Future. So do we want to ensure the result is computed before it is passed into this helper function, or should this function also accept and return a Future?

Welcome!
I think it should be a Future, since it is a slow action. For now, all slow actions return Future or Do, except toString, because toString overrides a method whose signature cannot change.

But there are other considerations.

  1. Since toArray and toSeq in the Scala collection library are not asynchronous, the name toArray would surprise people if it returned a Future.
  2. What is the type of B? How do we check the type?

I see. One option is two different methods.

It would return an Array of Floats or possibly an Array of Arrays (of either Arrays or Floats). So we could define a custom type or just use Either.

Given that there can be arbitrarily many dimensions, this is hard to represent with Either.

def readScalar: Future[Float]
def read1DArray: Future[Array[Float]]
def read2DArray: Future[Array[Array[Float]]]
def read3DArray: Future[Array[Array[Array[Float]]]]

We probably want the ability to work with n-dimensional Tensors / Arrays, right?

That's the purpose of this issue

Yeah - I was trying to say that we can't explicitly define read2DArray, read3DArray, etc., since we want the ability to work with any number of dimensions.

If we want to avoid read2DArray, read3DArray, then a type class for arbitrary dimensions is required.

def read[Out](implicit tensorReader: TensorReader[Out]): Future[Out]

// Usage:
tensor1.read[Float]
tensor2.read[Seq[Array[Float]]]
tensor3.read[Vector[List[Array[Float]]]]
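
A minimal sketch of what such a type class might look like, assuming the tensor exposes flatArray (a Future[Array[Float]]) and shape as in the current code; fromFlat and both instances below are hypothetical illustrations, not the library's API:

import scala.reflect.ClassTag

trait TensorReader[Out] {
  // Assemble an Out from the flat element buffer plus the tensor's shape.
  def fromFlat(flat: Array[Float], shape: Seq[Int]): Out
}

object TensorReader {
  // Base case: a 0-dimensional tensor reads as a single Float.
  implicit val scalarReader: TensorReader[Float] =
    new TensorReader[Float] {
      def fromFlat(flat: Array[Float], shape: Seq[Int]): Float = flat(0)
    }

  // Inductive case: Array[Inner] splits off the first dimension and reads
  // each group with the inner instance. Instances for Seq, Vector, List,
  // etc. would follow the same pattern with toSeq, toVector, toList.
  implicit def arrayReader[Inner](
      implicit inner: TensorReader[Inner],
      classTag: ClassTag[Inner]): TensorReader[Array[Inner]] =
    new TensorReader[Array[Inner]] {
      def fromFlat(flat: Array[Float], shape: Seq[Int]): Array[Inner] = {
        val groupSize = flat.length / shape.head
        flat.grouped(groupSize).map(inner.fromFlat(_, shape.tail)).toArray
      }
    }
}

read itself would then be roughly flatArray.map(tensorReader.fromFlat(_, shape)).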

Okay I'll give that a try!

Another option is returning an Any. It is not type-safe on dimensions, but that is understandable, since Tensor is not type-safe on dimensions either.

Yeah I was originally thinking something like Array[Either[Float, Array[A]]] forSome {type A}

Either would have to be recursive:

def read: T forSome { type T <: Either[Float, Array[T]] }

However, it is very inefficient to create an Array[Left[Float]], since every element has to be wrapped.
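
Spelled out as a plain recursive ADT (a hypothetical encoding, just to make the recursion and the wrapping cost visible):

sealed trait NdValue
// every leaf Float gets wrapped in an object, which is the
// inefficiency mentioned above
final case class Scalar(value: Float) extends NdValue
final case class Nested(values: Array[NdValue]) extends NdValue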

So if the user is calling toArray, they probably want a result of type Array[Array[...Array[Float]...]], right? In that case, neither an Either type nor a custom class really solves the problem. And if we return Any, that's also no good:

def f: Any = Array(2, 3)
val y = f
y(0)
// error: scala.this.Any does not take parameters

and we have a similar problem if we use Array[Any].
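
For example (a hypothetical REPL session), even with Array[Any] the caller has to cast at every level of nesting:

val a: Array[Any] = Array(Array(1f, 2f), Array(3f, 4f))
val row = a(0).asInstanceOf[Array[Float]] // the caller must know the depth
row(1) // 2.0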

I see why they only defined Array.ofDim for up to 5 arguments.

relevant: https://stackoverflow.com/questions/30623062/6-or-more-dimensional-arrays-in-scala

Following this approach:

def reshapeArray(a: Any, b: Array[Int]): Any = {
  if (b.length == 1) {
    a.asInstanceOf[Array[Any]]
  } else {
    val last = b(b.length - 1)
    val flat = a.asInstanceOf[Array[Any]]
    // the number of slices is the total length divided by the slice length
    val oneReduced = Array.tabulate(flat.length / last) { i =>
      flat.slice(i * last, (i + 1) * last)
    }
    reshapeArray(oneReduced, b.slice(0, b.length - 1))
  }
}

def toArray: Future[Any] = {
  flatArray.flatMap { z => Future { reshapeArray(z, shape) } }
}

I'm running into problems with

[error]  found   : Any
[error]  required: com.thoughtworks.tryt.covariant.TryT[com.thoughtworks.continuation.UnitContinuation,Any]

Is this related to using your version of Future instead of scala.concurrent.Future?

flatArray.map { z => reshapeArray(z, shape) }

Try map.
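
(map takes a callback returning a plain value and does the wrapping itself, whereas flatMap expects the callback to return another ThoughtWorks Future; since reshapeArray returns a plain value, map is the right combinator and no inner Future { ... } is needed.)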

Thanks - that indeed does compile. Now to write tests (and actually have them pass) 😄

Hint: you can use grouped.toArray / grouped.toSeq instead of tabulate and slice.
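
For instance, the tabulate/slice bookkeeping in the Any-typed reshapeArray above could become something like this (a sketch under the same assumptions):

def reshapeArray(a: Array[Any], shape: Array[Int]): Array[Any] = {
  if (shape.length == 1) {
    a
  } else {
    val last = shape(shape.length - 1)
    // grouped yields consecutive slices of length `last`
    val oneReduced = a.grouped(last).toArray.asInstanceOf[Array[Any]]
    reshapeArray(oneReduced, shape.init)
  }
}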

Noted about grouped.

While writing some tests I ran into some runtime problems regarding types. I've fixed those, but now I'm running into the same problem as before: flatArray returns a ThoughtWorks Future, and I can't pass it as an argument to reshapeArray, nor can I write a callback on it; map and flatMap don't seem to work.

I'm getting:

[error]  found   : com.thoughtworks.future.Future[Array[_]]
[error]     (which expands to)  com.thoughtworks.future.opacityTypes.Future[Array[_]]
[error]  required: Array[_]

map works with scala.concurrent.Future. I can't see any relevant examples in the documentation.

According to this Scaladoc, you need imports to make map and flatMap available.
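
If I'm reading the future.scala documentation correctly, the relevant import is the package object, which provides the implicit map/flatMap syntax:

import com.thoughtworks.future._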

That is already imported in Tensors.scala.

The error message looks like you are calling a function that accepts an Array while you provide a Future[Array[_]]. Try asking the question on Stack Overflow with a minimal reproducible example.

Working with Future or other monadic data types is difficult, because you have to use nested higher-order functions everywhere.

You can use Each or Dsl.scala to ease the problem.
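
For example, with Each the reshape could read like straight-line code (a rough sketch from memory of the Each README; the imports and the required scalaz Monad instance for this Future are assumptions):

import com.thoughtworks.each.Monadic._

// inside a monadic block, `.each` extracts the value from the Future
val reshaped = monadic[Future] {
  reshapeArray(flatArray.each, shape)
}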