ThoughtWorksInc / Compute.scala

Scientific computing with N-dimensional arrays

Add Tensor#toSeq and Tensor#toArray methods

Atry opened this issue · comments

We need Tensor#toSeq and Tensor#toArray methods for creating n-dimensional scala.collection.Seq or scala.Array, as the reverse conversion of Tensor.apply.

These methods can be implemented on top of the existing Tensor#flatArray and Tensor#shape.

Hey! Taking a look at this as a first issue.

For the toArray method, we could do something like:

def f(flatArray: Array[A], shape: Array[Int]): Array[B] = {
  // if the desired shape is 1-dimensional, we're done
  if (shape.length == 1) {
    flatArray
  } else {
    // the desired shape must match the number of elements
    if (shape.product != flatArray.length) {
      throw new IllegalArgumentException
    }
    // pick off the last dimension, partitioning the array into slices
    val sliceSize = shape(shape.length - 1)
    val oneReduced = (0 until shape.product by sliceSize).map { i =>
      flatArray.slice(i, i + sliceSize)
    }
    f(oneReduced.toArray, shape.slice(0, shape.length - 1))
  }
}

(with the appropriate types A and B worked out; e.g. for flat data 1, 2, 3, 4, 5, 6 and shape (2, 3), one pass produces the slices (1, 2, 3) and (4, 5, 6), and the recursion then stops because one dimension remains)

But in https://github.com/ThoughtWorksInc/Compute.scala/blob/0.4.x/Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala#L1105
the output of flatArray is a Future. So do we want to ensure the result is computed before it is passed into this helper function, or should this function also accept and return a Future?

Welcome!
I think it should be a Future, since it is a slow action. For now, all slow actions return Future or Do, except toString, because toString overrides a method whose signature cannot change.

But there are other considerations.

  1. Since toArray and toSeq in the Scala collection library are not asynchronous, the name toArray would surprise people if it returned a Future.
  2. What is the type of B? How do we check the type?

I see. One option is two different methods.

It would return an Array of Floats or possibly an Array of Arrays (of either Arrays or Floats). So we could define a custom type or just use Either.

Given that there can be arbitrarily many dimensions, this is hard to represent with Either.

def readScalar: Future[Float]
def read1DArray: Future[Array[Float]]
def read2DArray: Future[Array[Array[Float]]]
def read3DArray: Future[Array[Array[Array[Float]]]]

We probably want the ability to work with n-dimensional Tensors / Arrays, right?

That's the purpose of this issue

Yeah - I was trying to say that we can't explicitly define read2DArray, read3DArray, etc., since we want the ability to work with any number of dimensions.

If we want to avoid read2DArray, read3DArray, then a type class for arbitrary dimensions is required.

def read[Out](implicit tensorReader: TensorReader[Out]): Future[Out]

// Usage:
tensor1.read[Float]
tensor2.read[Seq[Array[Float]]]
tensor3.read[Vector[List[Array[Float]]]]
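
A minimal sketch of what such a type class might look like, assuming the tensor exposes flatArray (a Future[Array[Float]]) and shape as in the current code; fromFlat and both instances below are hypothetical illustrations, not the library's API:

import scala.reflect.ClassTag

trait TensorReader[Out] {
  // Assemble an Out from the flat element buffer plus the tensor's shape.
  def fromFlat(flat: Array[Float], shape: Seq[Int]): Out
}

object TensorReader {
  // Base case: a 0-dimensional tensor reads as a single Float.
  implicit val scalarReader: TensorReader[Float] =
    new TensorReader[Float] {
      def fromFlat(flat: Array[Float], shape: Seq[Int]): Float = flat(0)
    }

  // Inductive case: Array[Inner] splits off the first dimension and reads
  // each group with the inner instance. Instances for Seq, Vector, List,
  // etc. would follow the same pattern with toSeq, toVector, toList.
  implicit def arrayReader[Inner](
      implicit inner: TensorReader[Inner],
      classTag: ClassTag[Inner]): TensorReader[Array[Inner]] =
    new TensorReader[Array[Inner]] {
      def fromFlat(flat: Array[Float], shape: Seq[Int]): Array[Inner] = {
        val groupSize = flat.length / shape.head
        flat.grouped(groupSize).map(inner.fromFlat(_, shape.tail)).toArray
      }
    }
}

read itself would then be roughly flatArray.map(tensorReader.fromFlat(_, shape)).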

Okay I'll give that a try!

Another option is returning an Any. It is not type-safe on dimensions, but that is understandable, since Tensor is not type-safe on dimensions either.

Yeah I was originally thinking something like Array[Either[Float, Array[A]]] forSome {type A}

Either would have to be recursive:

def read: T forSome { type T <: Either[Float, Array[T]] }

However, it is very inefficient to create an Array[Left[Float]], since every element has to be wrapped.
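
Spelled out as a plain recursive ADT (a hypothetical encoding, just to make the recursion and the wrapping cost visible):

sealed trait NdValue
// every leaf Float gets wrapped in an object, which is the
// inefficiency mentioned above
final case class Scalar(value: Float) extends NdValue
final case class Nested(values: Array[NdValue]) extends NdValue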

So if the user is calling toArray, they probably want a result of type Array[Array[...Array[Float]...]], right? In that case, neither an Either type nor a custom class really solves the problem. And if we return Any, that's also no good:

def f: Any = Array(2, 3)
val y = f
y(0)
// error: scala.this.Any does not take parameters

and we have a similar problem if we use Array[Any].
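
For example (a hypothetical REPL session), even with Array[Any] the caller has to cast at every level of nesting:

val a: Array[Any] = Array(Array(1f, 2f), Array(3f, 4f))
val row = a(0).asInstanceOf[Array[Float]] // the caller must know the depth
row(1) // 2.0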

I see why they only defined Array.ofDim for up to 5 arguments.

relevant: https://stackoverflow.com/questions/30623062/6-or-more-dimensional-arrays-in-scala

Following this approach:

def reshapeArray(a: Any, b: Array[Int]): Any = {
  if (b.length == 1) {
    a.asInstanceOf[Array[Any]]
  } else {
    val last = b(b.length - 1)
    val flat = a.asInstanceOf[Array[Any]]
    // the number of slices is the total length divided by the slice length
    val oneReduced = Array.tabulate(flat.length / last) { i =>
      flat.slice(i * last, (i + 1) * last)
    }
    reshapeArray(oneReduced, b.slice(0, b.length - 1))
  }
}

def toArray: Future[Any] = {
  flatArray.flatMap { z => Future { reshapeArray(z, shape) } }
}

I'm running into problems with

[error]  found   : Any
[error]  required: com.thoughtworks.tryt.covariant.TryT[com.thoughtworks.continuation.UnitContinuation,Any]

Is this related to using your version of Future instead of scala.concurrent.Future?

flatArray.map { z => reshapeArray(z, shape) }

Try map.
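
(map takes a callback returning a plain value and does the wrapping itself, whereas flatMap expects the callback to return another ThoughtWorks Future; since reshapeArray returns a plain value, map is the right combinator and no inner Future { ... } is needed.)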

Thanks - that indeed does compile. Now to write tests (and actually have them pass) 😄

Hint: you can use grouped.toArray / grouped.toSeq instead of tabulate and slice.
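
For instance, the tabulate/slice bookkeeping in the Any-typed reshapeArray above could become something like this (a sketch under the same assumptions):

def reshapeArray(a: Array[Any], shape: Array[Int]): Array[Any] = {
  if (shape.length == 1) {
    a
  } else {
    val last = shape(shape.length - 1)
    // grouped yields consecutive slices of length `last`
    val oneReduced = a.grouped(last).toArray.asInstanceOf[Array[Any]]
    reshapeArray(oneReduced, shape.init)
  }
}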

Noted about grouped.

While writing some tests I ran into some runtime problems regarding types. I've fixed those, but now I'm running into the same problem as before: flatArray returns a ThoughtWorks Future, and I can't pass it as an argument to reshapeArray, nor can I write a callback on it; map and flatMap don't seem to work.

I'm getting:

[error]  found   : com.thoughtworks.future.Future[Array[_]]
[error]     (which expands to)  com.thoughtworks.future.opacityTypes.Future[Array[_]]
[error]  required: Array[_]

map works with scala.concurrent.Future. I can't see any relevant examples in the documentation.

According to this Scaladoc, you need imports to make map and flatMap available.
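
If I'm reading the future.scala documentation correctly, the relevant import is the package object, which provides the implicit map/flatMap syntax:

import com.thoughtworks.future._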

That is already imported in Tensors.scala.

The error message looks like you are calling a function that accepts an Array while you provide a Future[Array[_]]. Try asking the question on Stack Overflow with a minimal reproducible example.

Working with Future or other monadic data types is difficult, because you have to use nested higher-order functions everywhere.

You can use Each or Dsl.scala to ease the problem.
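
For example, with Each the reshape could read like straight-line code (a rough sketch from memory of the Each README; the imports and the required scalaz Monad instance for this Future are assumptions):

import com.thoughtworks.each.Monadic._

// inside a monadic block, `.each` extracts the value from the Future
val reshaped = monadic[Future] {
  reshapeArray(flatArray.each, shape)
}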