red-data-tools / red-arrow-numo-narray

A library that provides conversion methods between Apache Arrow and Numo::NArray

Add ChunkedArray#to_narray

mrkn opened this issue · comments

Like Arrow::Array#to_narray, we can add ChunkedArray#to_narray.

I think we need to support two features:

  1. Just like chunked_array.chunks.map(&:to_narray)
  2. Convert a chunked array into a single array
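
To make the difference between the two result shapes concrete, here is a minimal sketch in plain Ruby. It does not use Arrow or Numo at all: `FakeChunkedArray`, `to_arrays_per_chunk`, and `to_single_array` are hypothetical names invented for illustration, and plain Ruby arrays stand in for both the chunks and the converted NArray objects.

```ruby
# Hypothetical stand-in for a chunked array: a list of chunks, each a plain
# Ruby array. The real Arrow::ChunkedArray and Numo::NArray are not used.
FakeChunkedArray = Struct.new(:chunks) do
  # Feature 1: one converted object per chunk, keeping chunk boundaries.
  def to_arrays_per_chunk
    chunks.map(&:dup)
  end

  # Feature 2: a single array holding every element in chunk order.
  def to_single_array
    chunks.flatten(1)
  end
end

chunked = FakeChunkedArray.new([[1, 2, 3], [4, 5]])
per_chunk = chunked.to_arrays_per_chunk
single = chunked.to_single_array
```

The first form keeps the chunk boundaries, while the second loses them and returns one contiguous result.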

The following patch will work.

diff --git a/lib/arrow-numo-narray/to-narray.rb b/lib/arrow-numo-narray/to-narray.rb
index 04d753f..ad339d3 100644
--- a/lib/arrow-numo-narray/to-narray.rb
+++ b/lib/arrow-numo-narray/to-narray.rb
@@ -85,6 +85,22 @@ module Arrow
     end
   end
 
+  class ChunkedArray
+    def to_narray
+      unless n_nulls.zero?
+        message = "can't convert #{self.class} that has null values to NArray"
+        raise ArrowNumoNArray::UnconvertibleError, message
+      end
+      narray = value_data_type.narray_class.new(length)
+      data = ""
+      chunks.each do |chunk|
+        data << chunk.buffer.data.to_s
+      end
+      narray.store_binary(data)
+      narray
+    end
+  end
+
   class Tensor
     def to_narray
       narray = value_data_type.narray_class.new(shape)
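
The core idea of the patch, appending each chunk's raw bytes into one buffer and then loading that buffer in a single step, can be sketched with only the Ruby standard library. This is an illustration under assumptions, not the library's code: `Array#pack` stands in for `chunk.buffer.data.to_s`, `String#unpack` stands in for `narray.store_binary`, the values are assumed to be native-endian int32, and each buffer is assumed to contain exactly its chunk's values.

```ruby
# Each "chunk buffer" holds the raw native-endian int32 bytes of one chunk
# (a stand-in for chunk.buffer.data.to_s in the patch).
chunk_buffers = [[1, 2, 3], [4, 5]].map { |values| values.pack("l*") }

# Start from a mutable binary string so that appending raw bytes is not
# affected by string encodings or frozen string literals.
data = String.new(encoding: Encoding::BINARY)
chunk_buffers.each { |buffer| data << buffer }

# Decoding the combined buffer plays the role of narray.store_binary(data).
values = data.unpack("l*")
```

Using an explicitly binary, mutable string instead of a bare `""` literal is a design choice of this sketch; the patch itself uses `data = ""`.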

Implemented.

Ah, should we return [numo_narray1, numo_narray2, ...] for ChunkedArray#to_narray?

I want both of the features described in the comment above in this library.
I think the method to_narray should generate a single narray.
We need to consider what an appropriate name would be for the remaining case.

It seems that chunked_array.chunks.map(&:to_narray) is enough for that feature.
What is the use case for this? Does it come up often?

chunked_array.chunks.map(&:to_narray) isn't important for now. But it will be useful for large chunked arrays if we let Arrow::Array#to_narray share its buffer pointer with the resulting NArray.