rapidsai / node

GPU-accelerated data science and visualization in node

Home Page: https://rapidsai.github.io/node/

Feature Request: Non-casting binops

thomcom opened this issue · comments

Presently, all column types that receive a .mul(2) call are cast to Float64. This is not always desirable. It would be preferable for a type to be retained unless it must be upcast, such as for overflow or int.mul(float). I'm pretty sure this is all supported by libcudf.

rapids@tcomer-NVIDIA:~/node/modules/demo/api-server$ node
Welcome to Node.js v18.2.0.
Type ".help" for more information.
> const {Series, Int32, Int64, Float32, Float64} = require('@rapidsai/cudf')
undefined
> let a = Series.new([0, 1]).cast(new Int32)
undefined
> let b = Series.new([0, 1]).cast(new Int64)
undefined
> a.mul(2).type
Float64 [Float] { precision: 2 }
> b.mul(2).type
Float64 [Float] { precision: 2 }

All numbers in JS are double-precision floats (or 64-bit integers when using the n literal suffix for BigInt). Without runtime numeric analysis, we can't know that the number the user provided fits anything narrower than a double, so we always have to up-cast whenever a plain number is passed. The workaround to get the non-casting behavior is to construct a wrapping Scalar and pass it instead:

$ node
> const {Series, Int32, addon: { Scalar }} = require('@rapidsai/cudf')
undefined
> let a = Series.new([0, 1]).cast(new Int32)
undefined
> a.mul(new Scalar({value: 2, type: new Int32})).type
Int32 [Int] { isSigned: true, bitWidth: 32 }

That's pretty heavy-handed just for wanting your column to retain its dtype. Maybe add an argument like .add(2, elevateType=false, mr)?
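
For reference, here's a rough sketch of what an opt-out like that could look like; the helper name, the keepType flag, and the wiring are my own assumptions, not the actual API:

const {Series, Int32, Float64, addon: {Scalar}} = require('@rapidsai/cudf')

// Hypothetical sketch only -- keepType is not a real option in @rapidsai/cudf.
// If keepType is true, wrap the JS number in a Scalar of the column's own dtype
// instead of letting it default to Float64.
function mulKeepingType(series, value, keepType = false) {
  const type = keepType ? series.type : new Float64
  return series.mul(new Scalar({value, type}))
}

const a = Series.new([0, 1]).cast(new Int32)
mulKeepingType(a, 2, true).type  // Int32 -- no up-cast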

I can think of a few things (rough sketches of both follow the list):

  1. Add convenience functions for constructing scalars:
    a.mul(Scalar.int32(2))
  2. Add a signature that accepts a template literal string arg that infers the numeric type via pattern matching:
    a.mul(`2`).type // Int32
    a.mul(`12.32`).type // Float64
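
To make both options concrete, here's a sketch; neither helper exists in @rapidsai/cudf today, and the names and the pattern-matching rule are assumptions for illustration:

const {Int32, Float64, addon: {Scalar}} = require('@rapidsai/cudf')

// Option 1 sketch: static convenience constructors on Scalar (hypothetical).
Scalar.int32 = (value) => new Scalar({value, type: new Int32})

// Option 2 sketch: infer the dtype from a template-literal string (hypothetical).
// A string with no decimal point is treated as Int32; otherwise Float64.
function scalarFromLiteral(str) {
  return /^-?\d+$/.test(str)
    ? new Scalar({value: Number(str), type: new Int32})
    : new Scalar({value: Number(str), type: new Float64})
}

scalarFromLiteral(`2`)      // Int32 scalar
scalarFromLiteral(`12.32`)  // Float64 scalar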

#1 just doesn't seem Javascript-y to me at all. As a JS developer, I'd always assume that the type of the scalar argument is supposed to be cast to the type of the column. What about always constructing a scalar when we call a.mul?

#2 works well for simple arithmetic like this, but gets really expensive if we're trying to do a programmatic mul of some kind, though we'd hope a scatter and a column x column mul would be used instead of an iteration. In fact, this way it compels a developer doing more than a few muls to write more efficient code, so I think I could be on board with it.
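
For instance, the column x column form would look roughly like this (a sketch of the pattern, assuming Series.mul accepts another Series as the column x column case implies):

const {Series, Int32} = require('@rapidsai/cudf')

const a       = Series.new([1, 2, 3, 4]).cast(new Int32)
const factors = Series.new([2, 3, 4, 5]).cast(new Int32)

// One element-wise column x column mul instead of a loop of scalar muls.
const result = a.mul(factors)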

We always create a Scalar from the input, but the issue is figuring out the Scalar's dtype. If you pass a JS number, we must create a Scalar({type: new Float64}), because JS numbers are always doubles.

If we wanted to create another dtype, we'd have to do some kind of runtime numeric analysis on the double to see if it fits into a smaller (or different) numeric type.
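
For illustration, that analysis might look something like this (a sketch only; nothing like it exists in the library, and the cutoffs are just the usual integer ranges):

// Hypothetical runtime numeric analysis of a JS number (not part of @rapidsai/cudf).
// Picks the narrowest dtype name the double happens to fit into.
function inferNumericType(x) {
  if (Number.isInteger(x)) {
    if (x >= -128 && x <= 127) return 'Int8'
    if (x >= -2147483648 && x <= 2147483647) return 'Int32'
    return 'Int64'
  }
  return 'Float64'
}

inferNumericType(2)      // 'Int8' -- but the user may well have meant Int32 or Float64
inferNumericType(12.32)  // 'Float64'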

Ok, I should have phrased #1 better: my preference is, in some form, "the Scalar matches the type of the input column, instead of the type of the JS number, unless otherwise specified." Auto-casting every operation to Float64, and then having to cast back down to another type after every operation unless we remember to create a cudf.Scalar explicitly each time, really seems like poor usability to me.

#1 beats the current situation, though! I'll work on a convenience method for Scalar construction in a bit.

@thomcom We can't do that because it'd be incorrect. If you do Series.new({data: [1, 2, 3], type: new Int8}).mul(12345.6789), the resulting values neither match an "int" dtype nor fit into a 1-byte signed integer. We have to compare the LHS and RHS input dtypes and find a common dtype between them in order to ensure correctness. Float64 is often the only common dtype in the degenerate case.
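
A simplified sketch of that kind of common-dtype resolution (the rules here are assumptions for illustration; libcudf's actual promotion logic is more involved):

// Simplified common-dtype resolution (illustrative assumption, not libcudf's rules).
function commonType(lhs, rhs) {
  if (lhs === rhs) return lhs                          // Int8 * Int8 -> Int8
  const isFloat = (t) => t.startsWith('Float')
  if (isFloat(lhs) || isFloat(rhs)) return 'Float64'   // any float operand -> Float64
  return 'Int64'                                       // mixed integer widths -> widest int
}

commonType('Int8', 'Float64')  // 'Float64' -- 12345.6789 can't be represented in an Int8
commonType('Int32', 'Int64')   // 'Int64'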

Like you said, in the Javascript case, Float64 isn't just the degenerate case, it is every case unless other steps are taken. At this point the default behavior seems acceptable to me, but if the user creates a typed scalar like Series.new({data: [1, 2, 3], type: new Int8}).mul(new Scalar({value: 100, type: new Int8})), then the result should be [100, -56, 44] (wrapping as signed 8-bit integers).
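
In code, that expectation would look roughly like this (a sketch; the Int8 result dtype and the wrapped values are expectations based on the discussion above, not verified output):

const {Series, Int8, addon: {Scalar}} = require('@rapidsai/cudf')

const s = Series.new({data: [1, 2, 3], type: new Int8})

// With a typed Scalar, the common dtype of Int8 and Int8 should be Int8,
// so no up-cast to Float64; values then wrap as signed 8-bit: [100, -56, 44].
s.mul(new Scalar({value: 100, type: new Int8})).type  // expected: Int8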

Do we already have this functionality, but users need to know to create a Scalar as the argument to mul?

Yes, all the binary ops already accept scalars as inputs.

I guess I'll close this? :)