array filter?

Question

brentp opened this issue 6 years ago · comments

cool project, I was looking at the tests and saw this:

let aArray = [2, 8, -4]

  test "array filter":
    check((aArray --> filter(it > 2)) == [8, 0, 0])

I would expect [8]

Michael Schulte · Answer 1 · Tue Feb 20 2018 04:43:43 GMT+0800 (China Standard Time)

Yes, I think the real use case for array filter is not yet clear. Maybe it should be prevented, especially since the array size is fixed at compile time while the filter operation is performed at run time, so the output cannot be shrunk.

The array was introduced to perform in-place operations, which makes sense for map and the other operations which return an integral result.

Alexander Ivanov · Answer 2 · Tue Feb 20 2018 06:42:35 GMT+0800 (China Standard Time)

Yes, I am going to remove filter for now, as it can't be followed by other operations correctly.

I've thought of two possible solutions:

aArray --> filterCount(it > 2) == ([8, 0, 0], 1)

or

aArray --> filter(it > 2) == arraySlice([8, 0, 0], 1) # [8]

where

type
  ArraySlice*[T] = object
    arr*: ptr T
    first*: uint
    last*: uint

or similar. Array slices make more sense if one wants to apply several operations in place, including filter. For new arrays, filterCount is the most reasonable option.
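To make the filterCount idea concrete, here is a minimal sketch in plain Nim. It is not the zero_functional macro: `filterCount` below is a hypothetical ordinary proc written only for illustration, and it assumes a 0-based integer index. It packs the matching elements at the front of a same-sized array and returns the number of valid entries alongside it.

```nim
# Hypothetical helper for illustration only, not part of zero_functional.
# Assumes the array index type is a 0-based integer range.
proc filterCount[I, T](arr: array[I, T],
                       pred: proc (x: T): bool): (array[I, T], int) =
  var packed: array[I, T]   # default-initialised, so unused slots stay 0
  var count = 0
  for x in arr:
    if pred(x):
      packed[count] = x     # matching elements are moved to the front
      inc count
  (packed, count)

let aArray = [2, 8, -4]
let (packed, count) = filterCount(aArray, proc (x: int): bool = x > 2)

doAssert packed == [8, 0, 0]          # same size as the input
doAssert count == 1                   # only the first `count` slots are valid
doAssert packed[0 ..< count] == @[8]  # slicing an array yields a seq
```

The count plays the role of the `last` field in the ArraySlice proposal: the array keeps its compile-time size, and the caller decides how much of it to read.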

Michael Schulte · Answer 3 · Tue Feb 20 2018 14:11:59 GMT+0800 (China Standard Time)

I really don't know - the thing is that up to now the macro code itself was, at least kind of, straightforward. I'm not sure anyone will need the array filter - at least not as a result. Clearly, when the chain ends in a single value (all, fold, etc.), we don't have a problem here. But I can't think of a language that would support an array.filter in that way.

Just my opinion: but how about when filter is used and the final result is an iterable type, the final type should just be a seq!
.. on the other hand the result ([8, 0, 0], 1) could be called with toSeq and we have a sequence...

But: that behavior gets kind of erratic - I'd prefer an error message

'filtered array is not supported as a result type - use a seq instead'

Maybe the user could use the foreach instead and apply the changes on the array him/herself.

Alexander Ivanov · Answer 4 · Tue Feb 20 2018 17:24:06 GMT+0800 (China Standard Time)

I think returning a seq would be surprising.

I plan on removing filter for array and adding filterToSeq only for array, so people don't get a seq by mistake

Mamy Ratsimbazafy · Answer 5 · Tue Feb 20 2018 17:26:34 GMT+0800 (China Standard Time)

I agree, filter should just return a seq; doing something else would add more cognitive overhead: "Oh, what was zero_functional doing for filter on arrays again? I have to read the docs."

The only language we can compare to is Rust; others like OCaml, Haskell, Python ... use linked lists, so the size is not known at compile time.

Rust's answer is simple: there is no FromIterator implementation that allows collecting an iterator chain into an array: https://stackoverflow.com/questions/26757355/how-do-i-collect-into-an-array

To allow for a less disjointed interface, maybe zip, map, filter etc. should always return a seq, and we can have a ForEach macro that takes a varargs of arrays?

Alexander Ivanov · Answer 6 · Tue Feb 20 2018 17:44:01 GMT+0800 (China Standard Time)

I still think supporting operations resulting in arrays (or even other types of collections) is useful.

I've missed dictionary and set comprehensions from Python, similar idioms are also important in Ruby.

Alexander Ivanov · Answer 7 · Tue Feb 20 2018 17:45:23 GMT+0800 (China Standard Time)

We can also expand the notation

aArray => map(..) =>@ filter(..) # map returns an array, filter returns a seq
aTable => map(..) =>@ filter(..) => all(..) # map returns table, after that we work with a seq

Alexander Ivanov · Answer 8 · Tue Feb 20 2018 17:47:15 GMT+0800 (China Standard Time)

This way => will result in <inputType> = <outputType> and =>@ in <outputType> = seq[args of input]

Mamy Ratsimbazafy · Answer 9 · Tue Feb 20 2018 17:53:16 GMT+0800 (China Standard Time)

Nice use of Nim's flexibility!

Alexander Ivanov · Answer 10 · Tue Feb 20 2018 17:58:28 GMT+0800 (China Standard Time)

(It can be also combined with defaulting on @michael72 's => instead of -->: --> can be deprecated but still left in for backwards compat).

I'll implement that in the evening if it seems suitable

Michael Schulte · Answer 11 · Tue Feb 20 2018 20:44:14 GMT+0800 (China Standard Time)

Differentiating between =>@ and => is actually a great idea! - I hope it won't be hell to implement ;-)

Michael Schulte · Answer 12 · Wed Feb 21 2018 03:47:09 GMT+0800 (China Standard Time)

Darn - I just saw, that => is already part of the future module ... (forgot about that, although I've used it already)
so....

-->@ (looks a bit like a rose ;) <--@)
What about
==> and =>@
=>= and =>@
or maybe
|=> and |=>@ hm....
|-> and |->@
-- could be named as "pipeInto" or "pipeIntoSeq"

hm... |> already exists in some other languages meaning: use the result of the left side and apply it to the right side...

myList --> map(it + 1)         myList -->@ filter(it > 0)
myList ->= map(it + 1)         myList ->@  filter(it > 0)
myList ==> map(it + 1)         myList =>@  filter(it > 0)
myList =>= map(it + 1)         myList =>@  filter(it > 0)
myList |=> map(it + 1)         myList |=>@ filter(it > 0)
myList |-> map(it + 1)         myList |->@ filter(it > 0)

I think I'm fine with anything :)

Alexander Ivanov · Answer 13 · Wed Feb 21 2018 05:16:05 GMT+0800 (China Standard Time)

Great analysis, => is clashing with future indeed. I like the ->= and ->@ idea, it makes sense, but it seems pretty hard to remember: 3 different sigils, so continuing with the current notation might be best.

--> and -->@ ?

Michael Schulte · Answer 14 · Wed Feb 21 2018 12:42:23 GMT+0800 (China Standard Time)

3 (or 4) different sigils might be too much maybe - yeah

OK then

myList --> map(it + 1)
myArray -->@ filter(it > 0)

alternatively maybe

myList ==> map(it + 1)
myArray ==>@ filter(it > 0)

looks alright ;-)
yeah - stick with the `-->` then :+1:

Alexander Ivanov · Answer 15 · Thu Feb 22 2018 04:29:01 GMT+0800 (China Standard Time)

Now I remember: the DSL notation is based on a single operator. That makes it easy to analyze the whole chain, as . has higher precedence

base --> a.b.c

also

base --> zipWith(other).map(f)

# -->
base <optimize those operations on it> a.b

Now, a --> b --> c should be equivalent, but it is slower, as you can benchmark. Maybe there is a way to make it work like this, but I am not sure how without an additional element

zero:
  a --> b --> c

So, my new idea is to

a --> map().mapIndexed().filter@() # <handler>@ returns always a seq, <handler> returns InputType for map, filter

Alexander Ivanov · Answer 16 · Thu Feb 22 2018 04:30:08 GMT+0800 (China Standard Time)

If you can think of a way to make the other syntax work (because now it's (a --> b) --> c: it's expanded into two loops with new seq-s), go ahead !
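The difference between the two forms can be sketched by hand in plain Nim. This is only an illustration of the idea, not the code the macro actually generates; the input values and the `x * 2` / `> 0` steps are made up for the example.

```nim
let base = @[1, -2, 3, -4]

# (base --> map(..)) --> filter(..): two loops and an intermediate seq.
var mapped: seq[int] = @[]
for x in base:
  mapped.add(x * 2)
var twoPass: seq[int] = @[]
for y in mapped:
  if y > 0:
    twoPass.add(y)

# base --> map(..).filter(..): one fused loop, no intermediate seq.
var fused: seq[int] = @[]
for x in base:
  let y = x * 2
  if y > 0:
    fused.add(y)

doAssert twoPass == fused
doAssert fused == @[2, 6]
```

Both versions give the same result; the fused one avoids allocating `mapped` and walks the data once, which is the reason the whole chain is kept on the right-hand side of a single `-->`.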

Michael Schulte · Answer 17 · Thu Feb 22 2018 13:13:28 GMT+0800 (China Standard Time)

Why not the second solution - before turning it all upside down.

You could also call it filterSeq btw. Calling filterSeq on a seq would have no real benefit, but it would return a seq in case an array is supplied and it is the last operation (or the last before map).
Should be easy to implement (as you can easily get the type of an element - see my find implementation)
Maybe we could use a branch for developing new features btw - maybe even one branch for each feature 😉

Alexander Ivanov · Answer 18 · Thu Feb 22 2018 16:19:34 GMT+0800 (China Standard Time)

Yeah, filter@ is basically a little bit shorter syntax for filterSeq. Maybe <handler>Seq is a bit more reasonable, I'll think about it.

That's true, I'll work in a branch

Michael Schulte · Answer 19 · Fri Feb 23 2018 22:14:19 GMT+0800 (China Standard Time)

@brentp
Back to the original issue - I think we can close this now.

For an array, the test code would now look like:

let aArray = [2, 8, -4]
test "array filter":
    check((aArray --> filter(it > 2)) == [0, 8, 0])
test "array filterSeq":
    check((aArray --> filterSeq(it > 2)) == @[8])

For an array, filter "nulls" (sets to 0) all entries where the given condition is false and keeps the array size. With filterSeq, the size of the resulting sequence is reduced.
So the benefit of filter on an array is still questionable, but it could potentially be used to zero vector or matrix entries (depending on what the array represents).

It is not possible to produce a smaller array as output at runtime.
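The two behaviors in the tests above can be written out by hand in plain Nim. This is a sketch of the semantics only, using ordinary loops rather than the zero_functional macros.

```nim
let aArray = [2, 8, -4]

# Array style: the length is part of the type, so non-matching
# entries are reset to 0 and the size stays at 3.
var nulled = aArray            # arrays are copied by value
for i in 0 ..< nulled.len:
  if not (nulled[i] > 2):
    nulled[i] = 0

# seq style: the length is decided at run time, so the result can shrink.
var kept: seq[int] = @[]
for x in aArray:
  if x > 2:
    kept.add(x)

doAssert nulled == [0, 8, 0]
doAssert kept == @[8]
```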

Brent Pedersen · Answer 20 · Fri Feb 23 2018 23:01:39 GMT+0800 (China Standard Time)

cheers. I'll git it a try.