Wanted: Strawman proposals for new collections architecture

Question

Wanted: Strawman proposals for new collections architecture

odersky opened this issue 9 years ago · comments

A redesign of the standard library is on the roadmap for one of the next Scala versions (could be as early as 2.13). Some requirements for the redesign should be:

Normal usage of collections should stay the same. That is, the standard operations
on collections should still be available under the same names. Some more exotic and advanced
scenarios such as breakOut can be dropped if alternative ways to achieve the same functionality
exist.
User-defined implementations of collections should port to the new library as far as is reasonable. We should allow some breakage here, if it is necessary to achieve other goals.
We should strive for simplicity in APIs and implementations.
We should strive to better separate interfaces from implementation and avoid the fragile base
class problem, where too much gets inherited automatically.
We should try to simplify the inheritance graphs, in particular those of often-used collections
such as lists.
We should improve the integration of strict and lazy collections by means of a better architecture for views. Views should avoid accidental forcing, and should omit transforms from collections that require such forcing. However, forcing is still needed to support aggregations.
We should try to integrate Java8 streams and/or have our own high-performance implementations for
parallel collections.
We should generally be at least on par with current collections in what concerns efficiency. In particular, we should still allow specializations of collection operations for particular implementations. These optimizations should still work if the static collection type is abstract. E.g. an optimized implementation of ++ on ArrayBuffers should be called even if the static types of the operands of ++ are Iterables.
The design should be friendly to optimizations such as inlining and, possibly, more advanced
whole program optimizations.

To gain experience it would be good to experiment with some prototypes that illustrate possible designs and can point to strengths and weaknesses. A full collections design is too heavy to fill this role. It takes too long to write and understand. As an alternative, I propose to implement just a subset of the required functionality. The subset should implementable in about 500 LOC or less and at the same time should be as representative as possible of the whole collections. The API should be the same for all strawman proposals.

After some experimentation, I came up with the following proposal for the API to be implemented by strawman proposals. There's still some modest room to add things if people feel it is important.

Base Traits

Iterator
Iterable
Seq

For now we leave out maps and sets (which does not say they are not important). We gloss over the mutabile/immutable distinction. All collections are in the same package (this is only for the strawman, to keep things simple, not for the full collections, of course).

Iterable and Seq are considered collection types, but Iterator is not.

Collection Implementations

List                    
ListBuffer   
ArrayBuffer          
java.lang.String  
View

List is a simplified version of Scala lists. It's an immutable collection favoring linear access patterns.
List operations should be tail-recursive, hence the addition of ListBuffer to hold intermediate results. ArrayBuffer is a mutable collection that supports random access. String should demonstrate how the architecture integrates externally defined types. String should be seen by Scala as an immutable random access sequence of Char. Finally, views should be constructible over all other collections. They are immutable and lazy.

List, ListBuffer and ArrayBuffer should have companion objects that let one construct collections given their elements in the usual way:

List(1, 2, 3)
ListBuffer("a", "bc")
ArrayBuffer()

Collection operations

The following operations should be supported by all collections.

foldLeft
foldRight
isEmpty
head
iterator
view
to

foldLeft and foldRight are the basis of all sorts of aggregations. isEmpty and head exemplify
tests and element projections. iterator and view construct iterators and views over a collection. collect is not part of the current Scala collections, but is useful to support views and Java 8 streams. It should construct a new collection of a given type from the elements of the receiver. An invocation pattern of to could be either of the following:

xs.to[List]
xs.to(List)

Strawman collections also should support the following transforms:

filter
map
flatMap
++
zip
partition
drop

map and flatMap together support all monadic operations. ++ and zip are examples of operations with multiple inputs. drop was chosen as a typical projection that yields a sub-collection. partition was chosen in addition filter because it exemplifies transforms with multiple outputs.

Strawman Seqs (but not necessarily Iterables or Views) should also support the following operations

length
apply
indexWhere
reverse

Sequences in a way are defined by their length and their indexing method apply. indexWhere is an example of a method that combines indices and sequences in an interesting way. reverse is an example of a transform that suggests an access pattern different from left-to-right.

ArrayBuffer and ListBuffer should in addition support the following mutation operations

+=
++=
trimStart

Required Optimizations

There are a number of possible specializations that the strawman architecture should support. Here are
two examples:

xs.drop(n) on a list xs should take time proportional to n, the retained part of the list should not be copied.
xs ++ ys on array buffers xs and ys should be implemented by efficient array copy operations. s1 ++ s2 on strings should use native string concatenation.
partition should require only a single collection traversal if the underlying collection is strict (i.e., not a view).

These specializations should be performed independently of the static types of their arguments.

Why No Arrays?

Collections definitely should support arrays as they do now. In particular, arrays should have the same representation as in Java and all collection operations should be applicable to them. Arrays are left out of the strawman requirements because of the bulk their implementation would add. Even though no explicit implementation is demanded, we should still check all designs for how they would support arrays.

Nils Kilden-Pedersen · Answer 1 · Mon Oct 05 2015 23:47:50 GMT+0800 (China Standard Time)

It's not clear if Sets are considered part of "collections" (since they're left out), but a number of operations are questionable for sets, e.g. map (can unexpectedly lead to reduction in resulting set) and zip (implies some order).

odersky · Answer 2 · Mon Oct 05 2015 23:54:18 GMT+0800 (China Standard Time)

@nilskp This is answered by requirement (1): normal usage of collections should stay the same. So, yes, map and zip will continue to work on sets. At least I would classify map over sets as normal usage, maybe zip is debatable. On the other hand, zip is useful. What's to object against set.zipWithIndex as a way to arbitrarily assign set elements unique indices?

soc · Answer 3 · Tue Oct 06 2015 00:06:54 GMT+0800 (China Standard Time)

Some minor bits, which might be interesting despite all the fundamental objections I have to this:

If we can't find a more sophisticated way to deal with collection size, at least move size to Long.
This can be done by telling people to either migrate to length or by encouraging them to start adding type ascriptions of : Long to code calling size now.
I have found that groupBy is a good benchmark of collection design, maybe it could be added to the list of required methods for the prototype.
What is collectAs[C] and what's the difference to to[C]? Feels like a poor name given that collect dos something completely different, a lot like mapTo in some other API.

odersky · Answer 4 · Tue Oct 06 2015 00:10:20 GMT+0800 (China Standard Time)

@soc Agree that groupBy is a good benchmark. It means we have to add maps to the base traits. I did not do it in version 1 in order to keep the scope small. But it's a prime candidate for version 2.

collectAs should indeed be to; I overlooked that we already have it. I updated the text.

odersky · Answer 5 · Tue Oct 06 2015 00:12:30 GMT+0800 (China Standard Time)

@soc The collection size type is a separate issue. Important to discuss this too, I agree.

Nils Kilden-Pedersen · Answer 6 · Tue Oct 06 2015 00:20:06 GMT+0800 (China Standard Time)

@odersky, but it should be trivial to "cast" set to a collection that supports map and zip and thus avoid any potential pitfalls, no?

odersky · Answer 7 · Tue Oct 06 2015 00:27:12 GMT+0800 (China Standard Time)

@nilskp How do you "cast"? But I think we should take this discussion elsewhere it is out of scope for the strawman discussion. Or, more productively: How about writing a strawman implementation that includes sets? Then we can see how that would work out.

Markus Kahl · Answer 8 · Tue Oct 06 2015 00:29:52 GMT+0800 (China Standard Time)

Not realizing I was mapping a Set has tripped me on a number of occasions too. Everyone else does it too, though (Haskell, Python). Haskell at least has a special comment on map for Set pointing out that the resulting set's size may be smaller.

I find zip on Sets makes sense just like toSeq where people have to realize that the order is not defined.

Edit: But you are right that this is besides the point for now. Moving on.

One point I'd like to add is that it would be nice if it was easy to create one's own abstract collection operations while retaining the type of the concrete collection. Last time I tried this it seemed relatively complicated. Though I do realise that it's not an easy issue to begin with.

soc · Answer 9 · Tue Oct 06 2015 01:21:39 GMT+0800 (China Standard Time)

@machisuji I think the issue with Set will only be a problem, if we keep trying to pretend that every intermediate operation is a Set. Given that we know all the drawbacks of that, like:

Operations are evaluated immediately, causing unnecessary and excessive computations of things which were never requested at the end, memory overhead due to allocation of intermediate data-structures and garbage collection churn.
It requires that all individual classes implement the whole API, making the API extremely heavyweight, and hard to grasp, contributing to the large foot-print of the collections library.
For some things like "Streams", providing the implementation is somewhere between "extremely hard" and "a complete nightmare".
It requires the invention of "views" as a secondary API, which try to act a little bit smarter, but are almost never used due to the syntactic overhead in usage and the lack of polish. If the new collection API needs separate "views", our design is not good enough.

Viktor Klang (√) · Answer 10 · Tue Oct 06 2015 01:31:17 GMT+0800 (China Standard Time)

I'd like to explore the idea that a collection is data, and then you apply transformations to it, and transformations can be composed separately from the data (think: transducers), macros could them be used to very efficiently inline these operations.

Nils Kilden-Pedersen · Answer 11 · Tue Oct 06 2015 02:33:38 GMT+0800 (China Standard Time)

@odersky, I meant something like toSeq/toList/etc.

What's to object against set.zipWithIndex as a way to arbitrarily assign set elements unique indices?

The fact that Set doesn't have an index? It might be nitpicking, but 1) I've never had the use case, 2) if I did, I'd prefer to use set.toSeq.zipWithIndex, which makes the intent clear.

Viktor Klang (√) · Answer 12 · Tue Oct 06 2015 02:39:39 GMT+0800 (China Standard Time)

An AnyVal Option type would be preferred for collections as well. (think Map.get etc)

Li Haoyi · Answer 13 · Tue Oct 06 2015 02:46:54 GMT+0800 (China Standard Time)

We gloss over the mutabile/immutable distinction

We probably want mutable collections to not have the same API as the immutable ones. One large problem with the current mutable collections is that by inheriting all the immutable methods, it's neigh impossible to figure out where the "useful" mutable methods are. e.g. Buffers append/remove is what you generally care about, but they're lost in the huge sea of immutable methods you basically never want. The same applies to mutable.Set, mutable.Queue, etc.

By that logic, ArrayBuffer probably shouldn't be a collection, and perhaps making it convertible to one through .view or .iterator is enough?

Views should avoid accidental forcing, and should omit transforms from collections that require such forcing. However, forcing is still needed to support aggregations.

What's the advantage of Views over using Iterators? That's what I've been using when I want to fuse map/filter/flatmap/etc. operators into constant-memory and it seems to work. You don't accidentally force because the only forcing operations are relatively obvious (reduce, sum, fold, foreach, toXXX).

List
ArrayBuffer
java.lang.String
View

Should we add Vector to that list? That's basically the current go-to immutable collection for a lot of things where you want less pointer-chasing than lists and more generic operations (prepend, insert, remove, ...) so I'd want to know where it fits in a "new world"

Viktor Klang (√) · Answer 14 · Tue Oct 06 2015 02:53:29 GMT+0800 (China Standard Time)

@lihaoyi Yes, having scala.collection.{Seq, Map, … } as a shared hierarchy between mutable and immutable is not beneficial IMO. Same goes for the deep embedding of Parallel Collections into the hierarchy. And The XOnce-things, if you need a XOnce you use an Iterator.

Dmitry Petrashko · Answer 15 · Tue Oct 06 2015 02:56:46 GMT+0800 (China Standard Time)

@lihaoyi @viktorklang the current designs that we had a look into follow your proposals.
The data is separate and operations are added to the data through implicit decorators.

The advantage is that most operations(map\flatmap\reduce\zip) do not care if underlying collection is mutable or not.

Dmitry Petrashko · Answer 16 · Tue Oct 06 2015 02:57:38 GMT+0800 (China Standard Time)

@viktorklang @lihaoyi , what's your take on deep embedding of views into collections hierarchy?

Seth Tisue · Answer 17 · Tue Oct 06 2015 03:00:16 GMT+0800 (China Standard Time)

We gloss over the mutable/immutable distinction

The way the old collections gloss over this is one of the paramount issues I would hope to see addressed in any new design.

Viktor Klang (√) · Answer 18 · Tue Oct 06 2015 03:02:06 GMT+0800 (China Standard Time)

@DarkDimius With a transducer approach I don't see the need for views at all since the primary need for views is to avoid intermediate representations.

@SethTisue 👍

Dmitry Petrashko · Answer 19 · Tue Oct 06 2015 03:06:33 GMT+0800 (China Standard Time)

@viktorklang In my understanding, Iterators should be minimal: next \ hasNext and maybe split are the only methods that I would like to have in Iterator.
Unlike this, a View could be a simple decorator over an iterator that adds other operations.

Viktor Klang (√) · Answer 20 · Tue Oct 06 2015 03:12:28 GMT+0800 (China Standard Time)

@DarkDimius With a transducer approach there's (as far as I can tell) no need for Views—you'd apply the transducer of your choice to the collection of your choice at the time you need it. Did I misunderstand you?

odersky · Answer 21 · Tue Oct 06 2015 03:16:06 GMT+0800 (China Standard Time)

It's great to have the debates, but I think for the purpose of this issue, requirement (1) is non-negotiatable. If we really change fundamentally how users interact with collections, how do we even consider migration and backwards compatibility? It would be a completely separate discussion (which we might want to have). But on this issue I'd like to keep the discussion focussed. How about I open a thread on scala-debate and everybody chimes in there for stuff outside the requirements presented here?

Viktor Klang (√) · Answer 22 · Tue Oct 06 2015 03:19:44 GMT+0800 (China Standard Time)

@odersky The way that virtually all modern platforms have done this in the past is to introduce a new library/package and include forwarders for the old library/package, is there a reason why this wouldn't be preferable in this case too? I'm assuming that there won't be enough time to arrive at a battle-proven new library unless there is a transitional period where one can fall back to the old library?

Li Haoyi · Answer 23 · Tue Oct 06 2015 03:21:03 GMT+0800 (China Standard Time)

Unlike this, a View could be a simple decorator over an iterator that adds other operations.

That sounds great!

In general, the high-level "interface" @odersky describes leaves out a lot of (what would be) crucial scaladoc. Sure, we have Iterators, Iterables and Seqs, and Views, but what are these things? What are they meant to represent, in english? What are their invariants and how are they meant to be used? Is the data-operation separation you described part of the required interface?

odersky · Answer 24 · Tue Oct 06 2015 04:28:31 GMT+0800 (China Standard Time)

Here's the thread on scala-debate:

https://groups.google.com/forum/#!topic/scala-debate/Z1YH_0Hgu5A

All discussions that fall outside requirements 1-9 should go there. I would like to reserve this issue for concrete proposals that try to address the requirements as outlined.

odersky · Answer 25 · Tue Oct 06 2015 04:34:08 GMT+0800 (China Standard Time)

Concretely, it would be great to see here:

a "current status" submission. i.e. a strawman using the current collection design. This would be extremely important to get a baseline against which any improvements can be measured.
some alternative proposals. Current collections have been criticized a lot. Now is your chance to weigh in and make a difference. But to be taken seriously we need a completely worked out proposal, nothing else should count.

odersky · Answer 26 · Tue Oct 06 2015 04:35:57 GMT+0800 (China Standard Time)

@viktorklang It won't fly. We will not be able to change something that is in every Scala tutorial in existence. Let's be realistic and stick to the requirements outlined here.

odersky · Answer 27 · Tue Oct 06 2015 04:41:44 GMT+0800 (China Standard Time)

@lihaoyi

What's the advantage of Views over using Iterators? That's what I've been using when I want to fuse map/filter/flatmap/etc. operators into constant-memory and it seems to work. You don't accidentally force because the only forcing operations are relatively obvious (reduce, sum, fold, foreach, toXXX).

Iterators are linear (in the sense of linear logic). Applying a transform like map on an iterator renders the original iterator unusable. Views are essentially reusable (non-linear) iterators.

Should we add Vector to that list? That's basically the current go-to immutable collection for a lot of things where you want less pointer-chasing than lists and more generic operations (prepend, insert, remove, ...) so I'd want to know where it fits in a "new world"

I hesitate to do this because Vectors would add a lot of code, and I want to keep this small. Immutablity is already covered by Lists and random access is covered by ArrayBuffers.

@SethTisue See scala-debate thread to mutable/immutable distinction in more detail.

drewhk · Answer 28 · Tue Oct 06 2015 19:24:11 GMT+0800 (China Standard Time)

As for transducers, Akka Streams have one of these hidden (now pending removal though): https://github.com/akka/akka/blob/release-2.3-dev/akka-stream/src/main/scala/akka/stream/impl/fusing/IteratorInterpreter.scala

It allows reuse of streaming transformations deffined for otherwise asynchronous streams over a simple Iterator. The underlying engine is very simple (most of the complications come from supporting Reactive Streams but that is not relevant for collections), although we phase it out now for one supporting graphs (which is not relevant for collections). It is bounded memory and bounded stack, not using allocations inside the engine.

Seth Tisue · Answer 29 · Tue Oct 06 2015 21:53:32 GMT+0800 (China Standard Time)

@helmbold @geggo98 please post these ideas/comments to scala-debate instead

odersky · Answer 30 · Wed Oct 07 2015 05:39:32 GMT+0800 (China Standard Time)

Here is a first strawman proposal:

https://github.com/lampepfl/dotty/blob/master/src/strawman/collections/CollectionStrawMan1.scala

Highlights:

Collection operations are all in decorators; base traits have few methods.
No CanBuildFrom needed.
Shallow class hierarchy.
Clean and suprisingly simple integration of lazy collections (aka views)
Optimizability by (1) specialized iterators and (2) ability to inspect iterators so that
combinations of iterators can be optimized.

@densh says this is actually close to the way Rust does things; I'll have to check that.

Here's a test program for strawman proposals:

https://github.com/lampepfl/dotty/blob/master/tests/run/CollectionTests.scala

odersky · Answer 31 · Wed Oct 07 2015 05:47:11 GMT+0800 (China Standard Time)

The test

https://github.com/lampepfl/dotty/blob/master/tests/run/CollectionTests.scala

mentions one issue which needs explaining. We have a problem with map on Strings. As it stands, there are two overloaded versions of map on String's decorator, with the following type signatures:

 map(f: Char => Char): String
 map[U](f: Char => U): Seq[U]

These are needed for the usual behavior of map on strings. Given a Char => Char function, it should return a String, otherwise it should return a Seq. Unfortunately, current type inference rules require then that the parameter type of the function passed to map is given explicitly.

"abc".map(x: Char => x.toUpperCase) // OK
"abc".map(x => x.toUpperCase) // Error: missing parameter type

This is annoying but I think it is fixable. In fact, both overloaded variants of map take a function from Char to something, so the type inferencer should figure out that the parameter type is Char. Right now it doesn't but I see no reason why it could not be improved to do this.

Dmitry Petrashko · Answer 32 · Wed Oct 07 2015 16:24:34 GMT+0800 (China Standard Time)

Here is my variation of Martin’s strawman
#819

It has 4 changes that are applied to Martin’s version,
and I believe they could be discussed independently:

dotty-staging@48afa75
Gets rid of view class. View is a simple type alias to () => Iterator[A].
This points the major difference between Iterators and views: iterators are consumed by
any usage, while views are reusable.

dotty-staging@5281688
Makes .view method be a decorator. This is done in order to decouple views from collection higherarhy
and make implementation of custom collections easier. It would also allow to modularise collections library
to a higher degree. I believe that the very same should be done for .par.

dotty-staging@b77ab7c
Makes Iterators have only next and hasNext methods. The motivation could be found in this code sample:

def intersectionIsEmpty[A](it1: Iterator[A], it2: Iterator[A]) =
       !(it1 exists (it2 contains _))

Which was taken from old bug in Dotty codebase dotty-staging@03ec379

I believe that this code simply should not compile: both contains and exists have no side effects on
mutable and immutable collections, but it has a silent side-effect on Iterators. It does not follow principle
of least surprise, I would actually say that this is one of most surprising methods for new-comments that
felt the joy of ”you can use the same methods on parallel, mutable and immutable collections”
when they discover that there is ”but be very careful if you use those methods on iterators”.

Ok, so what the change does: it moves all consuming methods from Iterator[A], to a decorator over View[A], that is, to () => Iterator[A]. This follows the patch that makes difference in use-cases of iterator[A] and View[A] more obvious, that I started in previous commit.

dotty-staging@5244c5c
Extracts a LengthType from all the signatures for documentation purposes & makes it be Long for big-data needs.
This change has a problem with compatibility with ScalaJs, as they simulate Longs.

Daniel Spiewak · Answer 33 · Sat Oct 10 2015 07:50:18 GMT+0800 (China Standard Time)

@odersky Taking extreme liberties with your constraint set, and heading a bit more into left-field, this is my proposal. As promised. :-) https://gist.github.com/djspiewak/2ae2570c8856037a7738

Obviously, this is more of a sketch than a full implementation, but you can see where I'm going with things. I'm confident that I could make this framework about 90-95% source compatible with what exists today outside of the weird esoteric cases (breakOut is broken out of luck). In particular, the shims approach that I'm using is quite general, and should allow us to "fill in" anything compatibility-wise that no longer fits within the core framework design.

Notably, I'm not touching on Java 8 streams, but the whole thing is simple enough that I doubt it would be an issue in any sense of the word. Efficiency of my framework should be dramatically better than the existing framework, even by orders of magnitude in a few places. It's also a lot more friendly to inlining and other compiler optimizations, due to the reduced indirection and higher specialization.

Also notably, I'm completely ignoring your proposed base traits, since base traits of any sort run completely counter to what I firmly believe a collections library should be. Hence the "out of left field" caveat. This is my proposal though.

Naftoli Gugenheim · Answer 34 · Sun Oct 11 2015 09:46:02 GMT+0800 (China Standard Time)

If we're re-doing the collections, can we make the relevant parts take an equality typeclass, like Set and indexOf?

Bill Venners · Answer 35 · Sun Oct 11 2015 11:35:28 GMT+0800 (China Standard Time)

@nafg I'm planning on including something like that in my initial strawman proposal. Just need to finish a-synchronizing ScalaTest first. For a glimpse you can look at my Scala World slides, which I finally uploaded just now:

http://www.slideshare.net/bvenners/collections-prettyslow

This is where I'll be starting, but I'll bring it in line with Martin's requirements for the strawman.

lexspoon · Answer 36 · Tue Oct 13 2015 01:57:46 GMT+0800 (China Standard Time)

It's a tough change, given how much code is using the collections at this point. The payoff can be large, though, given how ubiquitous the collections are in day-to-day Scala code.

Some initial thoughts are at the following link. Broadly, I'd lean toward: fewer collection types, no more CanBuildFrom, and move views and parallelization to side libraries. I'm not completely sold that it is important to support external collection types written by third parties. I've never had a real reason to do it.

http://blog.lexspoon.org/2015/10/initial-input-on-scala-collections.html

I wish I had time to contribute more. Good hunting, everyone involved!

EECOLOR · Answer 37 · Tue Oct 13 2015 03:35:03 GMT+0800 (China Standard Time)

I think a lot can be learned from looking at https://github.com/paulp/psp-std/blob/master/api/src/main/scala/Trait.scala

The pattern I am pointing at is that the interfaces are containing only the most essential methods. Once those methods are implemented you will get a lot of other methods for free (for example with the help of implicits). This allows developers to quickly figure out the essence of the library.

The same pattern can be seen in 'functional' constructs. If you supply map and flatten on something, you will gain methods like flatMap. I know this example comes with a lot of controversy and can possibly side-track the conversation. The point I am making is that 'stuff' that can be derived from key implementations should be available to anyone.

The writers of these types of libraries (you guys) make great effort to provide default (correct, possibly fast) implementations based on a few methods. Let me as an ordinary Scala user tap into that power:

I get the implicitly available methods for free if I implement the correct interfaces
I can change the default behavior for my goofy use case by importing another set of implicits

In the past implicits hindered discoverability, but the editors (and scala-doc) have improved enough to make the 'pimp-my-library' pattern truly useful.

It would also help with following the mantra of composition over inheritance: "If you supply me something with a size and foreach method, I will give you ..."

Qihui Sun · Answer 38 · Tue Nov 03 2015 11:34:30 GMT+0800 (China Standard Time)

I'd like agree with viktorklang's opinion to be based on Clojure 1.7 Transducers. Java 8 have some transducers, such as Comparator.comparing, java.util.stream.Collectors.mapping、reducing、groupingBy、partitioningBy、summing 、Collectors.flatMapping(in Java 9) etc.
So a Java 8 Collector is a "reducing function",it's supplier fn is arity-0, it's accumulator fn is arity-2,it's arity-1 fn is finisher or completion. And Stream.collect() is similar to Clojure into function.
see : https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/InsideTransducers.md

And would you please rename the Option to Java or Swift's Optional. Others like Function1 -> Function,Function2 ->BiFunction ..., or Could you please rename the => to -> ? This will be same as Java or Swift、JavaScript、Haskell's 👍

★开放★进取★合作★成就★

Deleted user · Answer 39 · Tue Nov 03 2015 21:58:47 GMT+0800 (China Standard Time)

Is making collections non-empty a step too far?

I like scala's type system that helps avoid null pointer exceptions at runtime by encouraging people to use "Option" and detect code problems earlier at the compile stage. Empty collections feels like the same problem as null pointer exceptions where issues can pop up at runtime ( i.e. calling "head" on the empty collection). Using non empty lists should mean more code errors are picked up at compile stage. Using non empty collections would avoid potential errors with the "head" call and remove need for "isEmpty" call.

Haskell and Scalaz also use the concept on non-empty lists.

Pathikrit Bhowmick · Answer 40 · Thu Nov 05 2015 13:28:31 GMT+0800 (China Standard Time)

If we go with @fluxgain's suggestion then this won't be an issue, IMO .max and .min etc should return Options (None for the empty collections). Right now, it does a very Java-ish thing of throwing an exception on empty collections!

Also, all the .indexOf, .indexWhere, .indexOfSlice etc returns Int (-1 indicates not found - wat!?!). It should obviously return Option[Int] - None if not found.

And, lastly, this might be an adventurous suggestion - what if we actually use negative indices for something useful e.g. we can borrow from Python and make negative indices mean "from the end"?

Li Haoyi · Answer 41 · Thu Nov 05 2015 13:30:49 GMT+0800 (China Standard Time)

And, lastly, this might be an adventurous suggestion - what if we actually use negative indices for something useful e.g. we can borrow from Python and make negative indices mean "from the end"?

Negative indexes would be utterly amazing!!!!!oneoneone

I can't remember the last time I've seen a bug due to negative indexes being possible, and I've seen a lot of bugs in Python (a language with really encourages them). As far as I can tell this would be pure win.

Qihui Sun · Answer 42 · Thu Nov 05 2015 14:53:58 GMT+0800 (China Standard Time)

I think "Modern" Collections should study also from Go lang built-in Slice，and the Eclipse Collections Framework (Goldman Sachs Collections Framework).

★开放★进取★合作★成就★

soc · Answer 43 · Thu Nov 05 2015 16:56:34 GMT+0800 (China Standard Time)

what if we actually use negative indices for something useful

I would prefer using a type which couldn't be negative instead.

Seth Tisue · Answer 44 · Thu Nov 05 2015 22:35:06 GMT+0800 (China Standard Time)

The negative indices idea went around in 2011 in https://groups.google.com/d/msg/scala-language/ZAZzyKNb7ek/q3LlIRo47iIJ, and has probably gone around other times. I don't think it has any better chance now than it did then, and in any case, it's outside the scope of this ticket.

Changing the return types of various methods from T to Option[T] is also outside the scope of this ticket.

Adding non-empty collections is outside the scope of this ticket.

Making contains and similar methods take an equality typeclass is outside the scope of this ticket.

Please use https://groups.google.com/forum/#!topic/scala-debate/Z1YH_0Hgu5A (or start new threads on scala-debate or scala-language) to make and discuss such proposals.

Thanks. It's not that we don't want these discussions to happen, but this ticket isn't a free-for-all for everything having to do with collections.

Jed Wesley-Smith · Answer 45 · Fri Nov 06 2015 08:18:54 GMT+0800 (China Standard Time)

@SethTisue your comment seems at odds with goal 3. Particularly in terms of making methods total which simplifies things significantly.

Pathikrit Bhowmick · Answer 46 · Fri Nov 06 2015 08:46:34 GMT+0800 (China Standard Time)

@SethTisue 😿

Bill Venners · Answer 47 · Fri Nov 06 2015 08:48:44 GMT+0800 (China Standard Time)

I think it may be possible to model non-emptiness in the types while still adhering to "Normal usage of collections should stay the same. That is, the standard operations on collections should still be available under the same names" by leaving the partial methods in place but with a deprecation warning. I won't be sure until I try, but I will try it once I get time to write some code.

Matthew de Detrich · Answer 48 · Fri Nov 06 2015 08:58:42 GMT+0800 (China Standard Time)

@SethTisue your comment seems at odds with goal 3. Particularly in terms of making methods total which simplifies things significantly.

I think you are misunderstanding point 3. Point 3 seems to be mainly about keeping the API simple, and not about subjective use cases of simplicity from modifying the API for reasons of totality (or other reasons).

Simple is not the same thing as correctness, or totality. They are orthogonal concepts

I think it may be possible to model non-emptiness in the types while still adhering to "Normal usage of collections should stay the same.

Not sure how this can work, as far as I am aware, you can't have methods with the exact same parameter signatures return different types (i.e. Option[T] versus the original T)

Bill Venners · Answer 49 · Fri Nov 06 2015 09:14:57 GMT+0800 (China Standard Time)

Not sure how this can work, as far as I am aware, you can have methods with the exact same
parameter signatures return different types (i.e. Option[T] versus the original T)

I think you meant you cannot have methods with the same parameter signatures but different return types. That's true you can't overload that way. You can have a differently named method though, for example we already have headOption in addition to head.

I'd rather not write English about it but Scala, but the way I modeled "inhabitedness" (to give a more positive name for non-emptiness) is to make a subtype. The subtype has a head method, the supertype does not, but if this were ever to happen in the standard library, there would be a long deprecation period in which the supertype still had head, tail, and reduce. I did a proof-of-concept for just head on Set here:

https://github.com/scalatest/scalatest/blob/feature-equasets/scalactic/src/main/scala/org/scalactic/Collections.scala#L1690

That's inside an instance that captures an Equality typeclass, which also won't work for the standard library, but the subtype technique might still work. I won't know until I try it. It may turn out to not work, and even if it does work, it may be not be appropriate for the standard library. Still I will try it and see where it leads.

One thing I haven't tried, which I'm looking forward to trying, is making a List where the Cons cell subtype has a type alias inhabited.List, and that's how it works on List. So I think for List it might actually be pretty practical even in the standard library. But for everything else, you'd end up with an extra type for everything, and that's a lot of extra surface area. Maybe it could just be List that gets it in the standard library.

Anyway, already too much English. Not enough Scala.

Matthew de Detrich · Answer 50 · Fri Nov 06 2015 09:15:43 GMT+0800 (China Standard Time)

I think you meant you cannot have methods with the same parameter signatures but different return types.

Yup, just edited the post now, my bad!

David Barri · Answer 51 · Fri Nov 06 2015 10:01:26 GMT+0800 (China Standard Time)

I think you meant you cannot have methods with the same parameter signatures but different return types.

Actually you can, using implicits and dependant types. Imagine something like

trait FailureStyle[T] {
  type Result
  def pass(t: T): Result
  def fail: Result
}
def gimme[T](implicit f: FailureStyle[T]): f.Result = ???

// and an implicit like one of these in scope
implicit def throws[T] = new FailureStyle[T] { type Result = T; …}
implicit def option[T] = new FailureStyle[T] { type Result = Option[T]; …}
implicit def returnNull[T >: Null] = new FailureStyle[T] { type Result = T; …}

Bill Venners · Answer 52 · Fri Nov 06 2015 10:54:53 GMT+0800 (China Standard Time)

Actually you can, using implicits and dependant types. Imagine something like

Good point. Jon Pretty has something similar to the failure modes in your example in Rapture. Here's another example of that sort of technique:

https://github.com/scala/scala/blob/v2.11.2/src/library/scala/collection/TraversableLike.scala#L238

The result type of map depends on which CanBuildFrom instance is chosen implicitly at the call site. So sometimes when you map a Map you get a Map, other times an Iterable. But regardless I wouldn't want to go down the path of letting people pick a "head result policy" in any collection library, especially not the standard library. It is too clever.

Jed Wesley-Smith · Answer 53 · Fri Nov 06 2015 11:31:36 GMT+0800 (China Standard Time)

@mdedetrich

Simple is not the same thing as correctness, or totality. They are orthogonal concepts

The lack of correctness or totality makes things more complicated (or less simple). This is easy to demonstrate.

Take List[A].head for example it has the type List[A] => A. Given that types are (equivalent to) propositions, we now have a proposition that is false; given a List we cannot always produce a value. We can no longer rely on the type to describe to us whether our program is correct or not, and unless we add additional manual back-of-the-envelope reasoning and code we now have an incorrect program. The true type for getting a value from a List is something like List[A] => A => A.

I suggest you watch Ken Scambler's excellent video for a more detailed description of why partiality increases complexity https://www.youtube.com/watch?v=yS4Dbb1KfmA. Or Philip Wadler's Propositions as Types https://www.youtube.com/watch?v=IOiZatlZtGU on why totality matters for types.

Matthew de Detrich · Answer 54 · Fri Nov 06 2015 12:11:44 GMT+0800 (China Standard Time)

The lack of correctness or totality makes things more complicated (or less simple). This is easy to demonstrate.

Your stating this as an axiom when it isn't one

Take List[A].head for example it has the type List[A] => A. Given that types are (equivalent to) propositions, we now have a proposition that is false; given a List we cannot always produce a value. We can no longer rely on the type to describe to us whether our program is correct or not, and unless we add additional manual back-of-the-envelope reasoning and code we now have an incorrect program. The true type for getting a value from a List is something like List[A] => A => A.

Using your logic, List should return a disjunction of type Either, because its theoretically possible for it to fail due to memory constraints. So according to you, making all List operations have something of type Either[T,Failure] would be more simple. I would hardly agree this makes using a List more simple, but it does make it more correct.

You are arguing about correctness, not simplicity. There isn't a one to one correlation between them. You can make things so simple that they are very hard to get correct (assembly), and you can emphasis correctness so much that things are no longer simple (try writing a generic program in Coq)

In any case, this is getting to be off topic. The scope for the SLIP has already been defined, and you are redefining simple as correct

David Barri · Answer 55 · Fri Nov 06 2015 13:01:39 GMT+0800 (China Standard Time)

Using your logic, List should return a disjunction of type Either, because its theoretically possible for it to fail due to memory constraints.

I don't think that's the kind of strawman meant in "Wanted: Strawman proposals".

Matthew de Detrich · Answer 56 · Fri Nov 06 2015 13:03:19 GMT+0800 (China Standard Time)

I don't think that's the kind of strawman meant in "Wanted: Strawman proposals".

Of course not, but than you can argue the same thing about @jedws proposal. The point being made is that correctness is not the same thing as simple, and they are not directly comparable, which is addressing point 3, i.e.

We should strive for simplicity in APIs and implementations.

David Barri · Answer 57 · Fri Nov 06 2015 13:12:42 GMT+0800 (China Standard Time)

Of course not, but than you can argue the same thing about @jedws proposal.

No mate. I honestly can't find a way to interpret @jedws's points as strawman. It's a valid point and I disagree with this false dichotomy you're trying to draw. It's not true that correctness will always prevent simplicity. It doesn't have to be the case here either.

Matthew de Detrich · Answer 58 · Fri Nov 06 2015 13:15:17 GMT+0800 (China Standard Time)

It's a valid point and I disagree with this false dichotomy you're trying to draw. It's not true that correctness will always prevent simplicity. It doesn't have to be the case here either.

This is a strawman, I was never making a false dichotomy. My point is that an excessive use of correctness can actually harm simplicity, not help improve it.

It doesn't have to be the case here either.

Don't disagree with this, but I was never making this point...

Jed Wesley-Smith · Answer 59 · Fri Nov 06 2015 13:44:11 GMT+0800 (China Standard Time)

Using your logic, List should return a disjunction of type Either, because its theoretically possible for it to fail due to memory constraints. So according to you, making all List operations have something of type Either[T,Failure] would be more simple. I would hardly agree this makes using a List more simple, but it does make it more correct.

No, within any logic there are some things that are taken as axiomatic. Within computing, appropriate resource availability and correctness of underlying hardware and environment is usually one of them (unless you're building OSs).

Your argument mistakenly conflates non-deterministic runtime problems, with things that are deterministically provable and essentially says, "well, you can't prove all the things, so bugger it, don't bother proving anything". I am not averse to this argument, it is essentially the argument against type systems, but don't think it has much place within the design of strongly typed APIs.

The central point remains, introducing non-determinism, or falsity, or incorrectness, does actually increase complexity.

Seth Tisue · Answer 60 · Fri Nov 06 2015 21:20:16 GMT+0800 (China Standard Time)

Please take this discussion to another forum such as scala-debate.

balefrost · Answer 61 · Tue Nov 17 2015 00:22:37 GMT+0800 (China Standard Time)

As somebody who writes C# at work and Scala at home, there are some conveniences in the .NET collection API that I miss in Scala. I realize that most of the discussion here is related to the implementation details of the collections library, whereas my comments are more concerned with the user API of the collections library. Still, maybe these thoughts will be useful.

Suppose I have a list of pairs (to be interpreted as key/value pairs). I'd like to group the values by their corresponding keys, with the understanding that there may be duplicate keys. In Scala, I would do that like this (unless there's a simpler way that I'm not seeing):

pairs.groupBy(_._1).mapValues(_.map(_._2))

Or, to be more verbose but perhaps also more efficient, I might do this:

pairs.foldLeft(Map.empty[K, Set[V]]) {
    case (acc, (k, v)) =>
        acc + (k -> acc.getOrElse(k, Nil) :+ v)
}

Whereas in C#, I could use an overload of GroupBy do this:

pairs.GroupBy(x => x.Item1, x => x.Item2)

The .Net library realizes that simultaneous mapping while grouping is a common use case, so they provide a way to do that. The C# version feels far more ergonomic to me than either Scala snippet.

Another nice feature of the .Net collection API is that all methods that iterate the collection with a callback have overloads whose callback function takes the current index. So while I would need to do this in Scala:

coll.zipWithIndex.map {
    case (el, idx) =>
        el * idx
}

I could do this in C#

coll.Select((el, i) => el * idx)

It's easy enough for Select, SelectMany, and friends to also track the index while iterating, so they provide overloads exposing that functionality.

The .Net collection API includes many such conveniences. It's clear that the designers put a lot of thought into it. And though having methods take multiple function parameters seems like a questionable idea, in practice, it works out surprisingly well.

I realize that I'm arguing for convenience, which goes somewhat against goal #3, above. I also realize that the sort of convenience methods that I'm talking about could be implemented by a third-party library. But I believe that the Scala collection library would be better for having these conveniences built-in.

Paolo G. Giarrusso · Answer 62 · Tue Nov 17 2015 00:31:01 GMT+0800 (China Standard Time)

whereas my comments are more concerned with the user API of the collections library. Still, maybe these thoughts will be useful.

It sounds interesting (and I've run into the first problem myself), but that's also more in scope for the scala-debate discussion. (Also, I think in Scala you'd want to have different method names — IIRC if you overloaded groupBy, the type of x in groupBy(x => ...) would not be inferred).

Qihui Sun · Answer 63 · Thu Nov 19 2015 20:13:21 GMT+0800 (China Standard Time)

As a reference or Joke: Collection Development Philosophy (web.library.yale.edu/policy/collection-development-statements )

Qihui Sun · Answer 64 · Mon May 22 2017 18:11:39 GMT+0800 (China Standard Time)

I really expect LazyList or Stream add these powerful and convenient methods :
repeat/cycle, repeatedly/generate, iterate, unfold, resource, reject,
like in Elixir and Clojure: https://hexdocs.pm/elixir/Stream.html#unfold/2
This symbol #:: is too low,strange to verbose,unreadable.
Also, I hope add some xxxIndexed like in Kotlin Sequence: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/index.html
I think, one of the programming trend is the Lazy Sequences .

Peter Fraenkel · Answer 65 · Tue Jun 27 2017 02:29:23 GMT+0800 (China Standard Time)

I can't figure out where this discussion ended up on transducers (mentioned first above by @viktorklang ), but it seems like it wouldn't be hard to compose Views with some sort of ViewTransformer[-A,+B]. For example, like this.
This separates the composition of combinators from application to a specific collection, which is accomplishes one of the goals of transducers. We don't get the (supposed) benefit of JIT optimization of literal function composition of transformations of reducing functions.

POC in scala/collection-strawman#144

Guillaume Martres · Answer 66 · Thu Jan 11 2018 07:38:53 GMT+0800 (China Standard Time)

Closing since this discussion has gone stale. Discussions related to the main collection strawman proposal should go in https://github.com/scala/collection-strawman. More general discussions should probably move to https://contributors.scala-lang.org .