Range-based indexing

Andlon opened this issue 8 years ago

Andreas Borgen Longva commented 8 years ago

Currently, to take a slice of a matrix, one needs to use e.g. the sub_slice function. I propose to add implementations of Index<(Range<usize>, Range<usize>)>. See the following example:

```rust
let x = matrix![ 1.0, 2.0, 3.0;
                 4.0, 5.0, 6.0;
                 7.0, 8.0, 9.0];

// Want to take the 2x2 lower-right corner
// Current way:
let corner = x.sub_slice([1, 1], 2, 2);

// Range-based indexing:
let corner = x[(1 .. 3, 1 .. 3)];

// Or, more succinctly:
let corner = x[(1 .., 1 ..)];

// More flexible indexing: take the first two columns
let first_two_cols = x[(.., .. 2)];
```

Similarly, we can slice rows or columns (returning Row or Column) using a mix of usize and Range<usize>:

```rust
let second_row = x[(1, ..)];
let second_col = x[(.., 1)];

// Only parts of a given row, still returns a `Row` struct
// (so we can extract a raw slice, which is useful for performance)
let parts_of_second_row = x[(1, 1 .. 3)];
```

The range-based indexing can be implemented in terms of Index<(R1, R2)> where R1, R2 are any of the following concrete types from the stdlib: Range, RangeFull, RangeTo, RangeFrom.
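Once the length of an axis is known, each of these four range types reduces to a concrete half-open `start..end` pair. The following is a minimal sketch of that normalization step. The `AxisRange` trait and the `resolve` function are hypothetical names for illustration. They are not part of the library discussed here. Newer versions of the standard library also provide `std::ops::RangeBounds`, which covers all four types and could replace the hand-written trait.

```rust
use std::ops::{Range, RangeFrom, RangeFull, RangeTo};

/// Hypothetical helper trait: turns an axis selector into a concrete
/// half-open interval for an axis of length `len`.
trait AxisRange {
    fn to_range(&self, len: usize) -> Range<usize>;
}

impl AxisRange for Range<usize> {
    fn to_range(&self, _len: usize) -> Range<usize> {
        self.start..self.end
    }
}

impl AxisRange for RangeFrom<usize> {
    fn to_range(&self, len: usize) -> Range<usize> {
        self.start..len
    }
}

impl AxisRange for RangeTo<usize> {
    fn to_range(&self, _len: usize) -> Range<usize> {
        0..self.end
    }
}

impl AxisRange for RangeFull {
    fn to_range(&self, len: usize) -> Range<usize> {
        0..len
    }
}

/// Resolve a (rows, cols) selector pair for a `rows x cols` matrix,
/// panicking if either resolved range is out of bounds.
fn resolve<R1: AxisRange, R2: AxisRange>(
    sel: (R1, R2),
    rows: usize,
    cols: usize,
) -> (Range<usize>, Range<usize>) {
    let r = sel.0.to_range(rows);
    let c = sel.1.to_range(cols);
    assert!(r.start <= r.end && r.end <= rows, "row range out of bounds");
    assert!(c.start <= c.end && c.end <= cols, "column range out of bounds");
    (r, c)
}

fn main() {
    // Selectors from the 3x3 example above.
    assert_eq!(resolve((1..3, 1..3), 3, 3), (1..3, 1..3));
    assert_eq!(resolve((1.., 1..), 3, 3), (1..3, 1..3));
    assert_eq!(resolve((.., ..2), 3, 3), (0..3, 0..2));
}
```

Because `resolve` is generic over both axes, the two selectors can have different types. That is exactly what `Index<(R1, R2)>` needs.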

Any thoughts on this? I think it would really empower matrix slices.

James Lucas · Answer 1 · Sun Feb 12 2017 04:14:00 GMT+0800 (China Standard Time)

This is something that has been somewhat overlooked so far. First I'll explain why it doesn't exist right now.

Currently our index trait looks like this: Index<[usize; 2]>. I opted for this because I much preferred the mat[[i, j]] syntax to mat[(i, j)] (though both are worse than mat[i, j]). Sadly we cannot keep the same syntax for range-based indexing, because both elements of an array [T; 2] must have the same type. This corresponds to R1 and R2 being the same in your example above. We could get around this by using the tuple syntax as you suggest, but the inconsistency is undesirable.
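The sketch below makes the constraint concrete. It shows what an `Index<[usize; 2]>` impl looks like on a minimal row-major matrix. The `Matrix` struct is an assumption for illustration, not the library's actual type. The commented-out line shows why a mixed range selector cannot use array syntax.

```rust
use std::ops::{Index, Range, RangeTo};

/// A minimal row-major matrix, assumed here purely for illustration.
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Index<[usize; 2]> for Matrix {
    type Output = f64;

    fn index(&self, idx: [usize; 2]) -> &f64 {
        let [i, j] = idx;
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        &self.data[i * self.cols + j]
    }
}

fn main() {
    let m = Matrix { rows: 2, cols: 3, data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0] };
    assert_eq!(m[[1, 2]], 6.0);

    // An array literal has a single element type, so a selector mixing a
    // Range and a RangeTo fails with a "mismatched types" error:
    // let bad = [1..3, ..2];
    // A tuple has no such restriction:
    let mixed: (Range<usize>, RangeTo<usize>) = (1..3, ..2);
    assert_eq!(mixed, (1..3, ..2));
}
```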

To resolve this I see a few solutions:

1. Ignore the inconsistency and have range indexing use tuples. (A bad idea, I think.)
2. Replace current indexing with tuples. (Open to discussion, but I'm not keen on the look and it's a pretty nasty breaking change.)
3. Use Index<[R; 2]> and only allow the same type on each axis.
4. Take inspiration from ndarray and provide a trait for indexing types. We'd implement this trait for [usize; 2], and hopefully this presents some nice ways around some of the issues described above (though not all).

Andreas Borgen Longva · Answer 2 · Sun Feb 12 2017 04:56:13 GMT+0800 (China Standard Time)

Initially, I also found x[[i, j]] to look better than x[(i, j)]. However, after writing a lot of indexing code lately, I've realized that the latter is vastly more readable: at a glance, square brackets are easily confused with straight-looking index letters such as i or l.

I wanted to write up a thread on internals.rust-lang.org to see if there's any interest in providing syntax sugar for tuple indexing, i.e. x[i, j] would be the same as x[(i, j)], and whether there is any reason this would not be a good idea.

In any case, that's not going to happen any time soon, so let's consider our options.

1. Agree, this is not good.
2. I'm open to this. We could also provide it alongside the existing indexing and try to gradually phase out array-based indexing.
3. I think this is a very unfortunate and artificial limitation.
4. This is an interesting idea, but it doesn't seem to solve this particular problem: you still face the question of which types to implement the trait for. It seems ndarray basically implements it for all of them, allowing both tuple-based and array-based indexing. This is also acceptable to me. I'm mostly concerned with being able to do flexible matrix slices much more simply.

James Lucas · Answer 3 · Sun Feb 12 2017 05:06:05 GMT+0800 (China Standard Time)

I should have expanded a little on why I suggested the 4th option. I was indeed thinking that we could implement the trait for [usize; 2], and then for tuple types such as (usize, usize), (Range, RangeTo), etc. We could even impl OurIndex for usize and have it return a row, though this feels a bit like taking an overly strong moral stance...

Anyway, what I was trying to say was that we could do this to avoid making a strong decision right now. We get fancy range indexing with tuple syntax but don't break everyone else's indexing code (yet).
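The 4th option can be sketched as one crate-local trait that each index type implements, plus a single generic `Index` impl that forwards to it. `MatrixIndex` stands in for the hypothetical "OurIndex", and `Matrix` is a minimal row-major stand-in rather than the library's real type. The sketch covers only selectors whose result already lives contiguously in the matrix's buffer: an element, a whole row, and part of a row. A 2D block is left out because `Index` must return a reference.

```rust
use std::ops::{Index, Range, RangeFull};

/// Minimal row-major matrix, assumed for this sketch only.
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

/// Hypothetical stand-in for "OurIndex": each index type chooses its own
/// output, which must be borrowable from the matrix's storage.
trait MatrixIndex {
    type Output: ?Sized;
    fn index_into(self, m: &Matrix) -> &Self::Output;
}

// Tuple-based element indexing: x[(i, j)].
impl MatrixIndex for (usize, usize) {
    type Output = f64;
    fn index_into(self, m: &Matrix) -> &f64 {
        let (i, j) = self;
        assert!(i < m.rows && j < m.cols, "index out of bounds");
        &m.data[i * m.cols + j]
    }
}

// Existing array-based indexing keeps working: x[[i, j]].
impl MatrixIndex for [usize; 2] {
    type Output = f64;
    fn index_into(self, m: &Matrix) -> &f64 {
        (self[0], self[1]).index_into(m)
    }
}

// Part of a row: x[(i, a..b)]. Rows are contiguous in row-major storage,
// so a plain slice can be borrowed directly.
impl MatrixIndex for (usize, Range<usize>) {
    type Output = [f64];
    fn index_into(self, m: &Matrix) -> &[f64] {
        let (i, c) = self;
        assert!(i < m.rows && c.start <= c.end && c.end <= m.cols, "index out of bounds");
        let row_start = i * m.cols;
        &m.data[row_start + c.start..row_start + c.end]
    }
}

// A whole row: x[(i, ..)].
impl MatrixIndex for (usize, RangeFull) {
    type Output = [f64];
    fn index_into(self, m: &Matrix) -> &[f64] {
        (self.0, 0..m.cols).index_into(m)
    }
}

// A single generic Index impl forwards to the trait.
impl<I: MatrixIndex> Index<I> for Matrix {
    type Output = I::Output;
    fn index(&self, idx: I) -> &I::Output {
        idx.index_into(self)
    }
}

fn main() {
    let x = Matrix {
        rows: 3,
        cols: 3,
        data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    };
    assert_eq!(x[[1, 1]], 5.0);
    assert_eq!(x[(1, 1)], 5.0);
    assert_eq!(&x[(1, ..)], &[4.0, 5.0, 6.0][..]);
    assert_eq!(&x[(1, 1..3)], &[5.0, 6.0][..]);
}
```

Because the trait is local to the crate, the `[usize; 2]` and tuple impls can coexist. Existing array-based code would keep compiling while tuple syntax is added alongside it.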

I would be interested in learning more about any official movement on syntactic sugar for indexing. If you do get around to making that post please let me know :).

Andreas Borgen Longva · Answer 4 · Sun Feb 12 2017 05:14:44 GMT+0800 (China Standard Time)

@AtheMathmo: that seems reasonable, and it would probably make the code a whole lot cleaner than direct implementations of Index<(R1, R2)> for each combination of R1 and R2, as well as [R; 2]. I like it. Personally, I think single-number indexing of matrices should be avoided. Instead, one can simply write x[(i, ..)], which is almost as short and completely consistent with matrix indexing.

A bit of a side note: one thing I've been thinking about lately is that in some places we use offsets or pointer arithmetic, which are inherently isize-based. In practice, this means that our indexing might be broken for values that lie in the interval (max(isize), max(usize)). On a 64-bit system you would never get matrices or vectors this big, but I think it's at least conceivable to build such a large vector on a 32-bit system... I'm just wondering if this could cause us any problems. Most likely not, I expect!
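One defensive pattern is to compute the linear index with checked arithmetic and convert it to `isize` explicitly before any pointer arithmetic. That makes the out-of-range case an explicit `None` instead of a silent wrap. `checked_offset` below is a hypothetical helper, sketched under the assumption of row-major storage. For what it's worth, the current Rust documentation states that `Vec` panics if its capacity would exceed `isize::MAX` bytes. So a real buffer should never reach that range, and the check mainly guards computed indices.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: linear offset of element (i, j) in a row-major
/// buffer with `cols` columns, as an isize suitable for pointer offsets.
/// Returns None instead of wrapping if any step overflows or the result
/// does not fit in an isize.
fn checked_offset(i: usize, j: usize, cols: usize) -> Option<isize> {
    let linear = i.checked_mul(cols)?.checked_add(j)?;
    isize::try_from(linear).ok()
}

fn main() {
    assert_eq!(checked_offset(2, 1, 3), Some(7));
    // A linear index just above isize::MAX cannot be an isize offset.
    assert_eq!(checked_offset(0, isize::MAX as usize + 1, 1), None);
    // Overflow in the multiplication itself is caught as well.
    assert_eq!(checked_offset(usize::MAX, 0, 2), None);
}
```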

Andreas Borgen Longva · Answer 5 · Sun Feb 12 2017 05:57:14 GMT+0800 (China Standard Time)

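It was just pointed out to me on IRC that we actually cannot return a MatrixSlice, because Index::index returns a reference to its Output (&Self::Output) borrowed from the matrix itself, whereas a MatrixSlice would be a newly constructed value. There doesn't seem to be any way around that currently (afaik).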

Hence my point here is mostly moot!
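A small sketch shows where the restriction bites. Element indexing works because the `f64` already lives in the matrix's buffer. A block view has to be built on the fly, so it can only be returned by value from an ordinary method, much like the existing `sub_slice`. `Matrix`, `MatrixView` and `view` are hypothetical names here, not the library's API.

```rust
use std::ops::{Index, Range};

/// Minimal row-major matrix, assumed for illustration only.
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

// Element indexing is fine: the f64 already lives inside `self.data`,
// so `index` can hand out a reference to it.
impl Index<(usize, usize)> for Matrix {
    type Output = f64;
    fn index(&self, (i, j): (usize, usize)) -> &f64 {
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        &self.data[i * self.cols + j]
    }
}

/// Hypothetical borrowed view of a rectangular block. It is a new value
/// built on demand, not something stored inside `Matrix`.
struct MatrixView<'a> {
    parent: &'a Matrix,
    row_start: usize,
    col_start: usize,
    rows: usize,
    cols: usize,
}

impl Matrix {
    // An ordinary method may construct the view and return it by value.
    // An `Index` impl could not: `index` must return `&Self::Output`
    // borrowed from `self`, and `&self.view(r, c)` would be a reference
    // to a temporary that is dropped when `index` returns.
    fn view<'a>(&'a self, r: Range<usize>, c: Range<usize>) -> MatrixView<'a> {
        assert!(r.start <= r.end && r.end <= self.rows, "row range out of bounds");
        assert!(c.start <= c.end && c.end <= self.cols, "column range out of bounds");
        MatrixView {
            parent: self,
            row_start: r.start,
            col_start: c.start,
            rows: r.end - r.start,
            cols: c.end - c.start,
        }
    }
}

impl<'a> MatrixView<'a> {
    fn get(&self, i: usize, j: usize) -> f64 {
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        self.parent[(self.row_start + i, self.col_start + j)]
    }
}

fn main() {
    let x = Matrix {
        rows: 3,
        cols: 3,
        data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    };
    // `x[(1, 2)]` is sugar for `*Index::index(&x, (1, 2))`.
    assert_eq!(x[(1, 2)], *Index::index(&x, (1, 2)));
    // The 2x2 lower-right corner from the original example, via a method.
    let corner = x.view(1..3, 1..3);
    assert_eq!(corner.get(0, 0), 5.0);
    assert_eq!(corner.get(1, 1), 9.0);
}
```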

James Lucas · Answer 6 · Sun Feb 12 2017 06:03:29 GMT+0800 (China Standard Time)

Ah, I totally forgot about this part of the problem! Before we can do this (and many other cool things), we need custom DSTs. Sadly this PR was closed recently, and although I'd love to keep pushing it forward myself, I lack the technical know-how.

Andreas Borgen Longva · Answer 7 · Sun Feb 12 2017 06:16:07 GMT+0800 (China Standard Time)

Thanks, that was an interesting read!

Andreas Borgen Longva · Answer 8 · Sun Feb 12 2017 06:25:26 GMT+0800 (China Standard Time)

Oh, and by the way, I decided to post on internals about tuple syntax sugaring: https://internals.rust-lang.org/t/opinions-on-syntax-sugar-for-tuple-based-indexing/4776

Will be interesting to see if anyone has anything to say on the topic!

Andreas Borgen Longva · Answer 9 · Sun Feb 12 2017 17:48:09 GMT+0800 (China Standard Time)

Since we're not going to be able to do this for a potentially very long time, I will close this issue for now. I hope we can revisit it in the future.