Create Double-Move Instruction

laforest opened this issue 10 years ago · comments

Charles Eric LaForest commented 10 years ago

To better move data to/from cores, and to feed accelerators, create an instruction which does no computation, but moves the A/B data to two D destinations.

Split the D operand into two 6-bit fields, maybe offset by a programmable amount, so we can do double-writes in that window, one to A, one to B.
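A minimal sketch of the field split, to make the idea concrete. It assumes a 12-bit D operand (two 6-bit fields), that the upper field addresses the A memory and the lower field the B memory, and that each field is added to its own programmable base offset. The names `split_d_operand`, `offset_a`, and `offset_b` are hypothetical and not part of Octavo.

```python
FIELD_BITS = 6                       # width of each destination field (from the proposal)
FIELD_MASK = (1 << FIELD_BITS) - 1   # 0b111111

def split_d_operand(d, offset_a=0, offset_b=0):
    """Split a 12-bit D operand into two write addresses.

    Assumption: the upper 6 bits select the A-memory destination and the
    lower 6 bits select the B-memory destination. Each field is relative
    to a programmable window base (offset_a, offset_b).
    """
    if not 0 <= d < (1 << (2 * FIELD_BITS)):
        raise ValueError("D operand must fit in 12 bits")
    field_a = (d >> FIELD_BITS) & FIELD_MASK
    field_b = d & FIELD_MASK
    return offset_a + field_a, offset_b + field_b
```

With this layout each destination can reach a 64-word window, and moving the window only requires changing the offsets, not the instruction encoding.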

Add shuffle version, which swaps A/B at write.
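The plain and shuffle variants can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the Octavo implementation: the A and B memories are modelled as Python lists, and `double_move` and its parameters are hypothetical names.

```python
def double_move(mem_a, mem_b, src_a, src_b, dst_a, dst_b, shuffle=False):
    """Model one double-move: read one word from each memory, write one to each.

    No computation is performed on the data. With shuffle=True the two
    words are swapped at write time, so the word read from A lands in B
    and vice versa.
    """
    word_a = mem_a[src_a]   # the single read of the A memory
    word_b = mem_b[src_b]   # the single read of the B memory
    if shuffle:
        word_a, word_b = word_b, word_a
    mem_a[dst_a] = word_a   # the single write to the A memory
    mem_b[dst_b] = word_b   # the single write to the B memory
```

Both reads happen before either write, so the model stays correct even when a source and destination address coincide.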

Maybe short-circuit data so we don't go through ALU?
Consequences of non-8-cycle-latency operation?

siupakm · Answer 1 · Sun May 18 2014 05:09:42 GMT+0800 (China Standard Time)

It is nice that you could move A and B data together, but is it really useful to move to 2 different destinations at the same time?

If you mainly use the double-move instruction to send A and B data to accelerators, then a single D destination is sufficient. For example, suppose you want to use an accelerator to write to main memory. A holds the destination address and B holds the data, but the same accelerator performs the write to main memory. So if the accelerator at D can accept A and B at the same time, there is little reason to split the inputs of one accelerator across two D destinations.

Charles Eric LaForest · Answer 2 · Sat May 31 2014 03:29:57 GMT+0800 (China Standard Time)

Two destinations are still important, as one expected common case is moving data through one Octavo core to another. With a double-move, a single instruction can read from 2 I/O ports and write to 2 different I/O ports. For compiler-scheduled data movements (array reversal, matrix transposition, moving data in a mesh of cores), that would be a big win.

There is little point in moving a double word combined from A and B into a single location in the D address space. Regular memory locations can't accept a double word, and the only way to write such a double word to an accelerator is through 2 I/O ports. Those ports must be in different memories (one in A, one in B), since each memory can only do one read and one write, each one word wide, at a time.

That's the key reason for a double-move: together the A and B memories can do 2 reads and 2 writes at a time, but normal operations only use 2 reads and 1 write. When we only need to move data, we're wasting half the potential write bandwidth.
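The bandwidth argument can be checked with a little arithmetic. This is a rough sketch that assumes a normal instruction retires exactly one write and a double-move retires two, and that ignores any scheduling or latency effects. The function names are hypothetical.

```python
import math

def instructions_needed(words, writes_per_instruction):
    """Instructions required to move `words` words at a given write rate."""
    return math.ceil(words / writes_per_instruction)

def write_utilization(writes_used, write_ports=2):
    """Fraction of the available write ports (one on A, one on B) in use."""
    return writes_used / write_ports

# Moving a 64-word block:
single = instructions_needed(64, 1)   # normal operation, 1 write each
double = instructions_needed(64, 2)   # double-move, 2 writes each
```

Under these assumptions a pure data move takes half as many instructions with the double-move, which matches the observation that a normal operation uses only one of the two available writes.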