laforest / Octavo

Verilog FPGA Parts Library. Old Octavo soft-CPU project.

Home Page: http://fpgacpu.ca/

Create Double-Move Instruction

laforest opened this issue.

To better move data to and from cores, and to feed accelerators, create an instruction that does no computation but instead moves the A and B data to two D destinations.

Split the D operand into two 6-bit fields, maybe offset by a programmable amount, so we can do double writes within that window: one to A, one to B.

Add a shuffle version, which swaps the A and B data at write time.
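To make the proposed encoding concrete, here is a minimal Python sketch of how the two 6-bit destination fields and the shuffle variant could be decoded. Everything specific here is an assumption for illustration: a 12-bit D operand, the upper field addressing A memory and the lower field addressing B memory, and a single hypothetical `window_base` standing in for the "programmable amount" that places both fields in the same 64-word window.

```python
from dataclasses import dataclass

FIELD_BITS = 6                      # each destination field is 6 bits wide
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0x3F, selects 64 words inside the window


@dataclass
class DoubleMoveTargets:
    a_dest: int  # write address in A memory
    b_dest: int  # write address in B memory
    a_data: int  # value written to a_dest
    b_data: int  # value written to b_dest


def decode_double_move(d_operand, a_value, b_value, window_base, shuffle=False):
    """Hypothetical decode of a double-move D operand.

    Assumes a 12-bit D operand: upper 6 bits -> A-memory destination,
    lower 6 bits -> B-memory destination, both offset by window_base.
    With shuffle=True, the A data goes to the A destination's slot for B
    and vice versa, i.e. the two values are swapped at write time.
    """
    if not 0 <= d_operand < (1 << (2 * FIELD_BITS)):
        raise ValueError("D operand must fit in 12 bits")
    a_field = (d_operand >> FIELD_BITS) & FIELD_MASK
    b_field = d_operand & FIELD_MASK
    # The shuffle variant only changes which read value lands in which memory.
    a_data, b_data = (b_value, a_value) if shuffle else (a_value, b_value)
    return DoubleMoveTargets(window_base + a_field, window_base + b_field, a_data, b_data)
```

For example, `decode_double_move(0b000011_000101, 0xAAA, 0xBBB, window_base=0x100)` targets address `0x103` in A memory and `0x105` in B memory. Sharing one base keeps the instruction format simple; separate per-memory offsets would be an equally plausible reading of the proposal.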

Maybe short-circuit the data so it doesn't go through the ALU?
What are the consequences of an operation whose latency isn't 8 cycles?

It is nice that you can move A and B data together, but is it really useful to move them to 2 different destinations at the same time?

If you mainly use the double-move instruction to send A and B data to accelerators, then a single D destination is sufficient. For example, suppose you want to use an accelerator to write to main memory: A holds the destination address and B holds the data, but it is the same accelerator performing the "write to main memory" action. So if the accelerator at D can accept A and B at the same time, there's little reason to split the inputs of that one accelerator across two D destinations.

Two destinations are still important, as one expected common case is moving data through one Octavo core on its way to another. With a double-move, a single instruction can read from 2 I/O ports and write to 2 different I/O ports. For compiler-scheduled data movements (array reversal, matrix transposition, moving data in a mesh of cores), that would be a big win.
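As a rough illustration of the pass-through case, the sketch below counts instructions needed to forward a stream of words through a core. It is a behavioural toy, not a model of the Octavo pipeline: the queues `in_a`/`in_b` and lists `out_a`/`out_b` are hypothetical stand-ins for one input and one output I/O port in each of the A and B memories, and words are assumed to arrive alternately on the two input ports.

```python
from collections import deque


def forward_stream(words, use_double_move):
    """Forward words from two input ports to two output ports.

    Returns (out_a, out_b, instruction_count). A single move transfers one
    word per instruction; a double-move transfers one word from each input
    port per instruction when both ports have data.
    """
    # Hypothetical arrival pattern: even-indexed words on the A-side port,
    # odd-indexed words on the B-side port.
    in_a = deque(words[0::2])
    in_b = deque(words[1::2])
    out_a, out_b = [], []
    instructions = 0
    while in_a or in_b:
        if use_double_move and in_a and in_b:
            # One double-move: two reads, two writes (one per memory).
            out_a.append(in_a.popleft())
            out_b.append(in_b.popleft())
        elif in_a:
            out_a.append(in_a.popleft())  # ordinary move: one write
        else:
            out_b.append(in_b.popleft())  # ordinary move: one write
        instructions += 1
    return out_a, out_b, instructions
```

Forwarding 10 words takes 10 ordinary moves but 5 double-moves; for an odd count such as 7, the double-move version needs 4 instructions (3 double-moves plus one leftover single move). Real schedules would also have to respect I/O port readiness, which this sketch ignores.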

There is little point in moving a double word, combined from A and B, into a single location anywhere in the D address space: regular memory locations can't accept a double word, and the only way to write such a double word to an accelerator is to use 2 I/O ports, which must each be in a different memory (A or B), since each memory can only do one word-wide write and one word-wide read at a time.

That's the key reason for a double-move: together the A and B memories can do 2 reads and 2 writes at a time, but normal operations only use 2 reads and 1 write. We're wasting half the potential write bandwidth when we only need to move data.
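The port-budget argument can be checked mechanically. This minimal sketch assumes, as described above, one word-wide read port and one word-wide write port per memory; `Mem` and `check_instruction` are hypothetical names, not part of Octavo. It rejects any instruction that needs two writes to the same memory (the "double word to one location" case) and reports what fraction of the total write bandwidth an instruction uses.

```python
from enum import Enum


class Mem(Enum):
    A = "A"
    B = "B"


# Assumed per-cycle port budget: one word-wide read and one word-wide write per memory.
READ_PORTS = {Mem.A: 1, Mem.B: 1}
WRITE_PORTS = {Mem.A: 1, Mem.B: 1}


def check_instruction(reads, writes):
    """Return the fraction of total write bandwidth used by one instruction.

    reads and writes are lists of Mem values, one entry per word accessed.
    Raises ValueError if any memory would need more ports than it has.
    """
    for mem in Mem:
        if reads.count(mem) > READ_PORTS[mem]:
            raise ValueError(f"too many reads from {mem.value} memory")
        if writes.count(mem) > WRITE_PORTS[mem]:
            raise ValueError(f"too many writes to {mem.value} memory")
    return len(writes) / sum(WRITE_PORTS.values())
```

A normal operation, `check_instruction([Mem.A, Mem.B], [Mem.A])`, uses half the write bandwidth (0.5), while a double-move with one destination in each memory, `check_instruction([Mem.A, Mem.B], [Mem.A, Mem.B])`, uses all of it (1.0). Trying to land both words in A memory, `[Mem.A, Mem.A]`, raises `ValueError`.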