laforest / Octavo

Verilog FPGA Parts Library. Old Octavo soft-CPU project.

Home Page: http://fpgacpu.ca/

Use folded CALLs to implement pre-fetching from out-of-core memory.

laforest opened this issue

Use the IO_Ready performance counters to adjust a BTM entry so that a folded CALL to a prefetch routine moves earlier or later in the instruction stream, reducing stall time on I/O to external memory.
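A rough sketch of that feedback step, written in Python as a behavioural model rather than Verilog. Everything here is an assumption for illustration: the function names, the idea that the stall counter reads out as a plain cycle count, and the toy latency model used to exercise it. It only handles moving the CALL earlier when stalls are observed; deciding when to move it later would need some other signal.

```python
def tune_branch_origin(origin, stall_cycles, lo, hi, cycles_per_instr=1):
    """Return a new Branch Origin for the folded CALL (hypothetical helper).

    origin           -- current instruction address that triggers the CALL
    stall_cycles     -- stall count read from the IO_Ready counter (assumed)
    lo, hi           -- inclusive address bounds the origin may occupy
    cycles_per_instr -- cycles between consecutive instructions (assumed)
    """
    if stall_cycles > 0:
        # Move earlier by enough instructions to cover the stall (ceiling division).
        shift = -(-stall_cycles // cycles_per_instr)
        origin -= shift
    # Keep the origin inside the legal range.
    return max(lo, min(hi, origin))


def simulated_stall(origin, use_addr, latency, cycles_per_instr=1):
    """Toy model: stall is whatever latency the prefetch lead time fails to hide."""
    lead = (use_addr - origin) * cycles_per_instr
    return max(0, latency - lead)


# Start with the CALL 10 instructions ahead of the use, against a 37-cycle latency.
origin = 90
for _ in range(3):
    stall = simulated_stall(origin, use_addr=100, latency=37)
    origin = tune_branch_origin(origin, stall, lo=0, hi=99)
```

In this toy model the origin settles after the first adjustment, since the measured stall is exactly the missing lead time. Real hardware would likely need a few noisy iterations.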

We could also store this new Branch Origin value into the data memory or binary image to make the tuning persistent, or measure the latency once and feed it to the compiler.

Log stall times and let an interpreter routine make the adjustments. This saves on-chip space, and it is acceptable because the decision may be complex and isn't performance-critical.

Multiple different memory latencies will require multiple measurements, maybe tied to address ranges?
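One possible shape for per-range measurements, again as a Python model and not a proposal for the actual hardware layout. The class name, the half-open range convention, and the example addresses and latencies are all made up for illustration.

```python
import bisect


class RangeLatencyTable:
    """Keep one measured latency per address range (hypothetical structure)."""

    def __init__(self, ranges):
        # ranges: non-overlapping (start, end_exclusive) pairs
        self._ranges = sorted(ranges)
        self._starts = [start for start, _ in self._ranges]
        self._latency = [None] * len(self._ranges)

    def _index(self, addr):
        # Find the last range starting at or before addr, then check its end.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr < self._ranges[i][1]:
            return i
        return None

    def record(self, addr, latency_cycles):
        """Store a measurement against the range containing addr."""
        i = self._index(addr)
        if i is None:
            raise KeyError(addr)
        self._latency[i] = latency_cycles

    def lookup(self, addr):
        """Return the latency for addr's range, or None if unknown."""
        i = self._index(addr)
        return None if i is None else self._latency[i]


# Two made-up regions with very different latencies.
table = RangeLatencyTable([(0x0000, 0x1000), (0x1000, 0x8000)])
table.record(0x0040, 12)
table.record(0x2000, 150)
```

A lookup on the prefetch target address would then pick which latency to tune the corresponding folded CALL against.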

Reference: "Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications", by Todd C. Mowry, Angela K. Demke, Orran Krieger, OSDI 1996, Best Paper