May be a icache miss thread still can be scheduled again?

Question

May be a icache miss thread still can be scheduled again?

zhangliang1997 opened this issue 4 years ago · comments

Hi jbush001:
I think i find a bug, but i'm not sure. In ifetch_tag_stage.v, there are some codes about thread scheduled.

    assign can_fetch_thread_bitmap = ts_fetch_en & ~icache_wait_threads;
    assign thread_sleep_mask_oh = cache_miss_thread_oh & {`THREADS_PER_CORE{ifd_cache_miss}};
    assign icache_wait_threads_nxt = (icache_wait_threads | thread_sleep_mask_oh)
        & ~l2i_icache_wake_bitmap;

I think a icache miss thread still can be scheduled, for example:

In cycle n, if we select 0 thread to fetch instruction;
In cycle n+1, 0 thread icache miss in ifetch_data_stage, in the same time(cycle n+1), we will select a thread to fetch instruction. Because 0 thread icache miss, so in n+1 cycle, 0 thread can't have opportunity to be scheduled or arbitrated.

So i think can_fetch_thread_bitmap should be:

    assign can_fetch_thread_bitmap = ts_fetch_en & ~icache_wait_threads & ~thread_sleep_mask_oh;

So is there any problem about these code ? or just i don't have a thorough understand about code?

Jeff Bush · Answer 1 · Thu Dec 03 2020 11:04:03 GMT+0800 (China Standard Time)

Good catch! Your observation is correct that it may schedule the thread unnecessarily if there is a miss, and that adding ~thread_sleep_mask_oh would prevent that. It actually used to be implemented the way you suggested. The reason I changed it is that it ended up being a long combinational path that significantly reduced the maximum clock speed. My workaround was to invalidate the instruction (which injects a bubble, but is otherwise correct). This wastes a clock cycle, but should be relatively infrequent, and my premise is that the improvement in clock speed would outweigh that.

Here is the change where I removed that logic:

33484aa

Later I removed even more:

7064fe4

I was not able to think of a way to reduce the combinational delay in that logic and avoid this.

张亮 · Answer 2 · Thu Dec 03 2020 12:15:06 GMT+0800 (China Standard Time)

Thanks very much ! I have understanded it.
By the way, i learned a lot from nyuzi, such as the verilog implement of the gpgpu. Before i find nyuzi, i just learned gpgpu architecture from gpgpusim. After l read nyuzi code, l learn a lot "real" things. This is a very cool project!
Thanks again!

Jeff Bush · Answer 3 · Thu Dec 03 2020 12:53:21 GMT+0800 (China Standard Time)

Thanks, I'm glad you found it useful! :)