MultiNode Simulation

Question

amelfatima1231 opened this issue a year ago

Is it possible to simulate multiple interconnected nodes? What changes (a high level description) will be required if we want to simulate multiple interconnected nodes where each node consists of a CPU and multiple GPUs?

Yifan Sun · Answer 1 · Mon Jul 31 2023 21:34:59 GMT+0800 (China Standard Time)

Yes. You can model multiple nodes by adding network links that are slower than the intra-node links.

The only problem is that we do not support multiple CPUs in a system. I guess you do not care too much about CPUs? You can always create a magic link that connects the CPU to the GPU switches. See my hand drawing below.

You can configure the network any way you want. Currently, the default configuration uses the PCIe connector to establish the network: https://github.com/sarchlab/akita/blob/v3/noc/networking/pcie/pcie.go#L14. To define your own network, my recommendation is to create a new network connector (or modify the PCIe connector).
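As a rough illustration of the topology described above, here is a minimal sketch that lists the links of two GPU nodes joined by one slower link. It is a plain data model, not the Akita connector API: the `Link` class, the `build_two_node_topology` function, the device names, and all bandwidth and latency values are hypothetical. Attaching the single CPU to only one switch is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """An undirected link between two named devices (hypothetical model)."""
    a: str
    b: str
    bandwidth: float  # arbitrary units, only relative values matter here
    latency: int      # arbitrary units


def build_two_node_topology(gpus_per_node=2, intra_bw=32.0, inter_bw=4.0):
    """Return the links of two GPU nodes joined by one slower link."""
    links = []
    for node in range(2):
        switch = f"SW{node}"
        for g in range(gpus_per_node):
            gpu = f"GPU{node * gpus_per_node + g}"
            # Fast intra-node link between a GPU and its local switch.
            links.append(Link(switch, gpu, intra_bw, 10))
    # Assumption: the single CPU hangs off node 0's switch only.
    links.append(Link("CPU", "SW0", intra_bw, 10))
    # The inter-node link is slower than every intra-node link.
    links.append(Link("SW0", "SW1", inter_bw, 200))
    return links


if __name__ == "__main__":
    for link in build_two_node_topology():
        print(link)
```

In a real configuration, each entry would correspond to a connection made through a network connector, with the inter-node link given lower bandwidth and higher latency than the rest.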

amelfatima1231 · Answer 2 · Mon Jul 31 2023 21:46:33 GMT+0800 (China Standard Time)

Just to confirm, will we have a single root complex in this case or two?

Yifan Sun · Answer 3 · Mon Jul 31 2023 21:48:33 GMT+0800 (China Standard Time)

Single root complex. The address spaces are unified. It works as if everything were in a single machine, except that one link is slower.

amelfatima1231 · Answer 4 · Mon Jul 31 2023 22:37:02 GMT+0800 (China Standard Time)

Thanks.
Also, can't we connect the two switches (SWs) directly instead of creating an additional SW (labelled as NIC) between them? Could I create another function that connects the two SWs with different bandwidth and latency parameters?

Yifan Sun · Answer 5 · Mon Jul 31 2023 22:40:19 GMT+0800 (China Standard Time)

Yes. That should work. Just make sure not to create a path that lets traffic bypass the slow link. The current routing algorithm is maximum-bandwidth-based routing.
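One way to check for such a bypass before running a simulation is to remove the slow link from a model of the topology and test whether the two nodes can still reach each other. The sketch below does this with a breadth-first search. The function name `reachable_without`, the device names, and both example topologies are hypothetical and are not taken from the simulator.

```python
from collections import deque


def reachable_without(links, removed, src, dst):
    """Breadth-first search from src to dst that ignores the `removed` link."""
    adjacency = {}
    for a, b in links:
        if {a, b} == set(removed):
            continue  # pretend the slow link does not exist
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


slow = ("SW0", "SW1")
# CPU attached to one switch only: the slow link is the only inter-node path.
cpu_on_one_node = [("GPU0", "SW0"), ("GPU1", "SW1"), ("CPU", "SW0"), slow]
# CPU attached to both switches: traffic can detour through the CPU side.
cpu_on_both_nodes = cpu_on_one_node + [("CPU", "SW1")]

print(reachable_without(cpu_on_one_node, slow, "GPU0", "GPU1"))    # False
print(reachable_without(cpu_on_both_nodes, slow, "GPU0", "GPU1"))  # True
```

A result of `True` means some route avoids the slow link, so whether traffic actually uses it depends on the routing algorithm.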

amelfatima1231 · Answer 6 · Mon Jul 31 2023 22:41:19 GMT+0800 (China Standard Time)

Got it.

Thank you so much.

amelfatima1231 · Answer 7 · Mon Jul 31 2023 22:55:00 GMT+0800 (China Standard Time)

Since the root complex also works as a switch, I see that traffic follows the path from one switch, across the root complex, and into the other switch instead of taking the direct connection between them. :/

Yifan Sun · Answer 8 · Mon Jul 31 2023 22:56:49 GMT+0800 (China Standard Time)

Is CPU-GPU communication important for your case? If not, simply attaching the CPU to a single machine should solve the problem. Otherwise, you have to modify the routing algorithm.

Yifan Sun · Answer 9 · Mon Jul 31 2023 23:00:59 GMT+0800 (China Standard Time)

Actually, we have both bandwidth-first routing (https://github.com/sarchlab/akita/blob/v3/noc/networking/networkconnector/bandwidth_first_routing.go) and least-hop routing (https://github.com/sarchlab/akita/blob/v3/noc/networking/networkconnector/floydwarshall.go#L15). So changing the routing algorithm (https://www.youtube.com/watch?v=4rgSzQwe5DQ&t=731s) will work.

Also, I just realized that we default to least-hop routing.

Anyway, if you understand how routing works, you can define your own routing algorithm.
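To make the difference between the two routing styles concrete, here is a small sketch that runs textbook versions of both on a toy topology: two switches joined by a slow direct link and also by a fast two-hop path through the root complex. It is not the Akita implementation. The least-hop router here uses breadth-first search, the bandwidth-first router is a Dijkstra variant that maximizes the bottleneck bandwidth, and the names `LINKS`, `least_hop_path`, and `widest_path` and all bandwidth values are hypothetical.

```python
import heapq
from collections import deque

# Hypothetical topology: link -> bandwidth (arbitrary units).
LINKS = {
    ("SW0", "RC"): 32.0,
    ("RC", "SW1"): 32.0,
    ("SW0", "SW1"): 4.0,  # slow direct inter-node link
}


def neighbors(links):
    """Build an undirected adjacency map: node -> {neighbor: bandwidth}."""
    adjacency = {}
    for (a, b), bw in links.items():
        adjacency.setdefault(a, {})[b] = bw
        adjacency.setdefault(b, {})[a] = bw
    return adjacency


def _walk_back(parent, dst):
    """Rebuild the path from the parent map, or None if dst is unreachable."""
    if dst not in parent:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def least_hop_path(links, src, dst):
    """Breadth-first search: fewest links, bandwidth ignored."""
    adjacency = neighbors(links)
    parent, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adjacency.get(node, {}):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return _walk_back(parent, dst)


def widest_path(links, src, dst):
    """Dijkstra variant that maximizes the bottleneck bandwidth of the path."""
    adjacency = neighbors(links)
    best, parent = {src: float("inf")}, {src: None}
    heap = [(-best[src], src)]
    while heap:
        neg_bw, node = heapq.heappop(heap)
        if -neg_bw < best.get(node, 0.0):
            continue  # stale heap entry
        for nxt, bw in adjacency.get(node, {}).items():
            bottleneck = min(-neg_bw, bw)
            if bottleneck > best.get(nxt, 0.0):
                best[nxt] = bottleneck
                parent[nxt] = node
                heapq.heappush(heap, (-bottleneck, nxt))
    return _walk_back(parent, dst)


print(least_hop_path(LINKS, "SW0", "SW1"))  # ['SW0', 'SW1']
print(widest_path(LINKS, "SW0", "SW1"))     # ['SW0', 'RC', 'SW1']
```

In this toy model, the least-hop router takes the slow direct link, while the bandwidth-maximizing router detours through the root complex. A custom routing algorithm could, for example, treat selected links as mandatory for inter-node traffic.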

amelfatima1231 · Answer 10 · Mon Jul 31 2023 23:04:32 GMT+0800 (China Standard Time)

No, the CPU-GPU communication is not important in this case.

amelfatima1231 · Answer 11 · Mon Jul 31 2023 23:05:08 GMT+0800 (China Standard Time)

Got it. I will try this now.

Yifan Sun · Answer 12 · Mon Jul 31 2023 23:06:02 GMT+0800 (China Standard Time)

If CPU-GPU communication is not important to you, do not forget to use the magic-memory-copy option. It will save some simulation time.

Yifan Sun · Answer 13 · Tue Aug 01 2023 22:30:58 GMT+0800 (China Standard Time)

I am closing the issue. Please feel free to reopen if any further discussion is needed.