sarchlab / mgpusim

A highly-flexible GPU simulator for AMD GPUs.

MultiNode Simulation

amelfatima1231 opened this issue

Is it possible to simulate multiple interconnected nodes? What changes (a high level description) will be required if we want to simulate multiple interconnected nodes where each node consists of a CPU and multiple GPUs?

Yes. You can model multiple nodes by adding network links that are slower than the intra-node links.

The only problem is that we do not support multiple CPUs in a system. I guess you do not care too much about CPUs? You can always create a magic link that connects the CPU to the GPU switches. See my hand drawing below.

[Image: hand drawing, IMG_0500A8E1A0E4-1]

You can configure the network any way you want. Currently, the default configuration uses the PCIe connector to establish the network (https://github.com/sarchlab/akita/blob/v3/noc/networking/pcie/pcie.go#L14). To define your own network, my recommendation is to create a new network connector (or modify the PCIe connector).

Just to confirm, will we have a single root complex in this case or two?

Single root complex. The address spaces are unified. Works as if they are in a single machine, but just a link being slower.

Thanks.
Also, can't we directly connect the two SWs instead of creating an additional SW (labelled as NIC) between them? I can create another function which connects the two SWs together with different bandwidth and latency parameters?

Yes. That should work. Just make sure not to create a path that lets traffic bypass the slow link. The current routing algorithm is maximum-bandwidth-based routing.
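One way to sanity-check a topology for such a bypass is to remove the slow link and see whether the two nodes are still connected. The sketch below is plain Go and does not use the Akita API; the `Link` type, the node names, and the bandwidth numbers are all made up for illustration.

```go
package main

import "fmt"

// Link is a hypothetical description of one bidirectional link.
// It is not an Akita type.
type Link struct {
	A, B      string
	Bandwidth int // illustrative units only
}

// reachable reports whether dst can be reached from src over the given links
// (breadth-first search on an undirected graph).
func reachable(links []Link, src, dst string) bool {
	adj := map[string][]string{}
	for _, l := range links {
		adj[l.A] = append(adj[l.A], l.B)
		adj[l.B] = append(adj[l.B], l.A)
	}
	seen := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			return true
		}
		for _, next := range adj[cur] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return false
}

// hasBypass reports whether src can still reach dst after the slow link
// (identified by its two endpoints) is removed from the topology.
func hasBypass(links []Link, slowA, slowB, src, dst string) bool {
	var rest []Link
	for _, l := range links {
		isSlow := (l.A == slowA && l.B == slowB) || (l.A == slowB && l.B == slowA)
		if !isSlow {
			rest = append(rest, l)
		}
	}
	return reachable(rest, src, dst)
}

func main() {
	// Two nodes, each with one switch and two GPUs; SW1-SW2 is the slow link.
	isolated := []Link{
		{"SW1", "GPU1", 16}, {"SW1", "GPU2", 16},
		{"SW2", "GPU3", 16}, {"SW2", "GPU4", 16},
		{"SW1", "SW2", 1},
	}
	// Same topology, plus a root complex (RC) attached to both switches.
	withRC := append(append([]Link{}, isolated...),
		Link{"RC", "SW1", 16}, Link{"RC", "SW2", 16})

	fmt.Println(hasBypass(isolated, "SW1", "SW2", "GPU1", "GPU3")) // false
	fmt.Println(hasBypass(withRC, "SW1", "SW2", "GPU1", "GPU3"))   // true
}
```

If the check reports a bypass, whether traffic actually takes it depends on the routing algorithm in use.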

Got it.

Thank you so much.

Since the root complex also works as a switch, I see that the path from the switch across the root complex and into the other switch is followed instead of the direct connection between them. :/

Is the CPU-GPU communication important for your case? If not, simply adding the CPU to a single machine should solve the problem. Otherwise, you have to modify the routing algorithm.

Actually, we have both bandwidth-first routing (https://github.com/sarchlab/akita/blob/v3/noc/networking/networkconnector/bandwidth_first_routing.go) and least-hop routing (https://github.com/sarchlab/akita/blob/v3/noc/networking/networkconnector/floydwarshall.go#L15). So changing the routing algorithm (https://www.youtube.com/watch?v=4rgSzQwe5DQ&t=731s) will work.

Also, I just realized that we default to least-hop routing.

Anyway, if you understand how routing works, you can define your own routing algorithm.
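To make the difference between the two policies concrete, here is a toy comparison on a three-node graph: two switches joined by a slow direct link, plus a root complex with fast links to both. It is only a sketch. It uses a small Dijkstra-style search rather than the Floyd-Warshall code in the linked file, and `Link`, `bestPath`, `leastHop`, `widest`, and the bandwidth values are hypothetical names and numbers, not part of Akita.

```go
package main

import "fmt"

// Link is a hypothetical bidirectional link between node indices A and B.
type Link struct {
	A, B      int
	Bandwidth int // illustrative units only
}

// leastHop scores a path by the negated hop count, so fewer hops score higher.
func leastHop(score int, _ Link) int { return score - 1 }

// widest scores a path by its bottleneck (minimum) link bandwidth.
func widest(score int, l Link) int {
	if l.Bandwidth < score {
		return l.Bandwidth
	}
	return score
}

// bestPath returns the path from src to dst with the highest score, where
// extend gives the score of a path after appending one more link. Extending a
// path never raises its score, so a Dijkstra-style greedy search is valid.
func bestPath(n int, links []Link, src, dst, start int, extend func(int, Link) int) []int {
	const unreached = -1 << 30
	score := make([]int, n)
	prev := make([]int, n)
	done := make([]bool, n)
	for i := range score {
		score[i], prev[i] = unreached, -1
	}
	score[src] = start
	for {
		// Pick the unfinished, reached node with the best score so far.
		cur := -1
		for i := 0; i < n; i++ {
			if !done[i] && score[i] != unreached && (cur == -1 || score[i] > score[cur]) {
				cur = i
			}
		}
		if cur == -1 {
			break
		}
		done[cur] = true
		for _, l := range links {
			nb := -1
			if l.A == cur {
				nb = l.B
			} else if l.B == cur {
				nb = l.A
			}
			if nb == -1 || done[nb] {
				continue
			}
			if s := extend(score[cur], l); s > score[nb] {
				score[nb], prev[nb] = s, cur
			}
		}
	}
	if score[dst] == unreached {
		return nil
	}
	var p []int
	for v := dst; v != -1; v = prev[v] {
		p = append([]int{v}, p...)
	}
	return p
}

func main() {
	names := []string{"SW1", "SW2", "RC"}
	links := []Link{
		{0, 1, 1},  // slow direct link SW1-SW2
		{0, 2, 16}, // fast link SW1-RC
		{2, 1, 16}, // fast link RC-SW2
	}
	show := func(label string, p []int) {
		out := []string{}
		for _, v := range p {
			out = append(out, names[v])
		}
		fmt.Println(label, out)
	}
	show("least-hop:", bestPath(3, links, 0, 1, 0, leastHop))           // [SW1 SW2]
	show("bandwidth-first:", bestPath(3, links, 0, 1, 1<<30, widest)) // [SW1 RC SW2]
}
```

In this toy graph, the bandwidth-based policy detours through the root complex because that path has the larger bottleneck bandwidth, while the hop-count policy takes the direct slow link. The real behavior depends on the actual topology and on which routing algorithm the connector is configured with.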

No, the CPU-GPU communication is not important in this case.

Got it. I will try this now.

If CPU-GPU communication is not important to you, do not forget to use the magic-memory-copy option. It will save some simulation time.

I am closing the issue. Please feel free to reopen if any further discussion is needed.