enable rpc for server
steampunque opened this issue
I made a quick patch to the server to test RPC, running phi-3 fully offloaded onto a remote GPU, and everything seemed OK. Timings over RPC:
pp: 258.19 tokens per second
tg: 48.41 tokens per second
Running locally on the remote machine itself, on the same GPU, gives:
pp: 563.30 tokens per second
tg: 92.00 tokens per second
Possible Implementation
Patches are trivial:
     printf(" --port PORT port to listen (default (default: %d)\n", sparams.port);
+    printf(" --rpc SERVERS comma separated list of RPC servers\n");

     } else if (arg == "--host") {
         if (++i >= argc) {
             invalid_param = true;
             break;
         }
         sparams.hostname = argv[i];
+    } else if (arg == "--rpc") {
+        if (++i >= argc) {
+            invalid_param = true;
+            break;
+        }
+        params.rpc_servers = argv[i];
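
The patch stores the `--rpc` argument as a single comma-separated string and leaves interpreting it to whatever consumes `params.rpc_servers`. For illustration only, here is a minimal sketch of how such a list could be split into individual endpoints. The helper name `split_rpc_servers` and the `host:port` entry format are assumptions made for this example; they are not taken from the patch and this is not necessarily how llama.cpp does it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): split a comma-separated
// list such as "192.168.1.10:50052,192.168.1.11:50052" into endpoints.
// Empty entries (for example from a trailing comma) are skipped.
std::vector<std::string> split_rpc_servers(const std::string & servers) {
    std::vector<std::string> endpoints;
    std::size_t start = 0;
    while (start <= servers.size()) {
        std::size_t end = servers.find(',', start);
        if (end == std::string::npos) {
            end = servers.size();  // last entry runs to the end of the string
        }
        if (end > start) {
            endpoints.push_back(servers.substr(start, end - start));
        }
        start = end + 1;  // continue after the comma
    }
    return endpoints;
}
```

With this sketch, `split_rpc_servers("a:1,b:2")` yields two entries, and an empty string yields none. It does not validate that each entry is a well-formed address.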