microsoft / pai

Resource scheduling and cluster management for AI

Home Page: https://openpai.readthedocs.io

How to increase memory for worker?

zsh4614 opened this issue · comments

Organization Name:HIT

Short summary about the issue/question: How to expand the memory of a worker.

Brief what process you are following:
When I first deployed the cluster, each worker had 128G of memory. That later turned out to be insufficient, so I increased the physical memory of each worker to 512G. After modifying layout.yaml, I restarted all cluster services, but the webportal shows that neither the cluster resources nor the SKU size have changed. What should I do? Please help, thank you!

How to reproduce it:

OpenPAI Environment:

  • OpenPAI version: v1.8.0
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.):
  • Others:

Anything else we need to know:

I am an OpenPAI user. I think you can try changing the hivedscheduler config, since the SKU types are controlled by it. See https://github.com/microsoft/hivedscheduler/blob/master/doc/user-manual.md

Thank you very much! I changed the hivedscheduler configuration in services-configuration.yaml and it took effect. Thanks!
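
For anyone landing here with the same problem, below is a rough sketch of how the new per-SKU memory value could be worked out before editing the hivedscheduler section of services-configuration.yaml. Only the 512G node size comes from this thread. The SKU name, the number of SKUs per node, the CPU count, and the reserved amount are all made-up placeholders, and the printed `skuTypes` layout is my understanding of the OpenPAI config format, so check it against your own file rather than pasting it in.

```python
def sku_memory_mi(node_memory_gi: int, skus_per_node: int, reserved_gi: int = 0) -> int:
    """Return the memory (in Mi) available to each SKU on one node.

    reserved_gi is memory held back for the OS and cluster services;
    how much to reserve is a deployment choice, not an OpenPAI constant.
    """
    if skus_per_node <= 0:
        raise ValueError("skus_per_node must be positive")
    usable_mi = (node_memory_gi - reserved_gi) * 1024
    if usable_mi <= 0:
        raise ValueError("reserved memory must be smaller than node memory")
    # Integer division so the result can be written as a whole number of Mi.
    return usable_mi // skus_per_node


def sku_fragment(name: str, gpu: int, cpu: int, memory_mi: int) -> str:
    """Format one SKU entry in the assumed skuTypes layout."""
    return (
        "skuTypes:\n"
        f"  {name}:\n"
        f"    gpu: {gpu}\n"
        f"    cpu: {cpu}\n"
        f"    memory: {memory_mi}Mi\n"
    )


if __name__ == "__main__":
    # Hypothetical worker: 512G of RAM (from this thread), 4 SKUs per node,
    # 32G reserved, 8 CPUs per SKU. Replace with your own hardware numbers.
    memory = sku_memory_mi(node_memory_gi=512, skus_per_node=4, reserved_gi=32)
    print(sku_fragment("EXAMPLE-SKU", gpu=1, cpu=8, memory_mi=memory))
```

With these placeholder numbers each SKU gets 122880Mi. The value still has to be written into the hivedscheduler config by hand and the affected services restarted; changing layout.yaml alone did not update the SKU size in the case reported above.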

Glad to hear that.