hbs-rcs / hbsgrid-docs

Documentation and discussion forum for the HBS grid technology preview

Home Page: https://hbs-rcs.github.io/hbsgrid-docs/

Improve Available Resources utility to show unreserved memory

izahn opened this issue

Discussed in https://github.com/hbs-rcs/hbsgrid-docs/discussions/14

Originally posted by Econometrica17212 on January 25, 2022
Hey all,

I'm having trouble getting an interactive job request launched and thought I'd share the issues so that others might learn from my mistakes.

I'm currently trying to launch a short_int RStudio job with 500G of RAM and 4 CPUs. I'm working with a dataset that is 390GB, so I need the maximum amount of RAM for this. Luckily, the operations I'm running shouldn't take up much more RAM, as I'm just doing some basic analysis and regressions, not data manipulation.
When I run this via the tech preview environment using the submission GUI, I get the following error:
[Screenshot from 2022-01-25 10-54-59: error]

Looking at the user guide, there is a video documenting a similar issue where the user requests 10 CPUs and 4 GB RAM. However, the error code is different. In the video, the usage report shows that no node has 10 CPUs available, so the job is resubmitted with 2 CPUs and it works.

When I check the HBS Cluster usage monitor, it seems that there is ample room for the job to run, e.g. on node 13, so I'm a bit confused about why I'm getting the error message here. Running bqueues also shows that there are 28 pending jobs. Could this be why? Am I just behind in a long line of jobs right now? For context, I have no other jobs, pending or running, on any queue when attempting this.
[Screenshot from 2022-01-25 10-55-38: usage]

Hopefully there is a simple explanation here that will help others understand too!

(also worth noting that the button in the first image does not say "Read Documentation," so that could be changed to make it clearer which button to press)

We now have a new-and-improved available resources utility! Let me know if you see any other issues with it.
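For anyone wondering why "unreserved" memory matters here, below is a rough Python sketch of the distinction. It is not the code behind the available resources utility, and every name and number in it (`Node`, `nodes_that_fit`, `nodeA`, `nodeB`, the memory figures) is made up for illustration. The assumption is that the scheduler places jobs based on memory already reserved by running jobs, not on memory those jobs happen to be using at the moment, so a node can look mostly idle in a usage monitor and still be unable to accept a large request.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of one compute node (not real cluster data)."""
    name: str
    total_mem_gb: float
    used_mem_gb: float      # memory processes are actually using right now
    reserved_mem_gb: float  # memory the scheduler has promised to running jobs
    free_cpus: int

    @property
    def unreserved_mem_gb(self) -> float:
        # What is still available to promise to a new job.
        return self.total_mem_gb - self.reserved_mem_gb


def nodes_that_fit(nodes, mem_gb, cpus):
    """Names of nodes with enough unreserved memory and free CPUs."""
    return [
        n.name
        for n in nodes
        if n.unreserved_mem_gb >= mem_gb and n.free_cpus >= cpus
    ]


# Illustrative values only.
nodes = [
    Node("nodeA", total_mem_gb=768, used_mem_gb=100, reserved_mem_gb=600, free_cpus=20),
    Node("nodeB", total_mem_gb=768, used_mem_gb=300, reserved_mem_gb=200, free_cpus=8),
]

# Judging by memory in use, nodeA appears to have room for a 500 GB job...
looks_free = [
    n.name
    for n in nodes
    if n.total_mem_gb - n.used_mem_gb >= 500 and n.free_cpus >= 4
]

# ...but judging by reservations, only nodeB can take it.
fits = nodes_that_fit(nodes, mem_gb=500, cpus=4)

print("Looks free by usage:", looks_free)
print("Fits by unreserved memory:", fits)
```

In this made-up example the two views disagree, which is the kind of mismatch that showing unreserved memory is meant to make visible before a job is submitted.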