aws / aws-parallelcluster

AWS ParallelCluster is an AWS-supported open source cluster management tool to deploy and manage HPC clusters in the AWS cloud.

Home Page: https://github.com/aws/aws-parallelcluster


Question: Slurm Accounting migration between ParallelCluster versions

christianversloot opened this issue · comments

We are in the process of setting up a cluster with AWS ParallelCluster and AWS ParallelCluster UI. We are also writing a plan for upgrading the cluster. From what we know and what we've learned online, upgrading to a new ParallelCluster version would require us to:

  1. Set up a new ParallelCluster UI stack with the target version.
  2. Through that UI, create a new ParallelCluster with the target version (using the same accounting database RDS cluster).
  3. Ensure that the cluster is operational, then delete the stacks of the old cluster and the old UI.

The cluster has Slurm accounting set up. We use a separately deployed Aurora-based RDS cluster, meaning it is not deleted between upgrades. However, we've observed that when setting up a new cluster, the newly created cluster's accounting database is tightly coupled to the cluster itself by means of (1) the database name and (2) the table names, as illustrated below.
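For illustration, here is roughly what that coupling looks like when inspecting the accounting database with the MySQL client. The endpoint, user, and the `mycluster` name below are placeholders; the per-cluster table prefix follows slurmdbd's usual naming convention:

```bash
# List the tables slurmdbd created for the cluster. Per-cluster tables
# carry the cluster name as a prefix (e.g. mycluster_job_table,
# mycluster_assoc_table), next to shared tables such as cluster_table
# and user_table.
mysql -h my-aurora.cluster-xxxxxxxx.eu-west-1.rds.amazonaws.com \
      -u clusteradmin -p \
      -e 'SHOW TABLES' mycluster
```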

The problem this creates for our cluster users is that when creating a new ParallelCluster in step 2 above, all accounting data is lost: invisible, if you will, because it lives in a different database within the database cluster.

We have looked into migrating with DMS, but because of the tight coupling between cluster and database (via the table names), this proves quite difficult and potentially error-prone. Unfortunately, dumping the database and then loading it into the new database will also not work for us, either because of the tight coupling or because the new cluster cannot have the same name as the old one (and we cannot have downtime while upgrading); a sketch of why that route is fragile follows below.
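For reference, a minimal sketch of the dump-and-restore route we ruled out, with placeholder names (`mycluster`, `newcluster`) and endpoint; the blanket rewrite of the table-name prefix is exactly the error-prone part:

```bash
# Dump the old cluster's accounting database.
mysqldump -h my-aurora.cluster-xxxxxxxx.eu-west-1.rds.amazonaws.com \
          -u clusteradmin -p mycluster > mycluster.sql

# Rewrite the per-cluster table-name prefix for the new cluster.
# Error-prone: this can also touch matching strings inside the data,
# and rows referring to the old cluster name (e.g. in cluster_table)
# would still need separate fixing.
sed 's/`mycluster_/`newcluster_/g' mycluster.sql > newcluster.sql

# Load the rewritten dump into the new cluster's database.
mysql -h my-aurora.cluster-xxxxxxxx.eu-west-1.rds.amazonaws.com \
      -u clusteradmin -p newcluster < newcluster.sql
```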

Looking around in the AWS docs and elsewhere online, I've not found much that points me in the right direction. Yet many customers must run into this when upgrading to new ParallelCluster versions, so I'd welcome a suggestion on how to handle it. Is there a way to run the accounting database loosely coupled from a ParallelCluster, allowing multiple clusters to be supported within one database (as the cluster_table table suggests)? Or any other approach that works for many customers? We've been using the service quite happily so far, but this seems to be a bit of a roadblock.

Thanks in advance!

Hi @christianversloot , thanks for reaching out and letting us know about your use case.

In general, you can configure the cluster to use a database with an arbitrary name via the Scheduling/SlurmSettings/Database/DatabaseName configuration option, introduced in ParallelCluster v3.8.0.
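For example, a minimal sketch of that setting in the cluster configuration; the endpoint, secret ARN, and names below are placeholders:

```yaml
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Database:
      # Endpoint and port of the externally managed Aurora/RDS cluster
      Uri: my-aurora.cluster-xxxxxxxx.eu-west-1.rds.amazonaws.com:3306
      UserName: clusteradmin
      # Secrets Manager secret holding the database password
      PasswordSecretArn: arn:aws:secretsmanager:eu-west-1:123456789012:secret:slurm-db-password
      # Arbitrary accounting database name, decoupled from the cluster
      # name (available since ParallelCluster v3.8.0)
      DatabaseName: slurm_accounting
```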

However, as of ParallelCluster v3.9.1, your upgrade use case is not supported, because:

  1. You cannot have more than one cluster pointing at the same database (it may lead to inconsistencies in the DB). This limitation will be addressed in future releases, but we do not have a date yet.
  2. Even once multiple clusters can share the same DB, you will be able to share it only among ParallelCluster versions that ship cross-compatible versions of the Slurm daemons responsible for the DB (you can check the daemon versions as sketched below).
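As a quick way to compare the Slurm versions that two clusters would bring to the same database, you can check the daemons on each head node; a minimal sketch, assuming the standard /opt/slurm installation path used by ParallelCluster:

```bash
# On each cluster's head node: print the Slurm version shipped with
# that ParallelCluster release. The slurmdbd owning the accounting
# database must be at least as new as every slurmctld talking to it.
sinfo --version
/opt/slurm/sbin/slurmdbd -V
```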

Some follow-up questions to learn more about your use case:

  1. It's clear that you're looking for a general solution. Still, what are the specific source and target versions of ParallelCluster you plan to upgrade between?
  2. Is zero downtime during the upgrade a strict requirement or a nice-to-have?

Thank you.

Hi @gmarciani thanks for your response!

To answer your questions:

  1. Currently we're still building up the cluster, so there is no strict migration case yet (the jobs so far have been test jobs, for which keeping the accounting data is not necessary). Our idea is that once our ECFlow-based workflow scheduler is fully set up, we create a cluster with the newest available ParallelCluster (and UI), which I believe is now 3.9.1. From that moment on, we'd be interested in a general solution, as the accounting data will then relate to production jobs.
  2. I would need to discuss this with the team, but in case of downtime our entire production flow stops and needs to be re-run afterwards. Last week the team's estimate was that a 1-hour window for spinning up a new cluster would be survivable, so I can imagine a similar amount of downtime is acceptable when upgrading. Let me check this with the team on Wednesday and get back to you.

Additionally, even though I know this is not the responsibility of this repository: the availability of DatabaseName needs to be reflected in parallelcluster-ui as well. Currently, if you create a cluster through the UI, you cannot provide the database name. I saw that the option was added not too long ago, so I understand why it is not yet present in the UI, but what is the approach here? Should I create a ticket in the UI repo too, or is this something you can put in motion internally? Thanks!

Thanks for all the valuable information about your use case. We'll wait for the missing info about the maximum acceptable downtime.

Regarding ParallelCluster UI, I suggest creating an issue in https://github.com/aws/aws-parallelcluster-ui/issues

Thanks, created the request: aws/aws-parallelcluster-ui#329

Thank you! In aws/aws-parallelcluster-ui#329 it seems you're planning to use the DatabaseName property to manage the upgrade as soon as it is available in PCUI.

Just to verify we are on the same page: this should not be done until we provide support for an external SlurmDBD in ParallelCluster, which is planned for future releases.

Yes, understood.

Hi @gmarciani - we had a discussion within the team and came to these downtime allowances:

  1. Generally speaking, a downtime of at most 3 hours is acceptable.
  2. This can be stretched to 6 hours if really necessary, though it is not preferred and informing clients may be necessary.
  3. Only in exceptional cases is a downtime of up to 1 day acceptable; it would severely impact our deliveries, require informing clients, and generally be perceived negatively.

Fortunately, as we've thoroughly documented upgrading a cluster between ParallelCluster (and UI) versions, I expect we should be able to stay under 1 to 1.5 hours most of the time.

In other words, stopping the compute fleet in the old cluster and then spinning up a new cluster is OK for us; a sketch of that flow with the pcluster CLI follows below. It would be best if both clusters could be hosted within the same database cluster, either via a different setup (separating database creation from cluster creation) or by allowing two clusters with the same name to co-exist (we don't want to delete the old cluster before setting up the new one).
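For concreteness, the flow described above with the pcluster v3 CLI; the cluster names, region, and config file are placeholders, and the new cluster's configuration would point DatabaseName at the shared accounting database once multiple clusters per database are supported:

```bash
# Stop the old cluster's compute fleet (compute nodes are terminated;
# the head node and the external accounting database stay up).
pcluster update-compute-fleet --cluster-name old-cluster \
    --status STOP_REQUESTED --region eu-west-1

# Create the replacement cluster from a config targeting the new
# ParallelCluster version.
pcluster create-cluster --cluster-name new-cluster \
    --cluster-configuration new-cluster-config.yaml --region eu-west-1

# Confirm the new cluster reaches CREATE_COMPLETE before deleting the
# old cluster and UI stacks.
pcluster describe-cluster --cluster-name new-cluster --region eu-west-1
```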