aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.

Home Page: https://github.com/aws/aws-parallelcluster

Enable QUIC transport protocol for DCV on head node

elduds opened this issue

DCV on the head node uses the default transport protocol configuration, WebSockets over TCP, but it can support QUIC with a minor, backwards-compatible configuration change.

I enable QUIC by default because my users are up to 300 ms from our preferred ParallelCluster region and experience significant latency, jitter, and packet loss, all of which are far more tolerable over QUIC.

Given the performance improvements, I think the project should consider enabling QUIC by default, or at the very least make it optional. The worst-case outcome is benign: any network or client issue when attempting QUIC simply results in a graceful fallback to the existing TCP-based connection.

One complication is that QUIC is supported only by the DCV thick client, not by the web browser client.
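As a rough illustration of what a client-side check might look like, here is a minimal Python sketch that looks for a thick client on the PATH and otherwise reports that the browser would be used. The executable names in `CANDIDATE_CLIENTS` and both function names are assumptions made for illustration; they are not part of pcluster.

```python
import shutil
from typing import Callable, Optional

# Assumed executable names for the DCV thick client; adjust for the local install.
CANDIDATE_CLIENTS = ("dcvviewer", "dcvviewer.exe")


def find_thick_client(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first thick-client executable found, else None.

    `which` is injectable so the lookup can be exercised without a real install.
    """
    for name in CANDIDATE_CLIENTS:
        path = which(name)
        if path:
            return path
    return None


def describe_connection(client_path: Optional[str]) -> str:
    """Summarise which connection path would be taken for a given lookup result."""
    if client_path:
        return f"thick client at {client_path} (QUIC possible, TCP fallback)"
    return "web browser (WebSockets over TCP only)"


if __name__ == "__main__":
    print(describe_connection(find_thick_client()))
```

With a check like this, users without the thick client would keep the current browser-based behaviour.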

Proposed changes required:

  1. Set enable-quic-frontend=true in /etc/dcv/dcv.conf
  2. Add UDP/ from in the head node security group
  3. Update pcluster dcv-connect to test for & launch the thick client after fetching session credentials
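The first change could be scripted roughly as follows. This is a minimal Python sketch, not how ParallelCluster itself manages the file: it assumes the key lives under the `[connectivity]` section of dcv.conf, and the helper name `enable_quic` is hypothetical. It works on the file's text only, so reading and writing /etc/dcv/dcv.conf and restarting the DCV server are left to the caller.

```python
import re

SECTION = "connectivity"  # assumed section for the QUIC setting
KEY = "enable-quic-frontend"

# Matches an active or commented-out assignment of the key, e.g. "#enable-quic-frontend=false".
KEY_RE = re.compile(r"[#;\s]*" + re.escape(KEY) + r"\s*=")


def enable_quic(conf_text: str) -> str:
    """Return conf_text with exactly one 'enable-quic-frontend=true' under [connectivity]."""
    out = []
    in_section = False
    written = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped[1:-1].strip() == SECTION
            out.append(line)
            if in_section and not written:
                # Write the setting directly below the section header.
                out.append(f"{KEY}=true")
                written = True
            continue
        if in_section and KEY_RE.match(stripped):
            # Drop any previous (or commented-out) assignment of the key.
            continue
        out.append(line)
    if not written:
        # No [connectivity] section found: append one at the end.
        if out and out[-1].strip():
            out.append("")
        out.extend([f"[{SECTION}]", f"{KEY}=true"])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = '[connectivity]\nweb-port=8443\n\n[security]\nauthentication="system"\n'
    print(enable_quic(sample), end="")
```

The function is idempotent, so it is safe to run again on a file that already has the setting, for example from a node bootstrap script.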

FWIW I'm using

  • ParallelCluster v3.8.0
  • AWS ParallelCluster AMI for alinux2, kernel-5.10.201-191.748.amzn2.x86_64, lustre-2.12.8-2.amzn2.x86_64, efa-2.6.0-1.amzn2.x86_64, dcv-2023.0.15487-1.el7.x86_64, nvidia-535.129.03, cuda-12.2.20230823