nodejs / build

Better build and test infra for Node.

Replacement for NearForm Mac OS machines

mhdawson opened this issue

After talking with the Linux IT team (@ryanaslett and @vvalderrv), the suggested plan is to add machines in MacStadium that the Linux IT team will configure and manage as replacements for the NearForm machines.

With them being able to support/manage 24/7, the risk of being all in one provider is thought to be low enough to make this a good solution. They may look for a second provider, but that would be after the deadline to stop using the machines at NearForm.

My thought is that we give the Linux IT team access to spin up/manage a new set of machines in addition to the existing ones, kind of like a second provider, and potentially have them take over the existing ones as well at some point.

@UlisesGascon, that would mean you'd continue to manage/keep the existing machines going. Does that make sense to you? Also, do we have enough resources to spin up a second set of machines?

If this makes sense, then we'd need to give @ryanaslett access to our existing MacStadium account and support him in setting up some new machines.

@nodejs/build any concerns/comments/suggestions over this approach?

This would only work for Mac; what about the Linux machines?

@mcollina this issue is only intended to cover the Mac machines.

@ryanaslett and @vvalderrv, can you create another issue to share plans/current thoughts on handling the benchmark/Linux machines?

With them being able to support/manage 24/7, the risk of being all in one provider is thought to be low enough to make this a good solution. [...] My thought is that we give the Linux IT team access to spin up/manage a new set of machines in addition to the existing ones, kind of like a second provider, and potentially have them take over the existing ones as well at some point.

I agree. MacStadium has been a solid provider (AFAIK). Historically, the most problematic part has been Orka, but I think we can simplify many things thanks to the lessons learned and the opportunity that #3240 gives us to rethink the resources.

@UlisesGascon, that would mean you'd continue to manage/keep the existing machines going. Does that make sense to you?

I was thinking that the IT team would provide long-term support for this after spinning up the new machines, or are you talking about the existing ones? In any case, we can share the responsibility for a while as a handover period if we go ahead with this delegation.

If this makes sense, then we'd need to give @ryanaslett access to our existing MacStadium account and support him in setting up some new machines.

I can prepare an onboarding session for the IT team and provide long-term support if needed, especially for Orka. One important thing to keep in mind when requesting access: any Mac infrastructure admin will also have access to the release machines hosted in Orka and MacStadium.

Also, do we have enough resources to spin up a second set of machines?

If we are considering paying for or renegotiating the sponsorship, there are a few things we can take into consideration.

We currently use two products from MacStadium (Bare Metal and Orka Cluster) and need to support both Intel and ARM64:

Current setup

  • Bare Metal (ARM):
    • test-macstadium-macos11.0-arm64-4: Mac mini G5A (AS/M1/8C/8G/256G/SSD/1G)
    • test-macstadium-macos11.0-arm64-3: Mac mini G5A (AS/M1/8C/8G/256G/SSD/1G)
    • release-macstadium-macos11.0-arm64-1: Mac mini G5A (AS/M1/8C/8G/256G/SSD/1G)
  • Orka cluster (3 Nodes using Intel):
    • release-orka-macos11-x64-1
    • test-orka-macos1015-x64-2
    • test-orka-macos11-x64-1
    • test-orka-macos11-x64-2
    • test-orka-macos1015-x64-1

What requires migration?

By checking the inventory, the affected machines are:

  • test-nearform-macos10.15-x64-1
  • test-nearform-macos10.15-x64-2
  • test-nearform-macos10.15-x64-3
  • test-nearform-macos11.0-arm64-1
  • release-nearform-macos11.0-arm64-1

Possibilities

As far as I know, macOS 10.15 is only used by Node@18 (maintenance) for testing changes, but it can't be used for releases (as the minimum supported macOS version is 11; see: #3385 (comment)). So, we can decommission test-nearform-macos10.15-x64-1, test-nearform-macos10.15-x64-2, and test-nearform-macos10.15-x64-3 as we still have two Orka machines available: test-orka-macos1015-x64-1 and test-orka-macos1015-x64-2.

Based on the last distribution proposal for macOS 13 support (May '23), 4 Intel slots will still be left empty:

| SSH port | Node: macpro-4 | Node: macpro-5 | Node: macpro-6 |
|---|---|---|---|
| 8822 | release-macos11-x64-1 | test-macos13-x64-2 | test-macos11-x64-1 |
| 8823 | test-macos13-x64-1 | release-macos13-x64-1 | test-macos11-x64-2 |
| 8824 | empty | test-macos1015-x64-2 | test-macos1015-x64-1 |
| 8825 | empty | empty | empty |

So, if we add ARM nodes to the existing Orka cluster, we can relocate the bare metal machines (by renegotiating the sponsorship), the NearForm ARM machines, and the future macOS 13 ARM machines:

| SSH port | Node: macpro-4 | Node: macpro-5 | Node: macpro-6 | Node: arm-1 | Node: arm-2 |
|---|---|---|---|---|---|
| 8822 | release-macos11-x64-1 | test-macos13-x64-2 | test-macos11-x64-1 | test-nearform-macos11.0-arm64-1 | release-macstadium-macos11.0-arm64-1 |
| 8823 | test-macos13-x64-1 | release-macos13-x64-1 | test-macos11-x64-2 | release-nearform-macos11.0-arm64-1 | release-macos13-arm64-1 |
| 8824 | empty | test-macos1015-x64-2 | test-macos1015-x64-1 | test-macstadium-macos11.0-arm64-4 | test-macos13-arm64-2 |
| 8825 | empty | empty | empty | test-macstadium-macos11.0-arm64-3 | test-macos13-arm64-1 |

Note1: I assume that the new nodes have similar hardware and requirements, allowing us to fit 4 VMs per node. However, this will need to be confirmed with MacStadium support.

Note2: Assuming we can renegotiate the sponsorship, we may be able to obtain the new nodes in exchange for the current bare metal machines. This would entail exchanging 3 machines for 2 nodes, but we need to verify this with MacStadium.

Note3: I've retained the original VM/machine names from the inventory for clarity, but we'll rename the new VMs using test-orka... or release-orka....

Ideally, we should have a third ARM node to accommodate a second macOS 13 ARM release machine, release-macos13-arm64-2. This would also give us another 3 empty slots, which we can use for backups or for spinning up macOS 14 in the future. It's important to note that Apple won't support Intel in the future, so we can gradually remove the Intel nodes and renegotiate them as ARM nodes. I'm not sure whether the current sponsorship covers this transition.
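As a quick sanity check of the slot arithmetic in the tables above, here is a minimal Python sketch. It is only an illustration, not part of any Orka or MacStadium tooling: it assumes (per Note1) that every node, Intel or ARM, fits 4 VMs, and the names `CURRENT`, `PROPOSED`, `WITH_THIRD_ARM` and `capacity` are hypothetical. The node and VM names are copied from the tables; the `arm-3` node is a placeholder for the third ARM node idea.

```python
# Illustrative model of the Orka slot layouts from the tables above.
# ASSUMPTION (Note1): every node, Intel or ARM, hosts 4 VMs (SSH ports 8822-8825).
SLOTS_PER_NODE = 4

# Current Intel nodes; slots not listed are the "empty" ones in the table.
CURRENT = {
    "macpro-4": ["release-macos11-x64-1", "test-macos13-x64-1"],
    "macpro-5": ["test-macos13-x64-2", "release-macos13-x64-1", "test-macos1015-x64-2"],
    "macpro-6": ["test-macos11-x64-1", "test-macos11-x64-2", "test-macos1015-x64-1"],
}

# Proposal: keep the Intel nodes and add two ARM nodes.
PROPOSED = {
    **CURRENT,
    "arm-1": [
        "test-nearform-macos11.0-arm64-1",
        "release-nearform-macos11.0-arm64-1",
        "test-macstadium-macos11.0-arm64-4",
        "test-macstadium-macos11.0-arm64-3",
    ],
    "arm-2": [
        "release-macstadium-macos11.0-arm64-1",
        "release-macos13-arm64-1",
        "test-macos13-arm64-2",
        "test-macos13-arm64-1",
    ],
}

# Hypothetical third ARM node holding only a second macOS 13 ARM release machine.
WITH_THIRD_ARM = {**PROPOSED, "arm-3": ["release-macos13-arm64-2"]}


def capacity(layout):
    """Return (total_slots, used_slots, free_slots) for a {node: [vm, ...]} layout."""
    for node, vms in layout.items():
        if len(vms) > SLOTS_PER_NODE:
            raise ValueError(f"{node} is over capacity: {len(vms)} VMs")
    total = SLOTS_PER_NODE * len(layout)
    used = sum(len(vms) for vms in layout.values())
    return total, used, total - used


if __name__ == "__main__":
    print(capacity(CURRENT))         # (12, 8, 4)
    print(capacity(PROPOSED))        # (20, 16, 4)
    print(capacity(WITH_THIRD_ARM))  # (24, 17, 7)
```

Under that 4-VMs-per-node assumption, the counts line up with the text: 4 Intel slots stay empty in both layouts, and a third ARM node adds 3 more free slots.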

Another option is to add virtualization to the bare metal machines so that each one hosts 2 VMs. However, this may not leave enough room for the macOS 13 ARM machines at all.

Side notes

  • There is a pending question regarding what will happen with the physical hardware.
  • I will be on holiday for the last half of March, so my availability will be limited.

@UlisesGascon thanks for the complete answer, it's a lot for me to digest, so I'll probably reach out on Slack to discuss.

@UlisesGascon and I talked today. It sounds like the best way forward to replace the NearForm machines would be to:

  1. Ask whether we can get 2 new ARM-based Orka nodes that can be used to host a number of machines.
  2. Have the Linux IT team spin up replacement machines for the ones at NearForm using those new nodes.
  3. Increase the number of ARM machines beyond what we had at NearForm to provide better redundancy with the machines @UlisesGascon is managing.

What do I need to provide for MacStadium access? How is their authentication handled? Do I need to create a MacStadium account or profile and provide it, or is it email-based?

You can create an account and we can add you to the group AFAIK 👍

Do we know if this macstadium account is considered a "Corporate" account?

In case it is not, I now have an account under raslett@contractor.linuxfoundation.org.

It would be great to have the admin role, but I think the billing role would work as well: https://portal.macstadium.com/users/roles

Thank you

@mhdawson Hi Michael, I just wanted to check in to see where things stand regarding the move from NearForm HQ to an alternative location or infrastructure. Is the 1st of April still achievable for getting set up? Thanks

@ryanaslett and @vvalderrv should probably chime in. I think we've agreed we can survive at least temporarily without the OSX machines so it comes down to the proposal for the replacement benchmark machines.

@efrisby I went ahead and updated the parent issue for this here: #3615 (comment)

@ryanaslett I'm not sure if the update in #3615 (comment) answers @efrisby's questions about April 1st?

I'm closing this in favor of #3686, which centralizes the Orka changes, including the NearForm migration. Feel free to reopen this if needed. There is also an ongoing discussion on the mailing list about the new ARM nodes for Orka.