nodejs / unofficial-builds

Unofficial binaries for Node.js

Home Page: https://unofficial-builds.nodejs.org


Increase disk space on unofficial-builds.nodejs.org

sxa opened this issue · comments

As part of trying to add RISC-V support, I ran out of space on the server. It looks like it only has about 1GB left, which wasn't enough for me to create a Docker image, let alone build anything on it. Space use is as follows - the machine only has a 100GB disk attached (a sketch of how to gather these figures follows the list):

  • 51GB in the download directory
  • 7.7GB in use by Docker
  • 8GB in use by staging
  • 26GB in use by ccache (roughly 4½GB per platform)
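For reference, roughly how figures like these can be gathered; the exact paths are assumptions based on the directories named in this thread:

```sh
df -h /                                # overall usage of the 100GB volume
du -sh /home/nodejs/download \
       /home/nodejs/staging \
       /home/nodejs/.ccache            # per-directory totals
docker system df                       # space consumed by images, containers and volumes
```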

So we're at the limit of what we can have on this machine - it just so happens that trying to add riscv64 has pushed it over the edge. I've removed my attempt at a Docker image to put it back to where it was before, but we'll need to do something about this; we were probably going to hit the limit pretty soon regardless. Options?

  1. Get more space on the server, or provision another one with more space
  2. Don't keep so much in the download directory (prune more of the older releases?)
  3. Stop using ccache (perhaps a temporary measure) or reduce the size of it for each container
  4. Something else?

We've also got a lot of prerelease builds in download/rc (14GB), so that might be a good candidate for pruning.

@rvagg @joaocgreis @jbergstroem can we do anything to prune older things in the download/rc and staging/src directories? Do they need to be retained? We likely need a separate discussion on the main downloads and whether we wish to retain them indefinitely too.

If we want to continue with the current server we should probably consider dropping the shipping of tar.gz versions and only shipping tar.xz, and seeing if anyone objects - they're just over half the size of the gz versions, so that would save us over 60% across much of the download space.
EDIT: For whatever reason, people are still using the gz in much greater numbers, so that may not be such a great idea - and the headers packages, perhaps as expected, are pretty much all gz downloads.
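To sanity-check that claim, the two archive sizes for any one release can be compared without downloading them; the version and platform below are arbitrary examples:

```sh
base=https://unofficial-builds.nodejs.org/download/release/v17.9.0
curl -sI "$base/node-v17.9.0-linux-armv6l.tar.gz" | grep -i content-length
curl -sI "$base/node-v17.9.0-linux-armv6l.tar.xz" | grep -i content-length
# If the xz is ~0.55x the size of the gz, dropping the gz saves
# 1/(1 + 0.55) ≈ 65% of the space the pair occupies together.
```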

node-gyp uses the gz header files, probably because Node.js doesn't support the xz format natively.
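That's easy to confirm against core itself - a quick illustrative check (nothing here is node-gyp's actual code):

```sh
# zlib in core provides streaming gunzip, but there is no xz/lzma binding at all:
node -p 'typeof require("zlib").createGunzip'                          # "function"
node -p 'Object.keys(require("zlib")).some(k => /xz|lzma/i.test(k))'   # false
```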

@ljharb what about nvm?

Yeah, that's why it wasn't too much of a surprise when I saw that almost none of the header downloads were xz :-)
I've just run nvm on one of my systems and it's pulling a tar.xz, but it looks like most other downloads are of a tar.gz, so I don't think we can change that just now.

Can setup-node pull from unofficial-builds?

Ah sorry, I didn't realize this discussion was only about unofficial builds.

> If we want to continue with the current server we should probably consider dropping the shipping of tar.gz versions and only shipping tar.xz, and seeing if anyone objects - they're just over half the size of the gz versions, so that would save us over 60% across much of the download space.
> EDIT: For whatever reason, people are still using the gz in much greater numbers, so that may not be such a great idea - and the headers packages, perhaps as expected, are pretty much all gz downloads.

gz is also streamable whereas xz isn't; good for certain use-cases.

> @rvagg @joaocgreis @jbergstroem can we do anything to prune older things in the download/rc and staging/src directories? Do they need to be retained? We likely need a separate discussion on the main downloads and whether we wish to retain them indefinitely too.

Sorry for the quiet. I will take a look at the machine today and see what I can come up with.

I think the long-term strategy for all our hosted downloads should be to move to S3-like storage from our sponsors, then proxy it with Cloudflare, and perhaps an HTTPS -> HTTP proxy for "unencrypted" use-cases. This will most likely remove the strain on our servers too.

nvm prefers xz when available, but it’s often not, so it still uses the gz.

nvm doesn’t install unofficial builds without the user extensively opting in tho, so I’d have no problem with those going xz-only.

I would, however, assume that disk space being cheap means that the OpenJS Foundation can trivially fund whatever storage space we need.

@jbergstroem Genuinely curious - in what sense is xz not good for streaming? The simple use case of `curl <some tar.xz> | xz -d | tar xpf -` uses CPU to decode throughout the download, so I presume you're talking about something more subtle?
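For what it's worth, both archive formats can be consumed the same way in a pipeline (`$URL` is a placeholder):

```sh
# Both decompress while the download is still in flight;
# xz -d reads from stdin exactly as gzip -d does:
curl -sL "$URL.tar.gz" | gzip -d | tar xf -
curl -sL "$URL.tar.xz" | xz -d   | tar xf -
```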

FYI I've now finished with my RISC-V tests on the machine, but it'll need an extra 1.5GB or so on the disk for the new Docker image once #54 is merged (🤞🏻 that'll be tomorrow). There's currently enough for that (it looks like builds require about 2GB free to get through the process).

I moved some of the old source tarballs off the machine for now, as it was impacting getting 17.9.0 out. In the short term I propose we only keep the last ten source tarballs that have been used, and also consider how many of the non-current builds on each mainline we want to retain (or just remove old tar.gz files for now and keep the latest couple of each mainline with them). Would everyone be happy with that?
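A minimal sketch of the "last ten" idea, assuming the tarballs sit in a staging directory like the one mentioned above (the path is an assumption):

```sh
# Keep the ten newest staged source tarballs and delete the rest:
cd /home/nodejs/staging/src &&
  ls -1t node-v*.tar.* | tail -n +11 | xargs -r rm --
```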

> I moved some of the old source tarballs off the machine for now, as it was impacting getting 17.9.0 out. In the short term I propose we only keep the last ten source tarballs that have been used, and also consider how many of the non-current builds on each mainline we want to retain (or just remove old tar.gz files for now and keep the latest couple of each mainline with them). Would everyone be happy with that?

I don't think we need to keep the source tarballs once the build completes -- they're not copied into downloads (and don't need to be, because they're the same source tarballs as the official releases). Even if we were to rerun builds for a release, https://github.com/nodejs/unofficial-builds/tree/master/recipes/fetch-source would download fresh copies of the source tarball.

> @jbergstroem Genuinely curious - in what sense is xz not good for streaming? The simple use case of `curl <some tar.xz> | xz -d | tar xpf -` uses CPU to decode throughout the download, so I presume you're talking about something more subtle?

Just revisited this; it seems like most client libraries/CLIs now do streaming. They didn't back in the day when this was added. Thanks for following up!

> I moved some of the old source tarballs off the machine for now, as it was impacting getting 17.9.0 out. In the short term I propose we only keep the last ten source tarballs that have been used

> I don't think we need to keep the source tarballs once the build completes -- they're not copied into downloads (and don't need to be, because they're the same source tarballs as the official releases). Even if we were to rerun builds for a release, https://github.com/nodejs/unofficial-builds/tree/master/recipes/fetch-source would download fresh copies of the source tarball.

Exactly what I was thinking - on that basis, since there have been no objections, I'll look at getting a clear-up process in place for them.

Late to the party, but I don't mind completely nuking /rc/ from unofficial-builds; we have no promises of longevity here (I don't think anything on that server is even backed up!). You could do it time-based if you want.
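A time-based version could be as simple as the following; the path and the 90-day window are assumptions:

```sh
# Remove any /rc entries not touched in the last 90 days:
find /home/nodejs/download/rc -mindepth 1 -maxdepth 1 -mtime +90 -exec rm -rf {} +
```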

You can also nuke the current ccache directories if you want to temporarily regain some space. Perhaps ccache could be removed entirely from the build process; the caches are less useful for full releases than they are for test and nightly builds, I think.

We can also upgrade the disk fairly easily too I think.

I wonder if throwing the binaries up on GitHub Releases in this repo makes sense as a "backup". I don't think that could completely get away from having the "dist" layout, since things like nvm rely on the same layout as the regular downloads.

I don't think github releases would really suffice, indeed, because nvm definitely relies on every dist-like folder having the same layout as dist.

What I'm really confused by is why there's so much difficulty consistently serving static content - in a given release, the number of files changing is very small, and the rest of the files should never need to be touched again. Effectively, /dist etc should be able to be an S3 bucket where only new files are added. What am I missing about the difficulties here?
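In that model a release publish would look something like this sketch (the bucket name is made up):

```sh
# Sync new artifacts to an append-only bucket; existing objects never change,
# so --size-only skips everything that's already there:
aws s3 sync /home/nodejs/download/release s3://example-node-unofficial-builds/release --size-only
```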

> What I'm really confused by is why there's so much difficulty consistently serving static content - in a given release, the number of files changing is very small, and the rest of the files should never need to be touched again. Effectively, /dist etc should be able to be an S3 bucket where only new files are added. What am I missing about the difficulties here?

Probably true, but I think you're talking about nodejs/nodejs.org#4495 rather than the unofficial builds.

@nschonni oh sorry, you're right. I think it's somewhat related tho, since most static-content serving solutions these days have effectively infinite disk space :-)

@ljharb got a friend at AWS you could intro us to so we can get sponsorship for an infinite S3 bucket?

The main difficulty is in the balance between what we have gratis and what we're willing to pay for... and paying for stuff vs chasing sponsorship is a bit of a slippery slope that ends up costing the foundation a lot of $$.

There's also the technical hassle of redirecting/mounting or changing URLs for downloads; someone has to coordinate and implement all of that if we come up with a good object-store backend for this stuff. Consider the hassle in just changing http to https for nodejs.org/dist/ - and the fact that we can't get rid of dist/ in favour of the more preferable /downloads/*/.

Thanks, that clarifies things - although I think it'd still be useful to quantify what S3 usage (or other alternatives) would actually cost OpenJS, just in case it's something budgetable.

The work involved in making a change is also a very good point, thanks.

> Late to the party, but I don't mind completely nuking /rc/ from unofficial-builds; we have no promises of longevity here (I don't think anything on that server is even backed up!). You could do it time-based if you want.

+1 - that would be my 'first step': keep the last 5 or something and remove the rest. I'll look at that. What's your feeling on the "real" downloads? I'm still tempted, as a minimum, to remove e.g. the .tar.gz versions and leave the .xz of anything that isn't the latest in any given major release line.

> You can also nuke the current ccache directories if you want to temporarily regain some space. Perhaps ccache could be removed entirely from the build process; the caches are less useful for full releases than they are for test and nightly builds, I think.

Yeah, I did consider reducing them to, say, 2GB each. I created the RISC-V one with a lower capacity than the other platforms for now. And I agree that these ones aren't as time-critical as the others, but overall the real problem here is the ever-expanding set of releases.
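For reference, ccache supports both of those options directly - capping a cache or wiping it - so something like this per platform cache would work (the directory layout is an assumption):

```sh
# Cap one platform's cache at 2GB...
CCACHE_DIR=/home/nodejs/.ccache/armv6l ccache -M 2G
# ...or wipe it entirely to reclaim the space right away:
CCACHE_DIR=/home/nodejs/.ccache/armv6l ccache -C
```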

> We can also upgrade the disk fairly easily too I think.

If we can do that, even to about 250GB, that would be useful I think. Or even add another 100-150GB just for the download directory.

Although, saying that, GitHub Releases is a way of getting 'infinite' space; as you suggest, though, it adds overhead to the process that probably isn't worth the benefit.

> I'm still tempted, as a minimum, to remove e.g. the .tar.gz versions

I'm not super keen on this option tbh; it's like crippling existing offerings, maybe worse than just deleting them - although I think I'd prefer to try and expand storage space rather than nuke standard downloads at this stage. Let's go with the low-hanging fruit already identified and work on expanding capacity.

Migrating to GitHub Releases, maybe in this repo, might be a good option for unofficial-builds if we can script that process easily enough. However, the nice thing about the arrangement on unofficial-builds.nodejs.org is that it mirrors nodejs.org, so you can point nvm, node-gyp and other tools at it with their environment-variable host switches (iirc nvm has this option, "NVM_MIRROR..." or something? node-gyp certainly does, and it's how Electron users make use of it). Perhaps there's also a way to wire up an nginx redirect for individual resources, so when you go to unofficial-builds.nodejs.org/download/release/vx.y.z/ you get an index, but when you ask for one of the files it proxies to GitHub to actually fetch it - we'd be storing minimal data on the server but paying the round-trip cost to GitHub (some cache could make it happier, and CF could help too).
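For reference, the switches look roughly like this - the variable and flag names here are from memory, so verify them against each tool's docs before relying on them:

```sh
# nvm: install from unofficial-builds instead of nodejs.org
NVM_NODEJS_ORG_MIRROR=https://unofficial-builds.nodejs.org/download/release nvm install 16
# node-gyp: fetch headers from the mirror
node-gyp configure --dist-url=https://unofficial-builds.nodejs.org/download/release
```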

Yep, agree that increasing the space is the better option at the moment, although I don't think I have access to the DO console to do anything with that (not sure who does).

> Yep, agree that increasing the space is the better option at the moment, although I don't think I have access to the DO console to do anything with that (not sure who does).

The build-infra team does. I took a look and we have a few unattached volumes in DO. Unfortunately they're not in the same datacenter as the unofficial-builds server so could not be attached to it. I've deleted an unattached 1TB volume and recreated it at the same datacenter as the unofficial-builds server.

It's showing up as:

```
/dev/sda        992G   77M  942G   1% /mnt/unofficial_builds
```

@sxa Since you've been given access to the unofficial-builds server perhaps you can make the changes necessary to map onto the new disk? Preferably via Ansible changes to https://github.com/nodejs/build/tree/master/ansible/roles/unofficial-builds.

Realistically, file-system configuration depends on the hosting provider, so I'd personally be a bit reluctant to put anything specific to device names etc. into the playbooks, since they wouldn't necessarily be the same if we set it up on another server. My intention was just to remount it on e.g. /home/nodejs/download after moving the current data over - I'm happy to take on that data migration work. Thanks!
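The migration itself would be roughly the following, going by the df output above (exact steps depend on the server, so treat this as a sketch):

```sh
# Copy the current downloads onto the new volume, then remount it in place:
rsync -a /home/nodejs/download/ /mnt/unofficial_builds/
umount /mnt/unofficial_builds
mount /dev/sda /home/nodejs/download   # plus a matching /etc/fstab entry to persist it
```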

I was thinking about https://github.com/nodejs/build/blob/c9210efa59d6e93f89c49e9ca3d3f2f7aa7d5ba0/ansible/roles/unofficial-builds/tasks/main.yml#L67, which creates the /home/nodejs/download directory, but on reflection I guess that would no-op if it already exists.

> * 51GB in the `download` directory
> * 7.7GB in use by Docker
> * 8GB in use by `staging`
> * 26GB in use by `ccache` (roughly 4½GB per platform)

Current status with #58 unable to complete:

* 52GB in `download`
* 8.4GB in `/var/lib/docker`
* 4.1GB in `staging`
* 26GB in use by `.ccache`

So, if anything, slightly less than was in use previously.

New 1TB disk has been mounted on /home/nodejs/download/release and is live with a copy of the previous contents. The old directory is still in /home/nodejs/download-release-55 on the server, albeit with some things removed to make space for 14.9.2 - we can clear that up after a few days after confirming that the server is working properly.