lambci / lambci

A continuous integration system built on AWS Lambda

Home Page: https://medium.com/@hichaelmart/lambci-4c3e29d6599b


Nave doesn't work after gcc has been installed

costas1379 opened this issue · comments

Hi,
The following example is a minimum verifiable example of an error I encountered yesterday where nave breaks after gcc has been installed.

lambci.json
{
  "cmd": "bash ./lambdatest.sh",
  "build": false,
  "branches": {
    "master": false,
    "develop": true,
    "/^feature\//": true,
    "/^release\//": true,
    "/^hotfix\//": true,
    "/^spike\//": true
  }
}

lambdatest.sh
#!/bin/env bash
set -ex
nave use 4.8.4 bash -cx 'echo test'
nave use 4.8.4 bash -cx 'echo test'
. ~/init/gcc
nave use 4.8.4 bash -cx 'echo test'

The application has been working for more than a year now. Yesterday nave started failing once the GCC setup had completed. In this example, nave successfully runs 'echo test' twice under version 4.8.4 before ". ~/init/gcc" is sourced, and fails afterwards with this error message:

./lambdatest.sh: line 6: 451 Segmentation fault (core dumped) nave use 4.8.4 bash -cx 'echo test'
Build #239 failed: Command "bash ./lambdatest.sh" failed with code 139
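A note on the exit code in the log above: by shell convention, a status above 128 means the process was terminated by a signal, and 139 is 128 + 11, where signal 11 is SIGSEGV on Linux. That matches the "Segmentation fault" line. A minimal sketch for decoding such codes follows; describe_exit_code is a hypothetical helper name, not part of LambCI or nave.

```shell
# Hypothetical helper (not part of LambCI): explain a shell exit status.
describe_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    # Statuses above 128 mean "terminated by signal (code - 128)".
    sig=$((code - 128))
    # kill -l <number> prints the signal name without the SIG prefix.
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited normally with status $code"
  fi
}

describe_exit_code 139
```

Wrapping a failing build step this way can make it clearer whether a command failed on its own or was killed by the kernel.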


Oh no – that's very frustrating. I wonder if AWS just silently updated some core libraries in the Lambda environment that are incompatible with the (old) environment that GCC 4.8.5 was built with? That would be very frustrating.

I'm not really sure exactly what could be going on here, aside from some library incompatibilities of some kind – and unfortunately I'm on vacation at the moment, so can't really look into it until next week.

Thank you very much for getting back to me. I have raised an issue with AWS and will let you know what they say.

So I've looked into this a little further – it seems that the Lambda runtime might've updated silently from being based on Amazon Linux 2016.03 to 2017.03 – and the kernel updated from v4.4 to v4.9, as well as a bunch of other system library changes.

It's rather annoying that these changes weren't documented anywhere, nor anyone given warning about them. I guess it's a risk of running on Lambda? But it's a rather annoying one.

At least that's what it seems like has happened to me – did you hear anything back?
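Since the runtime change described above was not announced, one way to notice it sooner might be to print a small environment fingerprint at the start of every build and compare it between build logs. The sketch below is only an illustration: print_runtime_fingerprint is a hypothetical name, and it assumes the release string lives in /etc/system-release (as it does on Amazon Linux) and that getconf supports GNU_LIBC_VERSION (true on glibc systems).

```shell
# Hypothetical helper: print OS release, kernel and libc versions for the build log.
print_runtime_fingerprint() {
  # Assumption: Amazon Linux records its release string in /etc/system-release.
  if [ -r /etc/system-release ]; then
    echo "os: $(cat /etc/system-release)"
  else
    echo "os: unknown"
  fi
  # Kernel release, for example a 4.4.x or 4.9.x string.
  echo "kernel: $(uname -r)"
  # On glibc systems this prints something like "glibc 2.17".
  echo "libc: $(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"
}

print_runtime_fingerprint
```

Note that the glibc version string alone would not have caught this case, since both runtimes report 2.17; the OS release and kernel lines are the ones that would have differed.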

@mhart is there anything I can do to help? what would be the steps to fix it?

Cheers!

This should be fixed now! I just rebuilt gcc-4.8.5 under the latest Lambda runtime and pushed the tarball.

You shouldn't need to change anything – your builds should just now work™. Let me know if they don't.

Basic explanation of what happened:

I built the previous version of gcc 4.8.5 and bundled with it a version of glibc 2.17 compiled from gnu.org using a --prefix of /tmp – this was necessary because the glibc-devel that comes with Amazon Linux (and hence Lambda) is hardcoded to access binaries at certain locations under /usr – but we can't unpack binaries there, only under /tmp.

Now glibc 2.17 is the "same" version that was on the previous Lambda – I say "same" with quotes, because Amazon actually has their own custom version of 2.17 with various patches applied. At least under Amazon Linux 2016.03 (or Linux version 4.4), the glibc 2.17 from gnu.org and the glibc 2.17 from Amazon were compatible (at least, it seems that way). Somehow, the glibc in the newer Amazon Linux 2017.03 (which is also "2.17", but now with even more patches applied), or possibly the 4.9 kernel, is no longer compatible with the glibc 2.17 from gnu.org – at least, this is my best guess as to why the segmentation faults were occurring.

This time around, instead of using the gnu.org version, I actually got the same source that Amazon used to build the glibc version that's on Lambda, and built it from scratch, changing the prefix to /tmp instead of /usr. I then rebuilt gcc 4.8.5 – and everything seems to be working as before.

Now, could this break again? Possibly – but hopefully, the fact that I've used the glibc source from Amazon this time, instead of the original source from gnu.org, should mean that there's less chance of that happening – as I assume they try to keep compatibility across Amazon Linux versions.
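For readers unfamiliar with relocating a glibc build, the prefix change described above can be sketched roughly as follows. This is a dry run that only prints the usual configure and make steps; it is not the exact procedure used for the LambCI tarball, and the source and build paths are hypothetical placeholders. The only details taken from the thread are the --prefix of /tmp and the use of Amazon's glibc source.

```shell
# Dry-run sketch: prints the commands instead of running them.
# print_glibc_build_steps and both directory arguments are hypothetical.
print_glibc_build_steps() {
  prefix=$1
  src_dir=$2
  build_dir=$3
  # glibc's configure refuses to run inside its source tree,
  # so the build happens in a separate directory.
  echo "mkdir -p $build_dir && cd $build_dir"
  # --prefix decides where the libraries are installed and expected at runtime.
  echo "$src_dir/configure --prefix=$prefix"
  echo "make"
  echo "make install"
}

print_glibc_build_steps /tmp /path/to/amazon-glibc-source /path/to/glibc-build
```

The point of the prefix is that /tmp is the only writable location on Lambda, so everything the toolchain needs must be built to live there.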


hey @mhart, I have a sneaking suspicion this may be happening again. Apologies, this is an area of development I don't have much knowledge in, but I'm receiving an error on a fresh deployment (today) that looks very similar to the above: "/var/task/vendor/bin/bash: line 1: 115 Segmentation fault (core dumped) npm install"

@Datise could you give me more details about what you're executing?


Man you're fast.

LambCI v0.10.0 triggered on stack "lambci"

Looking up lambci-config for projects: global, gh/fansunite/fansunite-bet-client

$ rm -rf /tmp/lambci/build
$ git clone --depth 5 https://XXXX@github.com/fansunite/fansunite-bet-client.git -b sentry /tmp/lambci/build/fansunite/fansunite-bet-client
Cloning into '/tmp/lambci/build/xxxxx'...
$ cd /tmp/lambci/build/xxxx && git checkout -qf xxxx

Build #14 started...

Build log: https://lambci-buildresults-xxx.s3.amazonaws.com/gh/xxx/fxxxxx

$ . ~/init/gcc && npm install
Installing GCC 4.8.5...
++ curl -sSL https://lambci.s3.amazonaws.com/binaries/gcc-4.8.5.tgz
++ tar -xz -C /tmp
++ set +x
GCC setup complete

/var/task/vendor/bin/bash: line 1:    87 Segmentation fault      (core dumped) npm install
Build #14 failed: Command ". ~/init/gcc && npm install" failed with code 139

I am supporting a UI that has dependencies on Node bindings for the scrypt cryptography library. I'm just trying to run npm install at this point, but there's not a lot of feedback to help me understand what's going wrong.

Ugh, I think something may have changed with Lambda's underlying OS. Lemme look further into it and get back to you


@mhart no problem thanks for the support, sounds like lambda needs to work on their comms?

Yeah, they definitely do.

I've been able to reproduce, and I think I've fixed it. You should just be able to try your build again.

(technically it will now install gcc 4.8.3 instead of 4.8.5, even though it will say "installing 4.8.5" – don't think that will really affect anyone – should be able to get 4.8.5 working again soon)

For the hairy details: Lambda had updated a number of libraries, including glibc, which in this case was responsible for the breakage, just as it was in #92 (comment)

I'm now using the gcc binaries provided directly by the Amazon repo, with some manual modification for path changes. Fingers crossed this means no more breakages


You'd think since they have the bash runtime now that communication of changes on the os level would be even more relevant? Either way the build errors have changed and are now on my hands, thanks again for the support.

Have updated gcc back to 4.8.5 behind the scenes again – let me know if you have any further issues with native builds