fujiwara / lambroll

lambroll is a minimal deployment tool for AWS Lambda.

The lambroll deploy command exits before the function code update is successful

minamijoyo opened this issue · comments

Hi, thanks for developing a useful deployment tool for lambda!

I recently noticed that the lambroll deploy command exits before the function update has succeeded. This might not be a problem with small zip packages, but Lambda can also deploy container images of up to 10GB. In fact, when I deployed a 4GB image, the LastUpdateStatus remained InProgress after lambroll deploy completed, and it took almost two minutes to become Successful. This delay cannot be ignored; lambroll should wait for the update to finish before updating the alias.

The deploy logs are as follows:

$ lambroll deploy --keep-versions=3
2023/09/20 07:45:58 [info] lambroll v0.14.3 with function.json
2023/09/20 07:45:58 [info] starting deploy function foo
2023/09/20 07:45:58 [info] using docker image 012345678901.dkr.ecr.ap-northeast-1.amazonaws.com/foo:latest
2023/09/20 07:45:58 [info] updating function configuration
2023/09/20 07:45:58 [info] State:Active LastUpdateStatus:Successful
2023/09/20 07:45:58 [info] updated function configuration successfully
2023/09/20 07:45:58 [info] updating function code
2023/09/20 07:45:59 [info] State:Active LastUpdateStatus:InProgress
2023/09/20 07:45:59 [info] waiting for LastUpdateStatus Successful
2023/09/20 07:46:00 [info] State:Active LastUpdateStatus:InProgress
2023/09/20 07:46:00 [info] waiting for LastUpdateStatus Successful
2023/09/20 07:46:02 [info] State:Active LastUpdateStatus:InProgress
2023/09/20 07:46:02 [info] waiting for LastUpdateStatus Successful
2023/09/20 07:46:06 [info] State:Active LastUpdateStatus:Successful
2023/09/20 07:46:07 [info] updated function code successfully
2023/09/20 07:46:07 [info] deployed version 33
2023/09/20 07:46:07 [info] updating alias set current to version 33
2023/09/20 07:46:07 [info] alias updated
2023/09/20 07:46:07 [info] deleting function version: 30
2023/09/20 07:46:07 [info] except 3 latest versions are deleted
2023/09/20 07:46:07 [info] completed
$ while true ; do aws lambda get-function --function-name foo --query 'Configuration.[State, LastUpdateStatus]' | jq -c '[(now | todateiso8601)] + .'; sleep 1; done
["2023-09-20T07:45:55Z","Active","Successful"]
["2023-09-20T07:45:57Z","Active","Successful"]
["2023-09-20T07:45:58Z","Active","InProgress"]
["2023-09-20T07:46:00Z","Active","InProgress"]
["2023-09-20T07:46:02Z","Active","InProgress"]
["2023-09-20T07:46:04Z","Active","Successful"]
["2023-09-20T07:46:06Z","Active","Successful"]
["2023-09-20T07:46:08Z","Active","InProgress"]
["2023-09-20T07:46:09Z","Active","InProgress"]
["2023-09-20T07:46:11Z","Active","InProgress"]
(snip.)
["2023-09-20T07:48:04Z","Active","InProgress"]
["2023-09-20T07:48:06Z","Active","InProgress"]
["2023-09-20T07:48:08Z","Active","Successful"]
["2023-09-20T07:48:10Z","Active","Successful"]
["2023-09-20T07:48:12Z","Active","Successful"]
["2023-09-20T07:48:14Z","Active","Successful"]

Note that lambroll logged "updated function code successfully" at 07:46:07, but the function's LastUpdateStatus was still InProgress at that point.

Hi.

lambroll waits for LastUpdateStatus to become "Successful" here.

lambroll/deploy.go

Lines 235 to 255 in 38b3fc1

func (app *App) waitForLastUpdateStatusSuccessful(ctx context.Context, name string) error {
	retrier := retryPolicy.Start(ctx)
	for retrier.Continue() {
		res, err := app.lambda.GetFunction(&lambda.GetFunctionInput{
			FunctionName: aws.String(name),
		})
		if err != nil {
			log.Println("[warn] failed to get function, retrying", err)
			continue
		} else {
			state := aws.StringValue(res.Configuration.State)
			last := aws.StringValue(res.Configuration.LastUpdateStatus)
			log.Printf("[info] State:%s LastUpdateStatus:%s", state, last)
			if last == lambda.LastUpdateStatusSuccessful {
				return nil
			}
			log.Printf("[info] waiting for LastUpdateStatus %s", lambda.LastUpdateStatusSuccessful)
		}
	}
	return errors.New("max retries reached")
}

Calling lambda.GetFunction() is equivalent to aws lambda get-function.

So I'm confused by the difference between the results observed by lambroll and by aws-cli.

I'll try to reproduce it in my environment.

Thank you for your quick reply!

The waitForLastUpdateStatusSuccessful helper is called before UpdateFunctionCodeWithContext in updateFunctionCode.

if err := app.waitForLastUpdateStatusSuccessful(ctx, *in.FunctionName); err != nil {

I think we also need to wait and confirm it after UpdateFunctionCodeWithContext (or before UpdateAlias).

@minamijoyo Oh, I missed it. I'll move waitForLastUpdateStatusSuccessful after UpdateFunctionCode.

Looking at the log, the current behavior depends on the waitForLastUpdateStatusSuccessful call in updateFunctionCode waiting for the previous UpdateFunctionConfigurationWithContext call made in updateFunctionConfiguration.

I think simply moving waitForLastUpdateStatusSuccessful in updateFunctionCode will break this implicit balance.

I'm not sure why the current implementation of updateFunctionConfiguration also waits before UpdateFunctionConfigurationWithContext rather than after. Is there any reason?

lambroll/deploy.go

Lines 187 to 194 in 38b3fc1

func (app *App) updateFunctionConfiguration(ctx context.Context, in *lambda.UpdateFunctionConfigurationInput) (*lambda.FunctionConfiguration, error) {
	if err := app.waitForLastUpdateStatusSuccessful(ctx, *in.FunctionName); err != nil {
		return nil, err
	}
	retrier := retryPolicy.Start(ctx)
	for retrier.Continue() {
		res, err := app.lambda.UpdateFunctionConfigurationWithContext(ctx, in)

The UpdateFunction(Code|Configuration) APIs fail when the last update status is "InProgress", so the current code waits for "Successful" before calling them.

We should wait for "Successful" before and after calling these APIs.

That makes sense. I'm happy to fix and test the implementation 👌

@fujiwara I wrote a patch #314. Please review it.