hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.

Home Page: https://registry.terraform.io/providers/hashicorp/aws

Do not try to delete lambda@edge functions with replicas

rarguelloF opened this issue

Terraform fails to delete lambda@edge functions that were already replicated, as AWS just doesn't let it.

From http://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html:

When you create a trigger, Lambda replicates the function to AWS Regions and CloudFront edge locations around the globe. Note that replicas can't be edited or deleted.

Terraform Version

Terraform v0.10.6

Affected Resource(s)

  • aws_lambda_function

Expected Behavior

Terraform should not fail and should give you some kind of warning that it couldn't delete the resource because AWS just doesn't let you.

Actual Behavior

Terraform fails because it tries to delete the Lambda function and AWS doesn't allow it.

Steps to Reproduce

  1. Create some configuration file with a lambda@edge function.
  2. Associate it to a cloudfront distribution and let it get replicated.
  3. Remove the lambda from your configuration.
  4. terraform apply

I'm having this problem also.

I can't reproduce a crash in Terraform v0.10.7, but I do still hit a dead-end where Terraform is unable to delete the resource:

Error applying plan:

1 error(s) occurred:

* module.xxx.aws_lambda_function.yyy (destroy): 1 error(s) occurred:

* aws_lambda_function.yyy: Error deleting Lambda Function: InvalidParameterValueException: Lambda was unable to delete arn:aws:lambda:us-east-1:123456789123:function:zzz:2 because it is a replicated function.
	status code: 400, request id: ........-....-....-....-............

It would be useful for the aws_lambda_function resource to support a retain_on_delete parameter, similar to aws_cloudfront_distribution, so that the resource can be "forgotten" from the Terraform state despite the limitation that the AWS APIs won't actually let you remove it.

You can delete a CF trigger, but apparently there is actually no way to delete replicated functions

Consequence: You can no longer delete the original function that was used as a CF trigger 😞

https://forums.aws.amazon.com/thread.jspa?threadID=260242&tstart=0

I reached out to my AWS contact during the Lambda@Edge preview, and he said they have no plans to support deleting replicated functions manually. But they're working on a system to automatically delete the replicated functions when you remove the trigger.

On another note, we can no longer delete the original function that was used as a CF trigger. Says "There was an error deleting your function: Lambda was unable to delete arn:aws:lambda:us-east-1:...:1 because it is a replicated function." I'd assume this is a bug in the system, since we need to be able to delete functions we created ourselves (aside from the replicated functions created automatically by CF.)

@ctd I believe the "crash" @rarguelloF describes is likely the error that is thrown. The error effectively prevents you from altering the Lambda function (e.g. renaming, deleting), as Terraform aborts any changes when failing to delete the original, replicated function.

For my project, I had to destroy and recreate the terraform.tfstate file in order to rename a function that was replicated by a CloudFront distribution. I agree with you that it would be nice for Terraform to at least support "forgetting" Lambda functions that are incapable of being deleted by the AWS API.

@ctd @dcalhoun replaced the word "crash" with "error" 👍

Support reply regarding this on Nov 27, 2017

Hello,

Thank you for contacting Amazon Web Services.

Unfortunately this is an open issue which our team are looking to support. 

Today, Lambda Master Functions do not get deleted once the association and versions 
are deleted. This is something which the team is expecting to resolve very soon, at this 
moment unfortunately there is no option for customer or AWS team to delete these functions.

If you are running into any limit errors due to these stale functions do let me know, I 
will be happy to work with our teams to help increase the limits.

We appreciate your patience while this feature is made available.

Let me know if I could be of any further help on this case.

Best regards,

Dilip S | Sydney Support Center
Amazon Web Services

Does anyone have an idea what "very soon" means to AWS? 6-8 months?

If a Lambda@Edge function can be used for other behaviors and/or other CloudFront distributions, it makes sense to me to put it in its own Terraform project and just refer to it, so that destroying a distribution or removing a Lambda trigger from a behavior doesn't attempt to destroy the (reusable) Lambda function. Other than using the ARN, I don't see a way to associate a Lambda function with a CF behavior. A data object to refer to a Lambda function would be very useful here, and we could pull the latest version using variables on the data object. Also, Lambda@Edge triggers have to use specific versions and can't use the $LATEST alias. Is there another resource we can use to allow a CloudFront distribution to be "stable" and not need a variable updated each time a new Lambda function is released?
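
One way to approach the "separate project" idea above might be the provider's aws_lambda_function data source. The following is a minimal sketch, not a tested configuration: the function name "edge-viewer-request" and the output name are hypothetical, and it assumes a provider version in which the data source resolves to the most recent published version when no qualifier is given.

```hcl
# Lambda@Edge functions must be created in us-east-1.
provider "aws" {
  region = "us-east-1"
}

# Hypothetical function name; the function itself is assumed to be
# managed in a different Terraform project.
data "aws_lambda_function" "edge" {
  function_name = "edge-viewer-request"
}

# Version-qualified ARN, usable as lambda_arn inside a
# lambda_function_association block of a CloudFront distribution.
output "edge_lambda_arn" {
  value = data.aws_lambda_function.edge.qualified_arn
}
```

Because the distribution's project only reads the function, destroying the distribution never attempts to delete the function.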

Same problem here..

-_- ... sometimes aws is killing me ;)

January 12, 2018. I just did my level best to find a way, any way, to delete a function that has been replicated. I'm guessing that someone inside of AWS closed the bug report not understanding how serious this issue is.

I am also blocked by this issue unfortunately.

We are trying to implement automated tests using the kitchen terraform plugin and aws-spec, and this issue completely blocks us because terraform destroy errors out if you have a Lambda@Edge function in your stack.

My first thought for getting around this issue was to remove the lambda@edge functionality from the module only when tests are run but I have been unable to figure out how to accomplish this. It seems impossible to do this since the lambda_function_association property can only be set using the interpolation syntax (lambda_function_association = [ "${conditional logic here}" ] ) and that doesn't appear to support passing in a list of map objects.

Is there some other way that I can conditionally set that property based on a variable value to selectively enable or disable those associations?

Since the Lambda replicas will be deleted by CloudFront with eventual consistency once the associations to the CloudFront distribution have been deleted, is it enough to delete the associations on destroy and trust that CloudFront will eventually clean them up? Or is the AWS API returning an error when deleting the main Lambda function because replicas still exist?

Either way, until AWS fixes their issue, which seems like they aren't going to be doing anytime soon, is it possible to have the deletion be treated as a warning so it doesn't just completely block anyone trying to use it now? Even if we get a warning and have to clean up the function manually at a later date that seems the best we can hope for until AWS addresses this properly.

Thanks!
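
On the question above about conditionally setting lambda_function_association: with Terraform 0.12 or later, a dynamic block might cover it. This is a sketch under assumptions: the variable names, the origin "example.com", and the "viewer-request" event type are placeholders for illustration, not taken from the commenter's module.

```hcl
variable "enable_edge_lambda" {
  type    = bool
  default = true
}

variable "edge_lambda_arn" {
  type        = string
  description = "Version-qualified ARN of the Lambda@Edge function (hypothetical input)"
}

resource "aws_cloudfront_distribution" "example" {
  enabled = true

  origin {
    domain_name = "example.com"
    origin_id   = "example-origin"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id       = "example-origin"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }

    # Emits one association when the flag is true and none when it is false.
    dynamic "lambda_function_association" {
      for_each = var.enable_edge_lambda ? [var.edge_lambda_arn] : []

      content {
        event_type = "viewer-request"
        lambda_arn = lambda_function_association.value
      }
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}
```

A test run could then pass -var enable_edge_lambda=false. This only controls the association; it does not change how the function itself is deleted.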

For the sake of posterity, based on the Deleting Lambda@Edge Functions and Replicas documentation stating replicas are "deleted within a few hours" I was able to...

  1. Apply Terraform config to remove the Lambda association from my CloudFront distribution.
  2. Wait for ~1 hour.
  3. Apply Terraform config to rename my Lambda function (which requires deletion of the original).
  4. Apply Terraform config to add the Lambda association to my CloudFront distribution.

So, if your use-case doesn't require everything occurring in a single Terraform apply, you should be able to delete/rename Lambda functions if you wait long enough for the replicas to be deleted after all associations to the Lambda are removed.

Annoying and limiting, but worked in my simple case.

@dcalhoun

Terraform v0.11.7 - Throws error but actually applies all the other changes for me.

TL;DR

You can also remove it from CF via the console and then wait 30 minutes. You don't have to do it via TF per se.

BUT..... While that works, I don't see how I can use this approach for a production environment given that the Lambda@Edge function serves live traffic rules. If I remove it for an hour, I'm creating a production system outage for an hour. Which means that if you use Lambda@Edge and you need to make a change, you're SOL and have to take an outage.

FYI - Amazon has the same problem with CloudFormation templates as well.

Also, CloudFront takes 15ish minutes to deploy a change, so every part of this sucks. You could consider fixing this with an A and a B distribution which sit behind an edge distribution via a cache behavior, update the inactive one, then change the edge to point at the inactive one. But that will take 15 minutes to deploy, during which you can't reliably know if A or B is being used. So if you needed to quickly iterate, forget about it.

@dmitrye for your production use case, you could likely do the following to avoid service interruptions...

  1. Apply Terraform config to create a duplicate Lambda with any changes (e.g. rename), remove old Lambda association, and add new Lambda association.
  2. Wait ~1 hour for old Lambda replicas to be deleted.
  3. Apply Terraform config to delete old Lambda.

@dcalhoun Yes, that would avoid an outage. But then I have to remember to clean up the old Lambda@Edge functions. Unused Lambdas don't cost me anything, so theoretically you could have a weekly script/job via TF that cleans up old functions from your config.

Would be nice if TF incremented the definition name with an ID (or some other tokenizer) so that each compile generates a new function instead of updating existing one.
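
Something close to this can be approximated in configuration by deriving the function name from a hash of the deployment package. The sketch below assumes a package at edge.zip next to the configuration, a Node.js handler, and a role ARN supplied through a hypothetical variable; none of these come from the thread.

```hcl
provider "aws" {
  region = "us-east-1"
}

variable "edge_role_arn" {
  type        = string
  description = "Execution role ARN for the function (hypothetical input)"
}

locals {
  edge_package = "${path.module}/edge.zip"

  # First 8 hex characters of the package's SHA-256 hash.
  edge_suffix = substr(filesha256(local.edge_package), 0, 8)
}

resource "aws_lambda_function" "edge_hashed" {
  # A changed package yields a new name, which forces a new function.
  function_name = "edge-handler-${local.edge_suffix}"
  filename      = local.edge_package
  role          = var.edge_role_arn
  handler       = "index.handler"
  runtime       = "nodejs18.x"
  publish       = true

  lifecycle {
    # Create the replacement before Terraform tries to remove the old one.
    create_before_destroy = true
  }
}
```

The deletion of the old function at the end of the apply can still hit the replicated-function error, so this only helps with the outage concern, not with the cleanup.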

I also hit the same problem today :(

Yep. This is an issue. The terraform destroy is marked as failed because of this one Lambda@Edge function.

Is there a way to ignore only this error and make the stack destroy pass?
Error deleting Lambda Function: InvalidParameterValueException: Lambda was unable to delete arnxxx because it is a replicated function. Please see our documentation for Deleting Lambda@Edge Functions and Replicas.

Is it possible to wait for the replicas to be removed, for example through a timeout, so that Terraform itself checks whether the replicas have been deleted and then finishes without an error?
Or can I do it by scripting in the .tf file?

I also ran into this issue when trying to rename a lambda@edge function (essentially replacing an existing function with a new one). The Cloudfront update would fail as terraform is unable to delete the lambda function immediately. My workaround was to delete the function from state first:

terraform state rm aws_lambda_function.my_lambda

And then run terraform apply. Terraform was then able to create the new Lambda function and update CloudFront. I still had to manually delete the old Lambda function, though.
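
On Terraform 1.7 or later, which is newer than this comment, the same "forget without destroying" step can be written in configuration with a removed block instead of running terraform state rm by hand. A minimal sketch, reusing the resource address from the command above:

```hcl
# Requires Terraform 1.7 or later. Delete the resource block for
# aws_lambda_function.my_lambda from the configuration and add this instead.
removed {
  from = aws_lambda_function.my_lambda

  lifecycle {
    # Drop the function from state without calling the delete API.
    destroy = false
  }
}
```

As with terraform state rm, the old function remains in AWS and has to be cleaned up separately once its replicas are gone.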

3 years later and this is still a problem ... really, the fix needs to come from Amazon. If they didn't treat trying to delete a replicated function as an error, this all goes away. I put up a forum thread, please check it out if you agree https://forums.aws.amazon.com/thread.jspa?threadID=331402

I have a workaround for this issue; it helped me get past this known problem:

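resource "time_sleep" "wait_30_seconds" {
  # If you use a resource block instead of a module, reference it here.
  depends_on = [module.lambda_at_edge]

  # On destroy, this resource is removed before the Lambda@Edge module and
  # sleeps first. Despite the resource name, the delay is 1200 seconds
  # (20 minutes).
  destroy_duration = "1200s"
}

resource "null_resource" "next" {
  depends_on = [time_sleep.wait_30_seconds]
}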

Thanks a lot, it looks like it works.
However, I have a question: could you please explain how this works? From what I see, you need to remove the association between the Lambda function and CloudFront, but this workaround simply adds a delay before the Lambda@Edge function is destroyed. So how does this piece of Terraform code remove the Lambda function from the CloudFront behavior?

Hi, when you run "terraform destroy", Terraform destroys the CloudFront distribution first and then the Lambda function, so we don't need extra code to remove the association between CloudFront and Lambda.
Basically, Lambda@Edge functions can be deleted about 15 to 20 minutes after the CloudFront distribution has been deleted.

The sample code delays deletion of the Lambda function by 1200 seconds.

@dcalhoun Can you tell me how to remove the association? What does that look like?

@josh803316 it has been almost 4 years since I posted this, so I struggle to recall the specific steps. 😅 I imagine it involves removing the lambda_function_association configuration argument from your Terraform config file and applying that change with Terraform CLI apply (or whatever method you use to apply Terraform configuration).

diff --git a/example.yml b/example.yml
index 4b843c1..929c01f 100644
--- a/example.yml
+++ b/example.yml
@@ -4,11 +4,5 @@ resource "aws_cloudfront_distribution" "example" {
   # lambda_function_association is also supported by default_cache_behavior
   ordered_cache_behavior {
     # ... other configuration ...
-
-    lambda_function_association {
-      event_type   = "viewer-request"
-      lambda_arn   = aws_lambda_function.example.qualified_arn
-      include_body = false
-    }
   }
 }

Hope this helps!

@dcalhoun Yes, it does, and it's the same thing I arrived at as well. The good news is that it allows me to destroy all the resources except the Lambda function. The bad news is that the destroy still ends up failing because of the issue above (replication, and Lambda functions not being destroyed immediately). Thanks so much for the suggestion!

Based on the age of this issue, I would say that at this point it needs to be thoroughly researched again to ensure it is still an issue. There may be an upstream component to this but in the provider we may be able to do something to improve the behavior (e.g., eating the error or trying eventual consistency approaches to wait for CF).

It's still indeed an issue. The workaround above — #1721 (comment) — works, however we should look into adding native support for this in the provider itself.

To add another data point - this is still an issue in Jan 2023.

I confirm this is still an issue for us.

It is still an issue for us as well.

The proposed solution for this issue is to add a skip_destroy argument to the aws_lambda_function resource as has been done for

  • aws_cloudwatch_log_group: #26775
  • aws_imagebuilder_component: #28905
  • aws_ecs_task_definition: #22269
  • aws_lambda_layer_version: #11997

and others. If the attribute is set to true then the Lambda function is removed from Terraform state without attempting a DeleteFunction API call.
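
As a sketch of how the proposed argument would be used: the function name, package path, runtime, and the role variable below are placeholders for illustration, and skip_destroy is assumed to be available in the provider version in use.

```hcl
provider "aws" {
  region = "us-east-1"
}

variable "edge_role_arn" {
  type        = string
  description = "Execution role ARN for the function (hypothetical input)"
}

resource "aws_lambda_function" "edge_retained" {
  function_name = "edge-viewer-request"
  filename      = "${path.module}/edge.zip"
  role          = var.edge_role_arn
  handler       = "index.handler"
  runtime       = "nodejs18.x"
  publish       = true

  # On destroy, remove the function from Terraform state only;
  # no DeleteFunction call is made, so the function stays in AWS.
  skip_destroy = true
}
```

With this set, terraform destroy should complete, and the retained function has to be deleted outside Terraform later.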

Builds on #29615.

That doesn’t resolve the issue though. It merely makes it so the user now has to manually delete the Lambda function. The workaround with the sleep and null resource is an appropriate way to keep the resources automated.

@ewbankkit I would like to pick up your solution as a PR if no one else is working on it

@richardjennings I'll be doing the PR, likely submitted later today. Thanks though 👏.

@edwardofclt I also plan on retrying delete (up to a configurable timeout) if the error code indicates a replicated Lambda@Edge function.
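
A sketch of how that retry window might be tuned, assuming it is governed by the resource's standard timeouts block (worth checking against the documentation for your provider version). The "40m" value, the names, and the role variable are illustrative only.

```hcl
provider "aws" {
  region = "us-east-1"
}

variable "edge_role_arn" {
  type        = string
  description = "Execution role ARN for the function (hypothetical input)"
}

resource "aws_lambda_function" "edge_retry" {
  function_name = "edge-viewer-request"
  filename      = "${path.module}/edge.zip"
  role          = var.edge_role_arn
  handler       = "index.handler"
  runtime       = "nodejs18.x"
  publish       = true

  timeouts {
    # Assumed to bound how long the provider keeps retrying the delete
    # while AWS still reports the function as replicated.
    delete = "40m"
  }
}
```

Since the AWS documentation quoted earlier says replicas are deleted "within a few hours", a longer timeout trades a slower destroy for a better chance that it finishes in a single run.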

This functionality has been released in v4.57.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.