Terraform Lambda Deployment Conflict on Package Update with APIGW redeployment resource create_before_destroy

Question

Tharsan05 opened this issue 2 months ago

Terraform Version

Terraform v1.8.5
hashicorp/aws v5.53.0

Terraform Configuration Files

resource "aws_lambda_function" "test_lambda" {
#   package_type  = "Image"
#   image_uri     = "123456789.dkr.ecr.eu-central-1.amazonaws.com/test_images:gghsilf234"
#   image_config {
#     command = ["image.test_lambda_handler"]
#   }

  filename         = "${var.lambda_function_name}.zip"
  function_name    = var.lambda_function_name
  description      = "test lambda"
  role             = aws_iam_role.lambda_role.arn
  handler          = "${var.lambda_function_name}.lambda_handler"
  source_code_hash = data.archive_file.lambda_source.output_base64sha256
  timeout          = 60
  runtime          = "python3.9"
  depends_on       = [aws_cloudwatch_log_group.default]
}

resource "aws_api_gateway_integration" "parent_api_integration_request" {
  rest_api_id             = aws_api_gateway_rest_api.account_rest_api.id
  resource_id             = aws_api_gateway_resource.parent_api_resources.id
  http_method             = aws_api_gateway_method.parent_api_method_request.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.test_lambda.invoke_arn
}

# APIGW Deployment
resource "aws_api_gateway_deployment" "api_deployment" {
  rest_api_id = aws_api_gateway_rest_api.account_rest_api.id

  depends_on = [ aws_api_gateway_integration.parent_api_integration_request ]

  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_rest_api.account_rest_api.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

Debug Output

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_lambda_function.test_lambda: Creating...
aws_lambda_function.test_lambda: Still creating... [10s elapsed]
aws_lambda_function.test_lambda: Still creating... [20s elapsed]
aws_lambda_function.test_lambda: Still creating... [30s elapsed]
aws_lambda_function.test_lambda: Still creating... [40s elapsed]
aws_lambda_function.test_lambda: Still creating... [50s elapsed]
aws_lambda_function.test_lambda: Still creating...

Expected Behavior

Shouldn't it work something like this?
When create_before_destroy is set on the API Gateway deployment resource, Terraform should not apply that lifecycle rule to Lambda creation or updates. Instead, it should prioritize the depends_on attribute. Specifically, Terraform should check whether the depends_on resource is a Lambda function and, if a Lambda function with the same name already exists, update it first, even if the invoke ARN changes.

Actual Behavior

I have created a Terraform configuration to deploy a REST API and link a Lambda function to the GET method. The configuration includes resources to create a Lambda function and its associated role, followed by resources to create the API Gateway REST API. Additionally, the configuration includes an aws_api_gateway_deployment resource with the lifecycle condition create_before_destroy = true.

In the initial deployment, the Lambda function's source code was provided as a ZIP file. The deployment was successful, and all resources were created as expected.

Subsequently, I updated the Lambda function's source code package from a ZIP file to a Docker image. When I attempted to run terraform apply, Terraform tried to create a new Lambda function instead of modifying the existing one. This resulted in the following error:

operation error Lambda: CreateFunction, https response error StatusCode: 409, RequestID: aaaaaaa-74c0-4bd2-a212-871a0769f3d3, ResourceConflictException: Function already exists: test-lambda

It appears that Terraform attempts to create a new Lambda function even though one with that name already exists, instead of updating the existing function.

The issue seems to be related to the create_before_destroy lifecycle condition specified in the aws_api_gateway_deployment resource. During the initial deployment, the Lambda function was created with a specific name. Upon updating the package type (from ZIP to Docker image or vice versa), Terraform expects a change in the invoke ARN, causing it to try to create a new Lambda function instead of updating the existing one. This problem does not occur with other types of Lambda function updates (code or configuration changes) where the invoke ARN remains unchanged after the initial deployment.

I am not sure whether this is expected behavior.

Steps to Reproduce

terraform init
terraform apply
Uncomment the commented-out lines in the aws_lambda_function resource above, and comment out the existing code beneath them in that resource.
terraform apply

Terraform then tries to recreate the Lambda function instead of modifying it, and the apply eventually times out with an error.
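For reference, a sketch of how the function resource presumably looks after the third step. It is assembled only from the commented-out lines in the configuration above. Which of the original arguments get commented out is an assumption: filename, handler, runtime, and source_code_hash are dropped here because they describe a ZIP package, while function_name, role, and the rest are kept.

```hcl
resource "aws_lambda_function" "test_lambda" {
  # Image-based package, taken from the commented-out lines above.
  package_type = "Image"
  image_uri    = "123456789.dkr.ecr.eu-central-1.amazonaws.com/test_images:gghsilf234"
  image_config {
    command = ["image.test_lambda_handler"]
  }

  # Assumption: these arguments are left unchanged from the ZIP version.
  function_name = var.lambda_function_name
  description   = "test lambda"
  role          = aws_iam_role.lambda_role.arn
  timeout       = 60
  depends_on    = [aws_cloudwatch_log_group.default]
}
```

The function name is identical in both versions, which is why a second CreateFunction call for it is rejected with the 409 shown above.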

Liam Cervante · Answer 1 · Wed Jun 12 2024 21:32:03 GMT+0800 (China Standard Time)

Hi @Tharsan05, thanks for filing this.

There are a couple of things happening at once here. First, the package_type attribute is marked by the AWS provider as ForceNew, so changing it causes the Lambda function to be destroyed and recreated rather than updated in place. Second, the behaviour of create_before_destroy is transitive: the setting propagates to the dependencies of the resource that declares it. This behaviour is documented here: https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#create_before_destroy

Note that Terraform propagates and applies the create_before_destroy meta-attribute behaviour to all resource dependencies. For example, if create_before_destroy is enabled on resource A but not on resource B, but resource A is dependent on resource B, then Terraform enables create_before_destroy for resource B implicitly by default and stores it to the state file. You cannot override create_before_destroy to false on resource B because that would imply dependency cycles in the graph.
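A minimal sketch of that propagation, using the built-in terraform_data resource (Terraform 1.4 or later) so it can be tried without any cloud account. The resource names a and b and the variable b_revision are hypothetical and chosen only to mirror "resource A" and "resource B" in the quoted note.

```hcl
# Hypothetical variable, used only to force a replacement of resource B.
variable "b_revision" {
  type    = string
  default = "1"
}

# Resource B: declares no lifecycle block of its own.
resource "terraform_data" "b" {
  # Changing this value forces B to be replaced.
  triggers_replace = var.b_revision
}

# Resource A: depends on B and opts in to create_before_destroy.
resource "terraform_data" "a" {
  input = terraform_data.b.id

  lifecycle {
    create_before_destroy = true
  }
}
```

After an initial apply, changing b_revision should cause B to be replaced in create-then-destroy order even though only A declares the setting. In the configuration from this issue, aws_api_gateway_deployment plays the role of A, and aws_lambda_function is B, reached through the aws_api_gateway_integration that references its invoke_arn.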

It's this combination of behaviours that is causing the events you are seeing. In isolation, both behaviours seem to be working as designed, so I'm going to close this ticket as working-as-designed. Note that Terraform core and its behaviours are not aware of the underlying types of the resources. It's not possible for Terraform to look at the dependencies and modify its behaviour based on the resource types that might exist.

It does seem that you are stuck in a Catch-22 situation given the requirements here. I'm sorry to say that I'm not very familiar with the expected workflow for the AWS provider. I'd recommend having a look at the HashiCorp Discuss forums, or asking on the AWS provider side, to see if anyone has a recommended workaround.
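One possible direction, untested here and not confirmed by anyone in this thread, is to make the function name vary with the package type, so the replacement that is created first does not collide with the function that is about to be destroyed. The sketch below assumes a hypothetical variable package_type that is not part of the original configuration; everything else reuses names from the question.

```hcl
# Hypothetical variable, not part of the original configuration.
variable "package_type" {
  type    = string
  default = "Zip" # or "Image"
}

resource "aws_lambda_function" "test_lambda" {
  # Assumption: the name changes together with the package type, so the
  # function created before the old one is destroyed has a different name.
  function_name = "${var.lambda_function_name}-${lower(var.package_type)}"
  package_type  = var.package_type
  role          = aws_iam_role.lambda_role.arn
  timeout       = 60

  # ZIP-only arguments; null leaves them unset for image packages.
  filename         = var.package_type == "Zip" ? "${var.lambda_function_name}.zip" : null
  handler          = var.package_type == "Zip" ? "${var.lambda_function_name}.lambda_handler" : null
  runtime          = var.package_type == "Zip" ? "python3.9" : null
  source_code_hash = var.package_type == "Zip" ? data.archive_file.lambda_source.output_base64sha256 : null

  # Image-only arguments.
  image_uri = var.package_type == "Image" ? "123456789.dkr.ecr.eu-central-1.amazonaws.com/test_images:gghsilf234" : null

  dynamic "image_config" {
    for_each = var.package_type == "Image" ? [1] : []
    content {
      command = ["image.test_lambda_handler"]
    }
  }
}
```

Anything else keyed on the function name, such as the CloudWatch log group or Lambda permissions, would need to follow the new name as well, so treat this as a starting point rather than a drop-in fix.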