CDK Refactoring Tools

Question

CDK Refactoring Tools

Gtofig opened this issue 4 years ago · comments

At the moment, when I move constructs around in CDK, this results in logical IDs changing and resources being replaced during deployment, even stateful ones. This causes major pain, especially if it's a production service.

I would like CDK to offer a simpler way of refactoring code without having to worry about resource replacement.

Use Case

Refactoring code is a natural part of development, including CDK development. I don't want to have to worry about breaking production due to moving constructs around.

Proposed Solution

Option 1) Work with CloudFormation team to support resource logical ID renaming without replacement (maybe use CDK metadata as a key to identify that it's the same resource)

Option 2) Make it easier to hardcode logical IDs in CDK. At the moment it only works via L1 resources, but most constructs we deal with are not L1, therefore it becomes painful to manually set logical IDs. Perhaps Construct could expose a method to hardcode prefix for logical IDs

Other

👋 I may be able to implement this feature request
⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

Elad Ben-Israel · Answer 1 · Thu May 21 2020 17:41:33 GMT+0800 (China Standard Time)

Are you familiar with renameLogicalId?

Elad Ben-Israel · Answer 2 · Thu May 21 2020 17:42:02 GMT+0800 (China Standard Time)

Transferring this to the RFC repo as this requires further discussion.

Gtofig · Answer 3 · Thu May 21 2020 17:51:50 GMT+0800 (China Standard Time)

Are you familiar with renameLogicalId?

Yes, but this only works at L1 level, so if I have a high level construct which I move around, it becomes more complicated. Perhaps you have a pattern in mind for this?

Timothy Clark · Answer 4 · Fri May 22 2020 02:04:44 GMT+0800 (China Standard Time)

It also seems like a big ask to have to drop down to renameLogicalId every time I want to move/refactor a construct around my CDK app. Also, how would this function for composed constructs? If I have a construct that I vend to multiple apps, how would renameLogicalId help me move around some of the sub-constructs within the construct and not have impact on the users of the construct?

Timothy Clark · Answer 5 · Tue Sep 01 2020 01:39:14 GMT+0800 (China Standard Time)

Any updates here? It seems untenable to not allow folks to refactor CDK code after the first deployment without causing breakages.

Rico Hermans · Answer 6 · Tue Sep 15 2020 16:43:32 GMT+0800 (China Standard Time)

I think the better solution is to allow remapping construct paths at any point in the tree.

As long as that's a hierarchical operation we should be fine.

SeongChan Lee · Answer 7 · Fri Dec 04 2020 16:20:53 GMT+0800 (China Standard Time)

I use this https://gist.github.com/foriequal0/f1f4ea279fb64836e5fb38efefa133d7 while waiting for a feature that rename logical id in a stack

Mark Roberts · Answer 8 · Fri Nov 26 2021 19:04:44 GMT+0800 (China Standard Time)

I have just refactored a DynamoDB Table from a Stack into its own Construct within the Stack. My colleagues and I have already deployed copies of the Stack before this change into our respective AWS Accounts. How do I ensure the refactor I've just checked in will cleanly deploy, without my colleagues and I all having to jump through hoops, rename things, mess with the cdk.context.json, etc.?

It would be great if CDK had a solution for such a refactor.

Andrew Fitzgerald · Answer 9 · Mon Dec 06 2021 12:50:13 GMT+0800 (China Standard Time)

I really like how this is handled in pulumi via aliasing.
There's a great blog post explaining how it works here

Gtofig · Answer 10 · Wed Jun 15 2022 21:48:34 GMT+0800 (China Standard Time)

Would CDK context be an appropriate tool to help here?

I'm thinking if we had a method like freezeLogicalId() on a Construct, which would check CDK context for existing logical IDs for this construct, and if not found, it would store current logical IDs into the context.

Evan Heaton · Answer 11 · Tue Dec 20 2022 04:48:50 GMT+0800 (China Standard Time)

I think this is pretty big gap in aws-cdk functionality that really hinders the cdk's main goal of allowing composability and higher level logic for creating cloud infrastructure. The lack of built-in support for refactoring without resource recreation implicitly influences developers to create giant flat stacks of resources to ensure logicalId stability, which ends up just looking like CFN templates.

A tighter integration with CloudFormation resource import functionality could probably close the gap - @foriequal0 may be onto something with their comment here: aws/aws-cdk#4733 (comment)

But until the cdk addresses it, this will always be a sharp edge. One possible workaround is to add an optional logicalId override value in the props of your reusable constructs. This will allow consumers of the construct to pin the logicalId if they desire, or omit it if they'd rather have the cdk automatically generate it. With the current state of things I'd actually recommend that for stateful named resources such as DynamoDb tables, s3 buckets, etc. users should always provide a logicalId override from the get-go, in order to allow for later refactors without replacement of the underlying resource. Otherwise you'll eventually end up with a bunch of random (table.node.defaultChild as CfnTable).overrideLogicalId("MyStackAlarmedTableTable5E3CHJS") all over your codebase.

A dummy example:

// alarmed-table.ts - a reusable construct that spins up a table and an alarm monitoring its usage

import { Construct } from "constructs";
import { aws_dynamodb as dynamodb, aws_cloudwatch as cloudwatch } from "aws-cdk-lib";

export interface MyAlarmedTableProps {
  tableName: string;
  tableLogicalId?: string;
}

export class MyAlarmedTable extends Construct {
  readonly table: dynamodb.Table;
  readonly usageAlarm: cloudwatch.Alarm;

  constructor(scope: Construct, id: string, props: MyAlarmedTableProps) {
    super(scope, id);

    this.table = new dynamodb.Table(this, 'Table', {
      tableName: props.tableName,
      billingMode: BillingMode.PAY_PER_REQUEST,
    });
    if (props.tableLogicalId) {
      (this.table.node.defaultChild as CfnTable).overrideLogicalId(props.tableLogicalId);
    }

    this.alarm = new cloudwatch.Alarm(this, 'UsageAlarm', {
      // metric for throttled requests, latency, or something
    });
  }
}

and its usage:

// stack.ts

import { Stack, StackProps } from "aws-cdk-lib";
import { MyAlarmedTable } from "./alarmed-table";

class MyStack extends Stack {
  constructor(scope: App, id: string, props: StackProps) {
    super(scope, id, props);

    new MyAlarmedTable(this, 'UnitOrdersTable', {
      tableName: `UnitOrders-${props.env.region}`,
      tableLogicalId: "UnitOrdersDDB",
    });
  }
}

With this approach, no matter where you refactor and move that UnitOrdersTable resource within the CDK app, it will always refer to the same logical resource.

Benjamin Leff · Answer 12 · Thu Mar 07 2024 12:21:47 GMT+0800 (China Standard Time)

Does anyone have any recent advice on how to deal with this?

I've taken great care not to use physical names, but sometimes you just can't avoid it. For example, if you're working with cognito resources like a user pool identity provider or api gateway custom domain names where CDK requires a domain_name, its just not possible to get around. The physical names of those resources are required (and it wouldn't make sense for CDK to generate values for them anyway).

I've played with rename_logical_id but I'm unclear about unintended downstream consequences and confusion between rename_logical_id, override_logical_id, and allocate_logical_id (I know they operate on different types of resources, but some best practices of employing one over the other would be appreciated). If I understand correctly, allocate_logical_id is for "prefixing" type use cases.

As an aside, dropping to L1s for every stateful construct is a bit of a pain but can be easily circumvented.

If you're not using CDK in a static setting, this sharp edge can be dangerous in production for stateful resources, especially if you're using it in to deploy infra for customers.

Would love to hear if there's advice on this. Thanks in advance!

See #455, aws/aws-cdk#22016 and aws/aws-cdk#1424, aws/aws-cdk#4733

Benjamin Leff · Answer 13 · Fri Mar 08 2024 00:19:26 GMT+0800 (China Standard Time)

For now, here is how I'm dealing with this (in Python).

First, create a small wrapper to make pylance happy and dropping to L1s more convenient.

def override_logical_id(resource: Resource | CfnResource, logical_id: str) -> None:
    if isinstance(resource, Resource):
        cfn_resource = cast(CfnResource, resource.node.default_child)
        cfn_resource.override_logical_id(logical_id)
    else:
        resource.override_logical_id(logical_id)

Then for each resource where I need to retain state (cognito, S3, etc) and I only want to delete/update/recreate resources if I explicitly change the Id myself, I do the following:

override_logical_id(cognito_user_pool, "CognitoUserPool")

Ofc, the downside here is that if I make an update to the resource in CDK, it won't get deployed.

FWIW I know this can easily be done in a more global way w/ aspects, but for now, I'd like to keep this as stupid simple as possible.

Can someone please validate this approach and let me know of any 'gotchas' I may be missing? Does this also mean I can now name these resources without fear of naming conflicts and stack rollbacks?

This is for new deployments only, not existing deployments.

Thank you.

Benjamin Leff · Answer 14 · Fri Mar 15 2024 00:32:23 GMT+0800 (China Standard Time)

For anyone who finds this, answering my own question here.

Yes, this will work. And no, in my limited experience overriding logicalIds, this does not mean you can now name these resources without fear of naming conflicts and stack rollbacks.

If you override the logicalId and provide a name (for example, bucket_name for an S3 resource), your stack will fail with the following even if you don't have an existing bucket with that name:

The stack named WsStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "<bucket name>" already exists

Unclear why, but that's what I'm seeing.

Benjamin Leff · Answer 15 · Sun Apr 21 2024 02:51:03 GMT+0800 (China Standard Time)

Apologies in advance for the length of this post.

More discovery from the front lines: Api Gateway resources need to be handled with extreme care. See below for details.

Before diving into specifics, for anyone reading this, this sharp edge in CDK is extremely sharp. How sharp?

This sharp:

In all seriousness, to the hardworking CDK engineers — you are appreciated. Nonetheless, the issue of in/unintended side effects while changing seemingly minor things in CDK really needs to be addressed and rectified.

For example, while changing an Id upstream of various Api Gateway resources (even when many downstream Ids like apigw.DomainName, r53.ARecord, etc, were locked, and the resulting affected resources were undefined by our application) caused naming conflicts and stack rollbacks (more on this below):

Base path already exists for this domain name.

Did I know things may be affected downstream? Yes, of course. But I was under the false assumption that I had locked all Ids that needed to be locked, because I was not explicitly defining any of the resources that were affected. Admittedly, this may have been a naive assumption.

After discussing with AWS Support, the following resources need to be locked in Api Gateway even if you are not defining them:

apigw.DomainName
apigw.BasePathMapping
apigw.Deployment
apigw.Stage
apigw.UsagePlan

In our case, we were not defining apigw.BasePathMapping, but because Cfn/CDK assumes that the (none) base path is a named resource, we got: base path already exists for this domain name.

A separate albeit related issue is that some of the Api Gateway methods cannot be passed an Id at all, so you have to use the resource construct itself. For example:

the add_base_path_mapping method on DomainName does not take an Id as an arg, so you have to use the BasePathMapping resource construct in order to freeze the Id.

I've also reviewed the docs for Deployment, and it appears that the above is still true, despite CDK doing some of the heavy lifting for you:

Normally, you don't need to define deployments manually. The RestApi construct manages a Deployment resource that represents the latest model. It can be accessed through restApi.latestDeployment (unless deploy: false is set when defining the RestApi).

If you manually define this resource, you will need to know that since deployments are immutable, as long as the resource's logical ID doesn't change, the deployment will represent the snapshot in time in which the resource was created. This means that if you modify the RestApi model (i.e. add methods or resources), these changes will not be reflected unless a new deployment resource is created.

All of that being said, we're now freezing as many Ids for many of the Api Gateway resources as we can because it's difficult to test 32 (2^5) permutations of the resources above, so now our code looks like this:

**# ↓ WARNING: DO NOT CHANGE ↓**
override_logical_id(
    self._api_gw_custom_domain_name,
    "ApiGatewayCustomDomainName",  **# WARNING: DO NOT CHANGE**
)
**# ↑ WARNING: DO NOT CHANGE ↑**

**# ↓ WARNING: DO NOT CHANGE ↓**
override_logical_id(
    self._api_gw_custom_domain_base_path_mapping,
    "ApiGatewayCustomDomainNameBasePathMapping",  **# WARNING: DO NOT CHANGE** 
)
**# ↑ WARNING: DO NOT CHANGE ↑**

**# ↓ WARNING: DO NOT CHANGE ↓**
override_logical_id(
    self._api_gw_deployment,
    "ApiGatewayDeployment",  **# WARNING: DO NOT CHANGE** 
)
**# ↑ WARNING: DO NOT CHANGE ↑**


. . .

Which just looks silly.

I know this issue has been open ~4 years, there have been myriad github issues created, and according to AWS support, dozens of internal support tickets created by customers about this.

My hope is that the CDK team can dedicate time to finding / picking a solution to this problem as some seemingly viable options (IMO) have been offered. But, not doing anything more than making it possible to rename Ids is (again IMO) not enough

Imagine if after CDK v1 was released, large parts of the codebase could never be changed?

I hope the intent of this post is clear — let's find a solution to this problem. It's not to point fingers or be ungrateful. I know how hard the CDK org has worked.

NOTE: if any of the above is incorrect, please let me know. Happy to edit this post as needed.