cloud-custodian / cloud-custodian

Rules engine for cloud security, cost optimization, and governance, DSL in yaml for policies to query, filter, and take actions on resources

Home Page:https://cloudcustodian.io


CloudWatch Events execution mode with target RunInstances

davidclin opened this issue

This is a follow-up to #2257.

I'm unable to get the following policy to stop newly launched EC2 instances that are placed in subnets tagged with the key/value pair Location:Internet.

The policy is executed using the CloudWatch Events mode:

policies:
  - name: subnet-audit
    resource: ec2
    mode:
      type: cloudtrail
      role: arn:aws:iam::xxxxxxxxxxxx:role/CloudCustodianRole
      events:
        - source: ec2.amazonaws.com
          event: RunInstances
          ids: "responseElements.instanceSet.items[].instanceId"
    filters:
      - type: subnet
        key: "tag:Location"
        value: "Internet"
    actions:
      - stop
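For readers new to the `subnet` filter, here is a minimal conceptual sketch of what the policy above is asking for: keep only the instances whose subnet carries the tag Location=Internet. This is not Cloud Custodian's implementation. The helper names and sample IDs are made up for illustration, and the dictionaries only mimic the shape of EC2 DescribeInstances and DescribeSubnets results.

```python
# Conceptual sketch only; not Cloud Custodian's code.
# Helper names and sample IDs below are hypothetical.


def subnet_has_tag(subnet, key, value):
    """Return True if the subnet carries the tag key=value."""
    return any(
        tag.get("Key") == key and tag.get("Value") == value
        for tag in subnet.get("Tags", [])
    )


def instances_in_tagged_subnets(instances, subnets, key, value):
    """Keep the instances whose subnet carries the tag key=value."""
    subnets_by_id = {s["SubnetId"]: s for s in subnets}
    return [
        inst for inst in instances
        if inst.get("SubnetId") in subnets_by_id
        and subnet_has_tag(subnets_by_id[inst["SubnetId"]], key, value)
    ]


# Sample data shaped like EC2 API responses (values are invented).
subnets = [
    {"SubnetId": "subnet-aaa", "Tags": [{"Key": "Location", "Value": "Internet"}]},
    {"SubnetId": "subnet-bbb", "Tags": [{"Key": "Location", "Value": "Private"}]},
]
instances = [
    {"InstanceId": "i-111", "SubnetId": "subnet-aaa"},
    {"InstanceId": "i-222", "SubnetId": "subnet-bbb"},
]

matched = instances_in_tagged_subnets(instances, subnets, "Location", "Internet")
print([inst["InstanceId"] for inst in matched])
```

Only the instance in the subnet tagged Location=Internet survives the filter; the policy's actions would then apply to that instance alone.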

Execution run

(custodian) $ custodian run -s . public-subnet-instance-audit-lambda.yml
2018-05-02 21:19:52,473: custodian.policy:INFO Provisioning policy lambda subnet-audit
2018-05-02 21:19:52,704: custodian.lambda:INFO Publishing custodian policy lambda function custodian-subnet-audit

The lambda is successfully created and viewable from the AWS management console.

Need tips/guidance on how to troubleshoot the Lambda. This is all new to me.

I also tried having the Lambda receive EC2 instance state events, without success:

policies:
  - name: subnet-audit
    resource: ec2
    mode:
      type: ec2-instance-state
      role: arn:aws:iam::xxxxxxxxxxxx:role/CloudCustodianRole
      events:
        - pending
    filters:
      - type: subnet
        key: "tag:Location"
        value: "Internet"
    actions:
      - stop

I am launching the test EC2 instance from the management console and making sure I'm selecting a subnet with the tag Location:Internet.

I was able to get the Lambda working by:

(1) using ec2-instance-state
(2) adding EC2 permissions for the role (namely, to stop instances)
(3) changing the events from 'pending' to 'running'

It's not entirely clear to me why the Lambda does not trigger when a new EC2 instance is launched and is in the 'pending' state.
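As background for the question above, the `ec2-instance-state` mode is driven by EC2 "Instance State-change Notification" events. The sketch below assumes the rule pattern looks roughly like the one shown (the exact pattern Custodian provisions is an assumption here) and uses a small hand-written matcher, not the real CloudWatch Events matching engine. Under those assumptions, a `pending` event does match a rule that lists `pending`, so a policy that appears to do nothing may still have been invoked.

```python
# Hedged sketch: the pattern shape is an assumption about the rule created
# for ec2-instance-state mode, and matches() is a simplified stand-in for
# CloudWatch Events pattern matching (exact-value lists and nested dicts only).


def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching sub-document.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


def state_pattern(states):
    """Build a rule pattern for the given instance states."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": list(states)},
    }


# An event shaped like an EC2 state-change notification (instance id invented).
pending_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-111", "state": "pending"},
}

print(matches(state_pattern(["pending"]), pending_event))
print(matches(state_pattern(["running"]), pending_event))
```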

Instances in the pending state can't be stopped, only terminated. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-lifecycle.html for more details.

There should have been a note in the policy lambda logs about having to implicitly filter those out for the stop action as well.
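To make the implicit filtering concrete, here is a minimal sketch of a stop action that drops instances that are not in a stoppable state before acting. It is an illustration under assumptions, not Custodian's actual code: the `STOPPABLE_STATES` constant and the `split_by_state` helper are hypothetical names, and treating only `running` as stoppable is a simplification.

```python
# Illustrative sketch; names and the allowed-state list are assumptions.
STOPPABLE_STATES = ("running",)


def split_by_state(instances, allowed=STOPPABLE_STATES):
    """Split instances into those the action can handle and those it skips."""
    eligible = [i for i in instances if i["State"]["Name"] in allowed]
    skipped = [i for i in instances if i["State"]["Name"] not in allowed]
    return eligible, skipped


# A freshly launched instance, as a 'pending' state event would deliver it.
instances = [{"InstanceId": "i-111", "State": {"Name": "pending"}}]

eligible, skipped = split_by_state(instances)
print("Stop %d of %d instances" % (len(eligible), len(instances)))
# prints: Stop 0 of 1 instances
```

With a `pending` instance the filter passes nothing through, so the action runs but stops nothing. Triggering on `running` instead, or switching the action to `terminate`, avoids the mismatch.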

I modified the action from 'stop' to 'terminate' and it does exactly as explained in the document. 👍

My final working policy now looks like this:

policies:
  - name: subnet-audit
    resource: ec2
    description: |
      This policy runs in ec2-instance-state mode: the Lambda receives EC2 instance state
      events and is triggered when an EC2 instance enters the 'pending' state. The Lambda then
      terminates the instance based on the attributes of the subnet it is attached to. For
      example, instances in subnets tagged 'Location' with the value 'Internet' are terminated.
      Note that instances in the 'pending' state cannot be stopped.
    mode:
      type: ec2-instance-state
      role: arn:aws:iam::xxxxxxxxxxxx:role/CloudCustodianRole
      events:
        - pending
    filters:
      - type: subnet
        key: "tag:Location"
        value: "Internet"
    actions:
      - terminate

Regarding the note in the policy Lambda logs about having to implicitly filter those out for the stop action, I'm having trouble locating the logs.

I followed the instructions provided in https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-logs.html and get the following message when I work my way to the Monitoring tab and click "Jump to logs" for the Errors widget:

Log group not found
The log group /aws/lambda/custodian-subnet-audit could not be found. Check if it was correctly created and retry.

Is there anything I need to add to my policy to create the log group mentioned above? Or maybe an additional permission for the Lambda role to write to CloudWatch? Maybe I answered my own question. :) I'll go try that...

Appreciate the pointers and help!
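A quick way to confirm the "Log group not found" situation outside the console is to ask CloudWatch Logs whether the group exists. The sketch below takes the client as a parameter so it can be tried without AWS credentials. `log_group_exists` and `FakeLogsClient` are hypothetical helpers written for this example; with real credentials one would pass `boto3.client("logs")` instead, whose `describe_log_groups` call accepts `logGroupNamePrefix`. Only the first page of results is inspected, which is an assumed simplification.

```python
# Hedged sketch: helper names are invented for illustration.
# Real usage (requires boto3 and AWS credentials), shown as a comment only:
#   import boto3
#   log_group_exists(boto3.client("logs"), lambda_log_group("custodian-subnet-audit"))


def lambda_log_group(function_name):
    """Lambda writes to a log group named /aws/lambda/<function name>."""
    return "/aws/lambda/" + function_name


def log_group_exists(logs_client, name):
    """Return True if a log group with exactly this name exists."""
    response = logs_client.describe_log_groups(logGroupNamePrefix=name)
    return any(g["logGroupName"] == name for g in response.get("logGroups", []))


class FakeLogsClient:
    """Stand-in for a CloudWatch Logs client, for trying the helper offline."""

    def __init__(self, names):
        self._names = names

    def describe_log_groups(self, logGroupNamePrefix):
        groups = [n for n in self._names if n.startswith(logGroupNamePrefix)]
        return {"logGroups": [{"logGroupName": n} for n in groups]}


group = lambda_log_group("custodian-subnet-audit")
print(group)
print(log_group_exists(FakeLogsClient([]), group))
print(log_group_exists(FakeLogsClient([group]), group))
```

If the group is missing even after the function has been invoked, a common cause is that the Lambda's role cannot create log groups or write log events.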

For the benefit of fellow Cloud Custodian users who are getting their feet wet and following this thread as part of the troubleshooting process, I was able to view the Lambda error logs in CloudWatch by adding permissions to the Lambda role.

Namely, I added the following CloudWatch Logs permissions:

"logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents"

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "ec2:*",
                "logs:CreateLogGroup",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
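For anyone checking their own role, the following sketch scans an IAM policy document for the three CloudWatch Logs actions listed above and reports any that are missing. It is a rough helper under stated assumptions, not an IAM evaluator: `missing_log_actions` is a hypothetical name, and the check looks only at `Allow` statements and their `Action` entries, ignoring `Deny`, `Resource`, and `Condition`.

```python
import fnmatch

# The three actions a Lambda role needs in order to write its logs.
REQUIRED_LOG_ACTIONS = (
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
)


def allowed_actions(policy):
    """Collect the Action entries from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    actions = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        entry = statement.get("Action", [])
        actions.extend([entry] if isinstance(entry, str) else entry)
    return actions


def missing_log_actions(policy):
    """Return the required log actions that no Allow entry covers."""
    # IAM action names are case-insensitive and may contain wildcards.
    patterns = [a.lower() for a in allowed_actions(policy)]
    return [
        required for required in REQUIRED_LOG_ACTIONS
        if not any(fnmatch.fnmatchcase(required.lower(), p) for p in patterns)
    ]


ec2_only = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["ec2:*"], "Resource": "*"}],
}
with_logs = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "ec2:*",
                "logs:CreateLogGroup",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        }
    ],
}

print(missing_log_actions(ec2_only))
print(missing_log_actions(with_logs))
```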

While I still can't find anything in the logs about pending instances being implicitly filtered out for the stop action, I can at least see the logs for the entire workflow and what the Lambda is doing, which is helpful.

Snip from lambda logs:

[DEBUG]	2018-05-03T19:44:52.127Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	metric:ResourceCount Count:1 policy:subnet-audit restype:ec2 scope:policy
[INFO]	2018-05-03T19:44:52.127Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	**Invoking actions** []
[INFO]	2018-05-03T19:44:52.127Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	policy: subnet-audit **invoking action: stop** resources: 1
[INFO]	2018-05-03T19:44:52.128Z	708b88c8-4f0a-11e8-b5a1-85c41233ee4e	**Stop** 0 of 1 instances

I think we can close this out. I have a working policy now. 👍