
Enhancing CI/CD Workflows - Integrating AWS Access Analyzer for IAM Security

This project revolves around one core idea --> Security + Automation


We're improving both efficiency and security within deployment processes by integrating a tool --> cfn-policy-validator --> into a cohesive CI/CD pipeline. 📌


What's our core intent behind this?

Automating IAM Policy Validation Tests. Yes, we wanted to make sure that IAM security is inherently a part of every deployment cycle. The pipeline automatically halts the build if the template fails the validation tests.


This means infra-wide security and compliance 👍


Key tangibles at a glance

↳ We're scaling simplicity --> reducing the operational overhead associated with "ingraining" IAM security within every deployment cycle.

↳ Every deployment, without exception, would conform to IAM security benchmarks --> Compliance ++ 👍

↳ Once we've reduced operational overhead and manual intervention, deployments become faster and more reliable

↳ We're cutting the costs, time, and energy spent on post-deployment fixes. We call it "shift-left security" (we've ingrained security early in the deployment cycle)


Why not directly use Access Analyzer APIs to scan CF templates? -- The Pain Point

Practical Challenges in Policy Validation:-

I was a part of a Cloud Security Team at one of my previous companies.

It was a really challenging task for Project teams to reduce/eliminate the use of the wildcard (*) in the resource section of IAM Policies.


This was primarily because we relied on the resource ARNs, which were available only post-deployment. πŸ€” So, what's the solution?


Where exactly did Access Analyzer fall short?

A heads-up here...

πŸ‘‰ It's absolutely crucial that we have a mechanism in place capable of resolving cloud formation specific elements, --> pseudo parameters plus intrinsic functions


For the sake of completeness, clarifying the terms we've used here :- pseudo parameters --> predefined, AWS-specific values such as AWS::AccountId and AWS::Region, while intrinsic functions --> values resolved dynamically at deployment time, such as Fn::Sub, Fn::GetAtt, etc.
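
To make this concrete, here's a minimal, hypothetical policy snippet from a CloudFormation template (logical IDs and names are placeholders I've made up):

```yaml
# Sketch: an IAM policy inside a CloudFormation template (names are hypothetical).
# Neither the pseudo parameter nor the intrinsic function below has a concrete
# value until the stack is actually deployed.
AppBucketReadPolicy:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Action:
            - s3:GetObject
          # Fn::Sub (intrinsic function) + AWS::AccountId (pseudo parameter):
          # the final ARN only exists post-deployment.
          Resource: !Sub 'arn:aws:s3:::app-bucket-${AWS::AccountId}/*'
```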


IMP --> Access Analyzer does NOT parse templates or resolve any dynamic parameters in CloudFormation templates.


It's purely dependent on resource ARNs. It can analyse policies only post-deployment.


Solution --> Integrating CFN Policy Validator right into the CI/CD Pipeline


I came across this solution at AWS re:Inforce 2022. It's a conference around building super-secure, robust solutions in the cloud 👍✅ IAM Policy Validator for CloudFormation


Attaching a short snippet from the Official Repo:-

A command line tool that takes a CloudFormation template, parses the IAM policies attached to IAM roles, users, groups, and resources then runs them through IAM Access Analyzer for basic policy validation checks and for custom policy checks.


We'll quickly run through its functionality. It walks through the CloudFormation template and pulls out the resource-based and identity-based policies. After "parsing" the template, it resolves the CloudFormation-specific elements --> pseudo parameters and intrinsic functions. Once done, it runs our policies through two Access Analyzer APIs --> the ValidatePolicy API and the Access Preview APIs --> checking that they conform to security best practices, plus detecting public / cross-account resource access. 👍
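
As a rough sketch, this is how that step might be wired into a CodeBuild buildspec. The template path, region, and file names here are my assumptions - the official repo is the source of truth for the exact CLI options:

```yaml
# buildspec.yml (sketch) - running cfn-policy-validator in a CodeBuild project
version: 0.2
phases:
  install:
    commands:
      - pip install cfn-policy-validator
  build:
    commands:
      # Parses the template, resolves pseudo parameters / intrinsic functions,
      # then runs the extracted policies through the Access Analyzer APIs.
      # A non-zero exit code on blocking findings is what halts the pipeline.
      - cfn-policy-validator validate --template-path ./template.yaml --region us-east-1
```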


The Crux:- How does the Validator actually deal with the dynamic elements?

I had the same query before starting out πŸ€”

Answer:- It works by "auto-generating ARNs" for the referenced resources. I'll explain how.

The ARN thus generated will have a valid ARN structure (for that particular resource type) and a randomly generated name.


The tool can do this and still provide valid IAM finding results because.... "the structure of the ARN is what's important, not the actual value of the resource name." πŸ“Œ


Please Note:-

--> I would like to re-emphasize that policy validation is NOT dependent on the exact / actual resource ARNs. It works by analysing the "relationship" between resources and the actions they're permitted to perform.


It's essential to understand when exactly ARNs are needed:-

Policy ENFORCEMENT --> Needed. Reason:- It's critical to assign correct permissions to the correct resource βœ…

Policy VALIDATION --> Not needed. Since we're analysing the effect, action & condition clauses of a policy, so long as the ARNs are structured appropriately, we'll be able to assess the relationship of the resources with the actions they're permitted to perform. And this is what we want - this relationship allows us to understand whether these policies adhere to best practices, or are exposed publicly.


If we've got a well-formatted ARN which reflects the Type of resource + Permissions attached to it, we're good to go.
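
For illustration, the substitution might look something like this (the generated name below is made up - the tool produces its own random one):

```yaml
# In the template, the bucket name is only known at deploy time:
Resource: !Sub 'arn:aws:s3:::${AppBucket}/*'
# What the validator might hand to Access Analyzer instead - a structurally
# valid S3 object ARN with a randomly generated resource name:
#   arn:aws:s3:::appbucket-4f8a2c/*
```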


It DOES offer an option of creating a template-configuration file, for passing in the parameter values that should be used when resolving the template.
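
A minimal sketch of such a file, assuming the CodePipeline-style format the tool accepts (the parameter names here are hypothetical):

```yaml
# Sketch of a template-configuration file. The real file is JSON; it's shown
# verbatim here since JSON is valid YAML. Parameter names are made up.
{
  "Parameters": {
    "Environment": "prod",
    "AppBucketName": "my-app-artifacts"
  }
}
```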


What does the workflow actually look like?


I've attached the CloudFormation template for setting up the complete CI/CD pipeline - more about this below. I've also made design decisions aimed at improving performance and build times.


We'll be storing our codebase in a CodeCommit Repository.
↓
Committing our code would trigger the subsequent phases of the pipeline. We'll be utilising CodePipeline to help us orchestrate the CI/CD process.
↓
UPDATE : There'll be a centralised dependency installation phase, wherein all the dependencies will be pre-installed --> Subsequent build phases would just need to fetch these dependencies, and we're good to go πŸ‘
↓
Our first build stage would then validate the syntax of the CF Template using CFN-Lint. We'll have Unit tests run in parallel --> speeding up the testing process, while ensuring we're meeting quality standards
↓
If it passes the first stage, we've defined a second CodeBuild project - to run cfn-policy-validator. If not, we'll need to fix the syntax issues and re-commit
↓
It'll parse the templates, pull out the policies plus resolve the dynamic parameters - (We've already discussed this in detail above.)
↓
It finally runs the policies through the two Access Analyzer APIs - the ValidatePolicy and Access Preview APIs
↓
The generated list would comprise both Blocking and Non-Blocking Findings. These provide actionable insights into the policy issues we need to address.
↓
If there aren't any blocking findings, the pipeline would then deploy the CF Template πŸ‘
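
For orientation, here's a heavily condensed sketch of how those stages might be laid out in the pipeline's CloudFormation template. Logical names are placeholders, and roles/artifact wiring are omitted - the full template in this repo is the source of truth:

```yaml
# Condensed sketch - stage ordering only; actions, roles and artifacts omitted.
Pipeline:
  Type: AWS::CodePipeline::Pipeline
  Properties:
    RoleArn: !GetAtt PipelineRole.Arn
    Stages:
      - Name: Source              # CodeCommit commit triggers the pipeline
        Actions: []               # omitted in this sketch
      - Name: InstallDependencies # centralised, one-time dependency install
        Actions: []
      - Name: LintAndTest         # cfn-lint + parallel pytest suite
        Actions: []
      - Name: PolicyValidation    # cfn-policy-validator -> Access Analyzer APIs
        Actions: []
      - Name: Deploy              # CloudFormation deploy, only if no blocking findings
        Actions: []
```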


How we've enhanced it from a non-functional standpoint


Key design considerations from a non-functional standpoint:-

1 --> We've installed dependencies all in one go, in a dedicated CodeBuild project, and then reused them across all subsequent stages in the pipeline


What exactly was the idea behind this?

i. Number one --> We wanted to reduce build times, hence we decided against repetitive, redundant installation of dependencies 👍

ii. If we're storing pre-installed dependencies in an artifact, the subsequent build stages would simply need to copy and reuse them --> no need to install them anew

iii. Needed some reliability across builds. Centralising dependencies means consistency and standardisation throughout, and eventually reliability. That's what we're aiming for.
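
A rough sketch of what the dedicated install project's buildspec might look like - the directory and file names are my assumptions:

```yaml
# buildspec.yml for the dependency-install CodeBuild project (sketch)
version: 0.2
phases:
  install:
    commands:
      # Install everything once, into a directory we can export as an artifact
      - pip install --target ./deps -r requirements.txt
artifacts:
  files:
    - 'deps/**/*'
# Downstream build projects take this artifact as an input and simply add
# ./deps to PYTHONPATH, instead of re-running pip install from scratch.
```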


2 --> We wanted to incorporate some parallelism into our testing suite.

I'll tell you why

A - You're making testing times shorter by utilising multiple CPU cores to run tests simultaneously

B - You're shortening the feedback loops --> development teams can validate and experiment way quicker --> which means more frequent builds & deployments

C - Resources are being used optimally --> there won't be under-utilisation of computational resources

D - It'll now be "adaptable" to larger workloads and project growth --> testing won't be a bottleneck either

We decided to run "pytest" in parallel, using the pytest-xdist plugin. cfn-lint and cfn-policy-validator would still run sequentially.


It wouldn't make sense to assess policies if the CloudFormation syntax itself isn't correct. --> We'll end up introducing unnecessary inconsistencies if linting isn't performed before validation
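
Concretely, the build phase of the test stage might look like this - a sketch assuming pytest-xdist is already among the cached dependencies:

```yaml
# buildspec.yml build phase (sketch): lint first, then tests in parallel
version: 0.2
phases:
  build:
    commands:
      # Fail fast on malformed templates before any policy analysis runs
      - cfn-lint template.yaml
      # pytest-xdist: '-n auto' spawns one worker per available CPU core
      - python -m pytest -n auto tests/
```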


3 --> We've incorporated dynamic scaling for managing the build lifecycle. By doing this, we'll be cutting unnecessary costs (which we might have incurred due to over-provisioned build instances), while staying resource-efficient at the same time 👍


We first defined a launch template - it's flexible compared to launch configurations --> we can specify varying instance types, plus they're more versatile. We went ahead with setting up an auto scaling group as well.

Plus some scaling policies that'll scale out during peak usage and scale in during low-demand periods. These are triggered through the CloudWatch alarms we've created - the ASG scales out when CPU utilisation crosses 70%, and scales in when it drops below 30%.
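
In CloudFormation terms, the scale-out half of that setup might look roughly like this (the scale-in policy and alarm mirror it at the 30% threshold; logical names are placeholders):

```yaml
# Sketch: simple-scaling policy + CloudWatch alarm for the build fleet
ScaleOutPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref BuildFleetASG
    AdjustmentType: ChangeInCapacity
    ScalingAdjustment: 1          # add one instance per alarm breach
    Cooldown: '300'
HighCpuAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/EC2
    MetricName: CPUUtilization
    Statistic: Average
    Period: 300
    EvaluationPeriods: 2
    Threshold: 70                 # scale out above 70% CPU utilisation
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref BuildFleetASG
    AlarmActions:
      - !Ref ScaleOutPolicy
```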


4 --> We knew we had to implement caching to speed up build times. We'd be caching frequently accessed data/files in the build environment itself. (This means we're reducing re-fetching / re-computing of data by storing the cache for subsequent build runs.)

We made use of LOCAL_SOURCE_CACHE and LOCAL_CUSTOM_CACHE in our CodeBuild cache configuration


So, this'll cache our Git source code repo --> we wouldn't have to re-clone the repository over and over again. When changes are pushed to the repo, it'll fetch only the delta (the changes), not re-clone the whole repo.

We'll be caching the dependency directory in the LOCAL_CUSTOM_CACHE - this means all the Python packages we'll be installing via pip. By caching these frequently accessed dependencies, subsequent runs of all the build projects would query the cache instead of re-installing them, improving build times 👍
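
The relevant pieces, as a sketch - the Cache block sits on the CodeBuild project in the CloudFormation template, and the cache paths live in the buildspec (the deps path is my assumption):

```yaml
# In the CodeBuild project definition (CloudFormation):
Cache:
  Type: LOCAL
  Modes:
    - LOCAL_SOURCE_CACHE    # caches the Git repo; later builds fetch only the delta
    - LOCAL_CUSTOM_CACHE    # caches whatever paths the buildspec declares
```

```yaml
# In buildspec.yml - the directory LOCAL_CUSTOM_CACHE persists across runs:
cache:
  paths:
    - 'deps/**/*'
```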


Wrapping it up!

Thank you for accompanying me on my journey.

This project was not only geared towards creating an automated, fully-functional CI/CD pipeline. The core intent that drove me was making deployments not only quicker, but also largely enhancing the security and reliability of the deployment lifecycle. Industries, with their dynamically changing requirements, need not only speed, but also robustness in the solutions developed. Thus, I've gone the extra mile, enhancing this design from a non-functional standpoint --> right from ingraining IAM security, to slashing build times, to optimising the caching processes, to name a few.

Thank you once again - would love to hear your comments. Always open to constructive criticism that'll spur growth for our open-source community. This is Tanishka Marrott signing off!


Acknowledgement & Attributions

Thank you for exploring this project.

I hope this README provided some valuable insights into our approach and our implementation strategies.

Also, I would like to extend heartfelt thanks to AWS Workshop Studio, for providing a brilliant base for me to work on and improve upon.

Any suggestions/feedback geared towards improving this project from a design standpoint would be most welcome! Plus, if you've got any queries regarding the implementation, please feel free to reach out at tanishka.marrott@gmail.com.

About

Core Focus : Securing & Automating deployments with a CI/CD pipeline built on AWS CodeCommit, CodePipeline and CodeBuild. This has been enhanced by integrating IAM Access Analyzer for robust policy checks and CloudFormation for infrastructure as code.