CDA-AWS-DVA-C02

AWS Certified Developer - Associate

Development with AWS Services

AWS FUNDAMENTALS (see CCP-AWS-CLFC01 for more detail)

EC2

  • MEMORY OPTIMIZED
  • COMPUTE OPTIMIZED
  • STORAGE OPTIMIZED
  • GENERAL PURPOSE

EC2 INSTANCE STORAGE

  • EBS
  • EFS
  • INSTANCE STORE

ROUTE 53


  • A – maps a hostname to IPv4

  • AAAA – maps a hostname to IPv6

  • CNAME – maps a hostname to another hostname

    • The target is a domain name which must have an A or AAAA record
    • Can’t create a CNAME record for the top node of a DNS namespace (Zone Apex)
    • Example: you can’t create for example.com, but you can create for www.example.com
  • NS – Name Servers for the Hosted Zone

    • Control how traffic is routed for a domain
  • HOSTED ZONES

  • CNAME VS ALIAS

  • ALIAS TARGETS

  • A type of record that you can create with Amazon Route 53 to route traffic to AWS resources such as Amazon CloudFront distributions and Amazon S3 buckets.


  • You cannot set an ALIAS record for an EC2 DNS name

  • Routing policies: Simple, Failover, Geolocation, Geoproximity, Latency, Weighted (load balancing across Regions), etc.


  • Traffic flow policy

  • Combining health checks

ELB + ASG

  • Session affinity/stickiness
  • SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)

VPC

  • NACL
    • Can have ALLOW and DENY rules
    • Are attached at the Subnet level
    • Rules only include IP addresses
  • SECURITY GROUPS
    • A firewall that controls traffic to and from an ENI / an EC2 Instance
    • Can have only ALLOW rules
    • Rules include IP addresses and other security groups
  • VPC Flow logs
  • VPC peering
    • VPC Peering connection is not transitive (must be established for each pair of VPCs that need to communicate)
  • VPC endpoints
    • Endpoints allow you to connect to AWS services using a private network instead of the public internet


  • Elastic network interface – a logical networking component in a VPC that represents a virtual network card.

  • Subnet – A range of IP addresses in your VPC. You can add AWS resources to a specified subnet. Use a public subnet for resources that must connect to the internet, and a private subnet for resources that don't connect to the internet.

  • Security group – use security groups to control access to the AWS resources in each subnet.

  • Access control list (ACL) – use a network ACL to provide additional security in a subnet. The default subnet ACL allows all inbound and outbound traffic.

  • Route table – contains a set of routes that AWS uses to direct the network traffic for your VPC. You can explicitly associate a subnet with a particular route table. By default, the subnet is associated with the main route table.

  • Route – each route in a route table specifies a range of IP addresses and where traffic destined for that range is sent. The route also specifies a target, which is the gateway, network interface, or connection through which to send the traffic.

  • NAT gateway – An AWS Network Address Translation (NAT) service that controls access from a private subnet in your VPC to the Internet.

  • VPC endpoints – You can use an Amazon VPC endpoint to create private connectivity to services hosted in AWS, without requiring access over the internet or through a NAT device, VPN connection, or AWS Direct Connect connection. For more information, see AWS PrivateLink and VPC endpoints.

RDS

  • MULTI-AZ DEPLOYMENTS

    • MULTI-AZ INSTANCE DEPLOYMENTS

      • Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica
    • MULTI-AZ CLUSTER DEPLOYMENTS

      • A Multi-AZ DB cluster has a writer DB instance and two reader DB instances in three separate Availability Zones in the same AWS Region.
      • Uses semisynchronous replication, which requires acknowledgment from at least one reader DB instance in order for a change to be committed.
    • CROSS REGION DEPLOYMENTS

      • When you perform a cross-Region restore of a DB snapshot, first you copy the snapshot to the desired Region; then you can restore the DB snapshot to a new DB instance.
    • RDS PROXY

ELASTICACHE

  • CACHE STRATEGIES

    • READ ASIDE (LAZY LOADING)

    • WRITE ASIDE

    • READ THROUGH

    • WRITE THROUGH

    • WRITE BACK

ECS, ECR, FARGATE, DOCKER

  • ROLLING UPDATES

CloudFront

  • Improves read performance, content is cached at the edge

AWS LAMBDA FUNCTION

  • AWS Lambda is a compute service that lets you run code without provisioning or managing servers.
  • With Lambda, all you need to do is supply your code in one of the language runtimes that Lambda supports.
  • The Lambda service runs your function only when needed and scales automatically. You only pay for the compute time that you consume—there is no charge when your code is not running.

When to Use Lambda

  • File processing: Use Amazon Simple Storage Service (Amazon S3) to trigger Lambda data processing in real time after an upload.
  • Stream processing: Use Lambda and Amazon Kinesis to process real-time streaming data for application activity tracking, transaction order processing, clickstream analysis, data cleansing, log filtering, indexing, social media analysis, Internet of Things (IoT) device data telemetry, and metering.
  • Web applications: Combine Lambda with other AWS services to build powerful web applications that automatically scale up and down and run in a highly available configuration across multiple data centers.
  • IoT backends: Build serverless backends using Lambda to handle web, mobile, IoT, and third-party API requests.
  • Mobile backends: Build backends using Lambda and Amazon API Gateway to authenticate and process API requests. Use AWS Amplify to easily integrate with your iOS, Android, Web, and React Native frontends.

Pricing

  • Request pricing: free tier of 1,000,000 AWS Lambda requests per month
  • Duration pricing: free tier of 400,000 GB-seconds of compute time per month (i.e. 400,000 seconds if the function uses 1 GB RAM)
  • Data transfer with AWS Lambda functions is free in the same AWS Region between services such as S3, SNS, SQS, etc.
  • Data transferred “in” to and “out” of your AWS Lambda functions, from outside the region the function executed, will be charged at the Amazon EC2 data transfer rates

Language support

  • Node.js (JavaScript), Python, Java (Java 8 compatible), C# (.NET Core), Golang, C# / PowerShell, Ruby, Custom Runtime API (community supported, e.g. Rust)
  • Lambda Container Image: the container image must implement the Lambda Runtime API


Key features

  • Use environment variables to adjust your function's behavior without updating code.
  • Manage the deployment of your functions with versions
  • Create a container image for a Lambda function by using an AWS provided base image or an alternative base image so that you can reuse your existing container tooling

Concepts

  • programming model:

    • The programming model defines the interface between your code and the Lambda system.
    • You tell Lambda the entry point to your function by defining a handler in the function configuration
    • The runtime passes in objects to the handler that contain the invocation event and the context, such as the function name and request ID.
    • When the handler finishes processing the first event, the runtime sends it another.
    • The function's class stays in memory, so clients and variables that are declared outside of the handler method in initialization code can be reused.
    • The runtime captures logging output from your function and sends it to Amazon CloudWatch Logs.
    • Lambda scales your function by running additional instances of it as demand increases, and by stopping instances as demand decreases.
  • Execution environment:

    • The execution environment manages the resources required to run your function.
    • The execution environment also provides lifecycle support for the function's runtime and any external extensions associated with your function.
    • The function's runtime communicates with Lambda using the Runtime API, e.g. /runtime/invocation/next, /runtime/invocation/AwsRequestId/response, /runtime/init/error, /runtime/invocation/AwsRequestId/error
    • Execution environment lifecycle:
      • Init phase: capped at 10 seconds
      • Invoke phase: capped at the timeout configured by the user (total execution time + extensions)
      • Shutdown phase: capped at 2 seconds
    • After the function and all extensions have completed, Lambda maintains the execution environment for some time in anticipation of another function invocation.
    • When you write your function code, do not assume that Lambda automatically reuses the execution environment for subsequent function invocations.
  • Deployment packages: container images, .zip archives

  • Private networking: A Lambda function always runs inside a VPC owned by the Lambda service. Lambda applies network access and security rules to this VPC and Lambda maintains and monitors the VPC automatically.

  • Concurrency controls: reserved concurrency, provisioned concurrency

  • Lambda offers built-in HTTP(S) custom endpoint support through function URLs.

  • When you invoke a function, you can choose to invoke it synchronously or asynchronously. With synchronous invocation, you wait for the function to process the event and return a response. With asynchronous invocation, Lambda queues the event for processing and returns a response immediately.

  • Event source mapping:

    • An event source mapping is a resource in Lambda that reads items from an Amazon Simple Queue Service (Amazon SQS) queue, an Amazon Kinesis stream, or an Amazon DynamoDB stream, and sends the items to your function in batches.
    • Event source mappings maintain a local queue of unprocessed items and handle retries if the function returns an error or is throttled
    • You can customize batching behavior and error handling, or send a record of items that fail processing to a destination
  • Tuning for optimal Performance:

    • If your application is CPU-bound (computation heavy), increase RAM

    • Timeout: default 3 seconds, maximum is 900 seconds (15 minutes)

    • The execution context is a temporary runtime environment that initializes any external dependencies of your lambda code: can be reused for multiple invocations reducing latency and increasing speed of execution


    • /tmp space directory: If your Lambda function needs disk space to perform operations (Max 10GB)

    • To encrypt content on /tmp, you must generate KMS Data Keys

    • Concurrency limit: up to 1,000 concurrent executions (account-level default)

    • Can set a “reserved concurrency” at the function level (= limit)

    • Each invocation over the concurrency limit will trigger a “Throttle”
      • Throttle behavior: if synchronous invocation => returns ThrottleError (429); if asynchronous invocation => retries automatically, then goes to the DLQ (see the concurrency sketch at the end of this section)

    • Cold Start:

      • New instance => code is loaded and code outside the handler runs (init); if the init is large (code, dependencies, SDK…), this process can take some time, and the first request served by a new instance has higher latency than the rest

      • To improve resource management and performance, the Lambda service retains the execution environment for a non-deterministic period of time. During this time, if another request arrives for the same function, the service may reuse the environment.
    • Provisioned concurrency:

      • Concurrency is allocated before the function is invoked (in advance)

      • So the cold start never happens and all invocations have low latency

      • Understanding invocation patterns: after the invocation has ended, the execution environment is retained for a period of time. If another request arrives, the environment is reused to handle the subsequent request.

    • Layers let you reuse common libraries and can be referenced in multiple published versions of the function code

    • For asynchronous invocations, an internal queue exists between the caller and the Lambda service. Lambda processes messages from this queue as quickly as possible and scales up automatically as needed.

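
A minimal boto3 sketch of the concurrency controls mentioned above (the function name and values are hypothetical; `put_function_concurrency` sets reserved concurrency, `put_provisioned_concurrency_config` pre-warms environments on a published version or alias):

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserved concurrency: caps this function at 100 concurrent executions
# (and carves those 100 out of the account-level pool).
lambda_client.put_function_concurrency(
    FunctionName="my-function",          # hypothetical function name
    ReservedConcurrentExecutions=100,
)

# Provisioned concurrency: keep 10 execution environments initialized
# for the "prod" alias so those invocations avoid cold starts.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="prod",                    # published version or alias (required)
    ProvisionedConcurrentExecutions=10,
)
```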


Working Example: Basic Lambda function deployment

  • Basic Python Lambda handler (see the sketch at the end of this section)
  • General configuration: memory, timeout, roles/permissions, etc.
  • Monitoring via CloudWatch
  • Logging via CloudWatch
  • Execution role
  • Versioning: a published version is a snapshot of your function code and configuration that can't be changed
  • Environment variables: when you publish a version, the environment variables are locked for that version along with the other settings. Some environment variables are reserved and set by Lambda runtimes, e.g. AWS_DEFAULT_REGION, _X_AMZN_TRACE_ID, etc.
  • Securing Environment variables
    • Security at rest: Lambda always provides server-side encryption at rest with an AWS KMS key
    • Security at transit: For additional security, you can enable helpers for encryption in transit, which ensures that your environment variables are encrypted client-side for protection in transit.
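
A minimal sketch of the basic Python handler referenced above (the bucket env var is hypothetical; the boto3 client is created outside the handler so warm invocations reuse it, per the execution-context notes earlier):

```python
import json
import os

import boto3

# Initialized once per execution environment and reused across warm invocations.
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # `event` carries the invocation payload; `context` carries metadata
    # such as the function name and the AWS request ID.
    print(f"Request ID: {context.aws_request_id}")
    bucket = os.environ.get("BUCKET_NAME", "example-bucket")  # hypothetical env var

    return {
        "statusCode": 200,
        "body": json.dumps({"received": event, "bucket": bucket}),
    }
```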

Invoking Lambda functions

  • You can invoke Lambda functions directly using the Lambda console, a function URL HTTP(S) endpoint, the Lambda API, an AWS SDK, the AWS Command Line Interface (AWS CLI), and AWS toolkits

  • You can also configure other AWS services to invoke your function in response to events or external requests, or on a schedule.

  • For another AWS service to invoke your function directly, you need to create a trigger using the Lambda console. A trigger is a resource you configure to allow another AWS service to invoke your function when certain events or conditions occur. Multiple triggers can co-exist independently and each event that Lambda passes to your function has data from only one trigger.

  • For your Lambda function to process items from a stream or a queue, such as an Amazon Kinesis stream or an Amazon Simple Queue Service (Amazon SQS) queue, you need to create an event source mapping. An event source mapping is a resource in Lambda that reads items from a stream or a queue and creates events containing batches of items to send to your Lambda function. Each event that your function processes can contain hundreds or thousands of items.


    • Synchronous invocation

      • With synchronous invocation, you wait for the function to process the event and return a response (see the boto3 sketch at the end of this section).
      • Examples – user-invoked: Elastic Load Balancing (Application Load Balancer), Amazon API Gateway, Amazon CloudFront (Lambda@Edge), Amazon S3 Batch; service-invoked: Amazon Cognito, AWS Step Functions; other services: Amazon Lex, Amazon Alexa, Amazon Kinesis Data Firehose
      • The payload is a string that contains an event in JSON format. The name of the file where the AWS CLI writes the response from the function is response.json
      • If the function returns an object or error, the response is the object or error in JSON format. If the function exits without error, the response is null.
      • If Lambda was able to run the function, the status code is 200, even if the function returned an error.
    • Asynchronous invocation
      • Examples: Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), Amazon CloudWatch Events / EventBridge, AWS CodeCommit (CodeCommitTrigger: new branch, new tag, new push), AWS CodePipeline (invoke a Lambda function during the pipeline; Lambda must callback); others: Amazon CloudWatch Logs (log processing), Amazon Simple Email Service, AWS CloudFormation, AWS Config, AWS IoT, AWS IoT Events

      • Lambda queues the event for processing and returns a response immediately. For asynchronous invocation, Lambda handles retries and can send invocation records to a destination.

      • Returns statusCode "202" here for success/failure

      • For asynchronous invocation, Lambda places the event in a queue and returns a success response without additional information.

      • A separate process reads events from the queue and sends them to your function

      • Lambda performs retries with exponential backoff in case an asynchronous invocation of an event fails

      • If the function returns an error, Lambda attempts to run it two more times, with a one-minute wait between the first two attempts, and two minutes between the second and third attempts.

      • For throttling errors (429) and system errors (500-series), Lambda returns the event to the queue and attempts to run the function again for up to 6 hours. The retry interval increases exponentially from 1 second after the first attempt to a maximum of 5 minutes.

      • Events might also be deleted from the queue if they become too old to process

      • Invocation records can be sent to SNS, SQS, AWS Lambda, or EventBridge

      • The invocation record contains details about the request and response in JSON format. You can configure separate destinations for events that are processed successfully, and events that fail all processing attempts.

      • dead-letter queue for discarded events: For dead-letter queues, Lambda only sends the content of the event, without details about the response.

      • If Lambda can't send a record to a destination you have configured, it sends a DestinationDeliveryFailures metric to Amazon CloudWatch.


      • A dead-letter queue acts the same as an on-failure destination in that it is used when an event fails all processing attempts or expires without being processed. However, a dead-letter queue is part of a function's version-specific configuration, so it is locked in when you publish a version.

      • To reprocess events in a dead-letter queue, you can set it as an event source for your Lambda function. Alternatively, you can manually retrieve the events.

      • Choose an Amazon SQS standard queue if you expect a single entity, such as a Lambda function or CloudWatch alarm, to process the failed event.

      • Choose an Amazon SNS standard topic if you expect multiple entities to act on a failed event. For example, you can configure a topic to send events to an email address, a Lambda function, and/or an HTTP endpoint.


      • If you're using Amazon SQS as an event source, configure a dead-letter queue on the Amazon SQS queue itself and not on the Lambda function.

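
A minimal boto3 sketch of the two invocation types and of configuring async retries and destinations (the function name and queue ARNs are hypothetical):

```python
import json

import boto3

lambda_client = boto3.client("lambda")

# Synchronous invocation: waits for the function and returns its response.
# The status code is 200 even if the function itself returned an error
# (check the FunctionError field in that case).
resp = lambda_client.invoke(
    FunctionName="my-function",                  # hypothetical
    InvocationType="RequestResponse",
    Payload=json.dumps({"key": "value"}),
)
print(resp["StatusCode"], json.load(resp["Payload"]))

# Asynchronous invocation: Lambda queues the event and returns 202 immediately.
resp = lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="Event",
    Payload=json.dumps({"key": "value"}),
)
print(resp["StatusCode"])  # 202

# Retry behavior and destinations for async invocations.
lambda_client.put_function_event_invoke_config(
    FunctionName="my-function",
    MaximumRetryAttempts=2,
    MaximumEventAgeInSeconds=3600,
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:success-q"},
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:failure-q"},
    },
)
```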

Event Source Mappings

  • An event source mapping is a Lambda resource that reads from an event source and invokes a Lambda function, i.e., for services that can't directly invoke Lambda functions.
  • Examples of such services: Amazon DynamoDB, Amazon Kinesis, Amazon MQ, Amazon Managed Streaming for Apache Kafka (Amazon MSK), self-managed Apache Kafka, Amazon Simple Queue Service (Amazon SQS), Amazon DocumentDB (with MongoDB compatibility)
  • An event source mapping uses permissions in the function's execution role to read and manage items in the event source.
  • To create an event source mapping, you need the ARN of the stream or queue resource (see the sketch at the end of this section).
  • Lambda event source mappings process events at least once due to the distributed nature of its pollers.
  • By default, an event source mapping batches records together into a single payload that Lambda sends to your function when either of the following conditions is met: the batching window reaches its maximum value (time), the batch size is met, or the payload size reaches 6 MB.
  • Events need to be polled from the source; your Lambda function is invoked synchronously.
  • For a DLQ in case of event failures, the DLQ should be on the SQS queue itself and not on the Lambda function, since this is a synchronous invocation and a DLQ on Lambda only works with async invocations.
  • Streaming via Kinesis/DynamoDB
    • A Kinesis data stream is a set of shards. Each shard contains a sequence of data records. A consumer is an application that processes the data from a Kinesis data stream. You can map a Lambda function to a shared-throughput consumer (standard iterator), or to a dedicated-throughput consumer with enhanced fan-out.
  • SQS standard/SQS FIFO queues
    • Lambda reads messages in batches and invokes your function once for each batch. When your function successfully processes a batch, Lambda deletes its messages from the queue.
    • By default, Lambda polls up to 10 messages in your queue at once and sends that batch to your function. To avoid invoking the function with a small number of records, you can tell the event source to buffer records for up to 5 minutes by configuring a batch window. Before invoking the function, Lambda continues to poll messages from the SQS standard queue until the batch window expires, the invocation payload size quota is reached, or the configured maximum batch size is reached.
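
A minimal sketch of creating an event source mapping with boto3 (queue ARN and function name are hypothetical):

```python
import boto3

lambda_client = boto3.client("lambda")

# Map an SQS queue to a function: Lambda polls the queue and invokes the
# function synchronously with batches of up to 10 records, buffering for
# up to 60 seconds before invoking.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders-queue",  # hypothetical
    FunctionName="my-function",
    BatchSize=10,
    MaximumBatchingWindowInSeconds=60,
)
```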


LAMBDA DESTINATIONS

  • Provides more control than a simple DLQ for async invocations
  • Success and failure events from async invocations can each be sent to multiple targets (destinations), and discarded batches of events from event source mappings (sync invocation) can be sent as well.
  • Destinations provide more useful capabilities by passing additional function execution information, including code exception stack traces, to more destination services.

Event Filtering

  • Control which records from a stream or queue Lambda sends to your function. For example, you can add a filter so that your function only processes Amazon SQS messages containing certain data parameters.
  • By default, you can define up to five different filters for a single event source mapping.
  • Your filters are logically ORed together. If a record from your event source satisfies one or more of your filters, Lambda includes the record in the next event it sends to your function.
  • A filter criteria (FilterCriteria) object is a structure that consists of a list of filters (Filters).
  • Your filter pattern can include metadata properties (fields containing information about the event that created the record), data properties (fields of the record containing the data from your stream or queue), or both.
  • Example: see the sketch below.
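
A minimal sketch of attaching filter criteria to an event source mapping (queue ARN, function name, and the `status` field are hypothetical):

```python
import json

import boto3

lambda_client = boto3.client("lambda")

# Only records whose JSON body matches the pattern are sent to the
# function; multiple filters in the list are logically ORed together.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders-queue",  # hypothetical
    FunctionName="my-function",
    FilterCriteria={
        "Filters": [
            # Data property filter: the SQS message body must contain status=ACTIVE.
            {"Pattern": json.dumps({"body": {"status": ["ACTIVE"]}})},
        ]
    },
)
```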

ERROR HANDLING SCENARIOS FOR LAMBDA

  • Asynchronous invocation: You can configure a dead-letter queue on the function to capture events that weren't successfully processed.
  • Event source mappings: you determine the length of time between retries and destination for failed events by configuring the visibility timeout and redrive policy on the source queue.
  • AWS services: the invoking service decides whether to retry or to relay the error back to the requester.

Recursive loop detection

  • If your function is invoked more than 16 times in the same chain of requests, then Lambda automatically stops the next function invocation in that request chain and notifies you.

Security & Auth for the Lambda Function URL endpoint

  • Dedicated HTTP(S) endpoint for your Lambda function
    • A unique URL endpoint is generated for you (never changes): https://<url-id>.lambda-url.<region>.on.aws (dual-stack IPv4 & IPv6)


  • Access your function URL through the public Internet only. Function URLs don’t support PrivateLink (Lambda functions themselves do).

  • AWS_IAM: users who need to invoke your Lambda function URL must have the lambda:InvokeFunctionUrl permission. Depending on who makes the invocation request, you may have to grant this permission using a resource-based policy.

  • NONE: you may want your function URL to be public, for example to serve requests made directly from a web browser. Even then, callers must still have lambda:InvokeFunctionUrl permission, granted to everyone through the function's resource-based policy (see the sketch below).
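
A minimal sketch of creating a public function URL with boto3 (function name is hypothetical):

```python
import boto3

lambda_client = boto3.client("lambda")

# Create a public function URL (AuthType NONE; use "AWS_IAM" to require
# SigV4-signed requests instead).
url_config = lambda_client.create_function_url_config(
    FunctionName="my-function",        # hypothetical
    AuthType="NONE",
)
print(url_config["FunctionUrl"])       # https://<url-id>.lambda-url.<region>.on.aws/

# Even with AuthType NONE, the resource-based policy must grant
# lambda:InvokeFunctionUrl to everyone for public access to work.
lambda_client.add_permission(
    FunctionName="my-function",
    StatementId="AllowPublicFunctionUrl",
    Action="lambda:InvokeFunctionUrl",
    Principal="*",
    FunctionUrlAuthType="NONE",
)
```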

AWS IAM PERMISSIONS FOR LAMBDA

  • LAMBDA EXECUTION ROLE: grants the Lambda function permissions to AWS services / resources
  • Use resource-based policies to give other accounts and AWS services permission to use your Lambda resources.

ENVIRONMENT VARIABLES

  • Security of environment variables
    • At REST
      • Lambda always provides server-side encryption at rest with an AWS KMS key. By default, Lambda uses an AWS managed key.
    • At TRANSIT
      • For additional security, you can enable helpers for encryption in transit, which ensures that your environment variables are encrypted client-side for protection in transit.
      • Environment variables used to communicate with X-Ray: _X_AMZN_TRACE_ID (contains the tracing header), AWS_XRAY_CONTEXT_MISSING (by default, LOG_ERROR), AWS_XRAY_DAEMON_ADDRESS (the X-Ray Daemon IP_ADDRESS:PORT)
  • Your AWS Lambda function can interact with AWS Secrets Manager using the Secrets Manager API or any of the AWS Software Development Kits (SDKs). You can also use the AWS Parameters and Secrets Lambda Extension to retrieve and cache AWS Secrets Manager secrets in Lambda functions without using an SDK (see the sketch below).
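
A minimal sketch of reading a secret through the AWS Parameters and Secrets Lambda Extension, assuming the extension layer is attached to the function (the secret name is hypothetical; the extension listens on localhost port 2773 by default):

```python
import json
import os
import urllib.request

# The extension caches secrets inside the execution environment, so no
# SDK call is needed at invocation time.
_EXTENSION_PORT = os.environ.get("PARAMETERS_SECRETS_EXTENSION_HTTP_PORT", "2773")

def get_secret(secret_id: str) -> dict:
    req = urllib.request.Request(
        f"http://localhost:{_EXTENSION_PORT}/secretsmanager/get?secretId={secret_id}",
        # The extension authenticates callers with the runtime's session token.
        headers={"X-Aws-Parameters-Secrets-Token": os.environ["AWS_SESSION_TOKEN"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def lambda_handler(event, context):
    secret = get_secret("prod/db-credentials")  # hypothetical secret name
    return {"statusCode": 200}
```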

Lambda layers

  • A Lambda layer is a .zip file archive that contains supplementary code or data. Layers usually contain library dependencies, a custom runtime, or configuration files.
  • Lambda extracts the layer contents into the /opt directory in your function’s execution environment. This gives your function access to your layer content.
  • You can include up to five layers per function. Also, you can use layers only with Lambda functions deployed as a .zip file archive.
  • You should be able to import any library that you’ve added as a layer to the current function.

TRACEABILITY

  • Lambda integrates with AWS X-Ray to help you trace, debug, and optimize Lambda applications. You can use X-Ray to trace a request as it traverses resources in your application, which may include Lambda functions and other AWS services.


Lambda Integrations

Lambda + ALB

  • When the load balancer forwards the request to a target group with a Lambda function as a target, it invokes your Lambda function and passes the content of the request to the Lambda function, in JSON format.
  • Elastic Load Balancing invokes your Lambda function synchronously with an event that contains the request body and metadata.
  • If requests from a client or responses from a Lambda function contain headers with multiple values, contain the same header multiple times, or include query parameters with multiple values for the same key, you can enable support for multi-value header syntax. After you enable multi-value headers, the headers and query parameters exchanged between the load balancer and the Lambda function use arrays instead of strings. If you do not enable multi-value header syntax and a header or query parameter has multiple values, the load balancer uses the last value that it receives (see the handler sketch after this list).
  • Permissions
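
A minimal sketch of a handler behind an ALB target group (names are hypothetical; the return value must follow the ALB target response format):

```python
import json

def lambda_handler(event, context):
    # ALB passes the HTTP request as JSON: method, path, headers,
    # queryStringParameters, and an (optionally base64-encoded) body.
    name = (event.get("queryStringParameters") or {}).get("name", "world")

    # With multi-value headers enabled, "headers" would instead be
    # "multiValueHeaders" with list-valued entries.
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "isBase64Encoded": False,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"hello": name}),
    }
```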


Lambda + SQS (DLQ)

Lambda + Cloudwatch events/EventBridge

  • EventBridge (CloudWatch Events) helps you to respond to state changes in your AWS resources.
  • With EventBridge (CloudWatch Events), you can create rules that match selected events in the stream and route them to your AWS Lambda function to take action.
  • You can also create a Lambda function and direct AWS Lambda to invoke it on a regular schedule. You can specify a fixed rate or a cron expression (see the sketch below).
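
A minimal sketch of a scheduled rule that invokes a function (rule name, function name, and ARN are hypothetical):

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-function"  # hypothetical

# Rule that fires every 5 minutes; a cron expression such as
# "cron(0 12 * * ? *)" would work in the same field.
rule = events.put_rule(Name="every-5-minutes", ScheduleExpression="rate(5 minutes)")

# Point the rule at the function...
events.put_targets(
    Rule="every-5-minutes",
    Targets=[{"Id": "my-function-target", "Arn": FUNCTION_ARN}],
)

# ...and let EventBridge invoke it (resource-based policy).
lambda_client.add_permission(
    FunctionName="my-function",
    StatementId="AllowEventBridgeInvoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)
```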

Lambda + S3

  • Amazon S3 can send an event to a Lambda function when an object is created or deleted. You configure notification settings on a bucket, and grant Amazon S3 permission to invoke a function on the function's resource-based permissions policy.
  • Amazon S3 invokes your function asynchronously with an event that contains details about the object (see the sketch below).
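
A minimal sketch of wiring up an S3 trigger (bucket name and ARNs are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-function"  # hypothetical

# Grant S3 permission to invoke the function (resource-based policy)...
lambda_client.add_permission(
    FunctionName="my-function",
    StatementId="AllowS3Invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::example-bucket",
)

# ...then configure the bucket to send object-created events asynchronously.
s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": FUNCTION_ARN,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```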

Lambda + EventSourceMappings (SQS)

LAMBDA@EDGE + CLOUDFRONT

  • Lambda@Edge is an extension of AWS Lambda that lets you deploy Python and Node.js functions at Amazon CloudFront edge locations. A common use case of Lambda@Edge is to use functions to customize the content that your CloudFront distribution delivers to your end users.

Lambda + VPC (Virtual private cloud)

  • By default, your Lambda function is launched outside your own VPC (in an AWS-owned VPC)
  • Lambda will create an ENI (Elastic Network Interface) in your subnets; this requires the AWSLambdaVPCAccessExecutionRole policy
  • Deploying a Lambda function in a public subnet does not give it internet access or a public IP
  • Deploying a Lambda function in a private subnet gives it internet access if you have a NAT Gateway / Instance
  • You can use VPC endpoints to privately access AWS services without a NAT
  • Note: Lambda to CloudWatch Logs works even without an endpoint or NAT Gateway
  • You can connect other VPCs to the VPC with interface endpoints using VPC peering.
  • Traffic between peered VPCs stays on the AWS network and does not traverse the public internet. Once VPCs are peered, resources like Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Relational Database Service (Amazon RDS) instances, or VPC-enabled Lambda functions in both VPCs can access the Lambda API through interface endpoints created in one of the VPCs.

Lambda + EFS (elastic file system)

  • You can configure a function to mount an Amazon Elastic File System (Amazon EFS) file system to a local directory.
  • For performance and resilience, use at least two Availability Zones. For example, in a simple configuration you could have a VPC with two private subnets in separate Availability Zones. The function connects to both subnets and a mount target is available in each.
  • You can configure a function to mount an Amazon EFS file system in another AWS account. For this to work: VPC peering must be configured; the security group for the Amazon EFS file system must allow inbound access from the security group associated with your Lambda function; subnets must be created in each VPC with matching Availability Zone (AZ) IDs; and DNS hostnames must be enabled in both VPCs.

Lambda + CloudFormation


Lambda container images

  • Deploy Lambda function as container images of up to 10GB from ECR
  • Pack complex dependencies, large dependencies in a container
  • Base images are available for Python, Node.js, Java, .NET, Go, Ruby
  • Can create your own image as long as it implements the Lambda Runtime API

Lambda versions and aliases

  • Aliases are ”pointers” to Lambda function versions
  • CodeDeploy can help you automate traffic shifting for Lambda aliases (see the sketch below)
    • Linear: grow traffic every N minutes until 100% (Linear10PercentEvery3Minutes, Linear10PercentEvery10Minutes)
    • Canary: try X percent then 100% (Canary10Percent5Minutes, Canary10Percent30Minutes)
    • AllAtOnce: immediate
  • Can create Pre & Post Traffic hooks to check the health of the Lambda function
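
A minimal sketch of the alias traffic-shifting mechanics with boto3 (function name, alias, and version numbers are hypothetical; CodeDeploy automates the same RoutingConfig with its Linear/Canary/AllAtOnce configurations):

```python
import boto3

lambda_client = boto3.client("lambda")

# Publish a new immutable version from $LATEST.
new_version = lambda_client.publish_version(FunctionName="my-function")["Version"]

# Canary-style shift by hand: the "prod" alias keeps pointing at the old
# version but sends 10% of traffic to the new one.
lambda_client.update_alias(
    FunctionName="my-function",         # hypothetical
    Name="prod",
    FunctionVersion="1",                # current stable version (hypothetical)
    RoutingConfig={"AdditionalVersionWeights": {new_version: 0.10}},
)
```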

Lambda Insights and Profiler

  • Gain insights into runtime performance of your Lambda functions using CodeGuru Profiler
  • Attach the AmazonCodeGuruProfilerAgentAccess policy to your function's execution role


Troubleshooting and Debugging Lambda

  • If API Gateway fails to invoke the Lambda function, it returns a 500 error code. If the Lambda function runs but returns an error, or returns a response in the wrong format, API Gateway returns a 502 error code.

  • What Could Go Wrong -

    • Unexpected event payloads (malformed JSON event, or the 256 KB maximum async payload size exceeded)
    • Running an unintended function version or alias (unintentionally pointing the caller to the wrong alias/version; can be determined from CloudWatch logs)
    • Triggering infinite loops (the source triggers the Lambda, which interacts with the source again, triggering another Lambda invocation, and so on; if you must publish data back to the consuming resource, ensure that the new data does not trigger the same event, or have the Lambda function filter events)
    • Downstream unavailability (implement server timeouts and alarms to differentiate between third-party service unavailability and Lambda unavailability/errors)
    • CPU and memory configuration (symptom: functions running slower than expected; CPU allocation is indirectly bound to the memory setting)
    • Memory leakage between invocations (warm starts)
    • Asynchronous results returned to a later invocation (callbacks changed to async/await)
    • Troubleshooting Queue processing
      • Identifying and managing throttling
      • Errors in the processing function
      • Identifying and handling backpressure (SQS monitoring shows the age of the earliest message growing linearly, along with the approximate number of messages visible.)
  • Metrics from cloudwatch

    • Invocations, Duration, Errors, Throttles, DeadLetterErrors, IteratorAge (for stream sources such as Kinesis or DynamoDB streams, this value indicates when events are being produced faster than they are being consumed by Lambda), ConcurrentExecutions, AsyncEventsReceived, AsyncEventAge, AsyncEventsDropped

AWS STEP FUNCTIONS

  • AWS Step Functions is a serverless orchestration service. Through Step Functions' graphical console, you see your application’s workflow as a series of event-driven steps.
  • Workflows are written in JSON notation (Amazon States Language)
  • Step Functions is based on state machines and tasks.
  • In Step Functions, a workflow is called a state machine, which is a series of event-driven steps. Each step in a workflow is called a state.
  • A Task state represents a unit of work that another AWS service, such as AWS Lambda, performs.

Standard workflows

  • Standard workflows have exactly-once workflow execution and can run for up to one year. This means that each step in a Standard workflow will execute exactly once.

Express workflows

  • Express workflows, however, have at-least-once workflow execution and can run for up to five minutes. This means that one or more steps in an Express Workflow can potentially run more than once, while each step in the workflow executes at least once.


Use Cases

  • You create a workflow that runs a group of Lambda functions (steps) in a specific order.
  • Using a Choice state, you can have Step Functions make decisions based on the Choice state’s input
  • Retry/Catch
  • With a callback and a task token, you have Step Functions tell Lambda to send your customer’s money and report back when your customer’s friend receives it.
  • Using a Parallel state, Step Functions inputs the video file, so Lambda can process it into the five display resolutions at the same time.
  • Using a Map state, Step Functions has Lambda process each of your customer's items in parallel.

States

  • Choice State – Test for a condition to send to a branch (or default branch)
  • Fail or Succeed State – Stop execution with failure or success
  • Pass State – Simply pass its input to its output or inject some fixed data, without performing work.
  • Wait State – Provide a delay for a certain amount of time or until a specified time/date.
  • Map State – Dynamically iterate steps.
  • Parallel State – Begin parallel branches of execution.

Error handling

  • All states, except Pass and Wait states, can encounter runtime errors.

  • Predefined error codes:
    States.ALL : matches any error name
    States.Timeout: Task ran longer than TimeoutSeconds or no heartbeat received
    States.TaskFailed: execution failure
    States.Permissions: insufficient privileges to execute code

  • Instead of handling errors inside the application code, Retry and Catch fallbacks can be applied in the state machine as cross-cutting concerns

  • Retry examples (see the sketch at the end of this section)

  • Catch fallbacks

  • When a state reports an error and either there is no Retry field, or if retries fail to resolve the error, Step Functions scans through the catchers in the order listed in the array. When the error name appears in the value of a catcher's ErrorEquals field, the state machine transitions to the state named in the Next field.

  • ResultPath

    • A path that determines what input the catcher sends to the state specified in the Next field.
    • For the first catcher in the example, the catcher adds the error output to the input as a field named error-info, if there isn't already a field with this name in the input, and then sends the entire input to RecoveryState.
    • For the second catcher, the error output overwrites the input and the catcher only sends the error output to EndState.
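
A minimal sketch of a state machine definition with Retry and Catch, mirroring the ResultPath notes above (function ARN, role ARN, state names, and the CustomError name are hypothetical):

```python
import json

import boto3

# Task state with Retry and Catch: the first catcher appends the error
# output to the input under "error-info" and routes to RecoveryState;
# the second (ResultPath defaults to "$") overwrites the input with the
# error output and routes to EndState.
definition = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process",  # hypothetical
            "Retry": [
                {
                    "ErrorEquals": ["States.Timeout"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "Catch": [
                {
                    "ErrorEquals": ["CustomError"],   # hypothetical error name
                    "ResultPath": "$.error-info",
                    "Next": "RecoveryState",
                },
                {"ErrorEquals": ["States.ALL"], "Next": "EndState"},
            ],
            "End": True,
        },
        "RecoveryState": {"Type": "Pass", "End": True},
        "EndState": {"Type": "Fail"},
    },
}

boto3.client("stepfunctions").create_state_machine(
    name="order-machine",                                   # hypothetical
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/sfn-role",      # hypothetical
)
```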

Service Integration patterns

  • Each of these service integration patterns is controlled by how you create a URI in the Resource field of your task definition.

  • Wait for a Callback with the Task Token

    • Callback tasks provide a way to pause a workflow until a task token is returned.
    • For tasks like these, you can pause Step Functions for up to the one-year maximum workflow duration while waiting for an external process or workflow to complete.
    • The task will pause until it receives that task token back with a SendTaskSuccess or SendTaskFailure call.
    • To avoid stuck executions you can configure a heartbeat timeout interval in your state machine definition.
    • Push mechanism
  • Activity tasks

    • Enables you to have the Task work performed by an Activity Worker
    • Pull mechanism
    • Activity Workers poll for a Task using the GetActivityTask API
    • After an Activity Worker completes its work, it reports success/failure using SendTaskSuccess or SendTaskFailure (see the sketch below)
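
A minimal sketch of an activity worker (activity ARN and worker name are hypothetical; the same SendTaskSuccess/SendTaskFailure calls also complete a .waitForTaskToken callback task paused in a workflow):

```python
import boto3

sfn = boto3.client("stepfunctions")

ACTIVITY_ARN = "arn:aws:states:us-east-1:123456789012:activity:manual-check"  # hypothetical

# Pull model: the worker long-polls for work with GetActivityTask...
task = sfn.get_activity_task(activityArn=ACTIVITY_ARN, workerName="worker-1")

if task.get("taskToken"):
    try:
        result = '{"approved": true}'  # do the actual work here
        # ...and reports the outcome with the task token.
        sfn.send_task_success(taskToken=task["taskToken"], output=result)
    except Exception as exc:
        sfn.send_task_failure(taskToken=task["taskToken"], error="WorkerError", cause=str(exc))
```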

AWS APPSYNC

  • AWS AppSync enables developers to connect their applications and services to data and events with secure, serverless, and high-performing GraphQL and Pub/Sub APIs. With AWS AppSync you can:
  • Access data from one or more data sources from a single GraphQL API endpoint.
  • Retrieve data in real-time with WebSocket or MQTT on WebSocket
  • For mobile apps: local data access & data synchronization
  • For security: API_KEY, AWS_IAM, OPENID_CONNECT, AMAZON_COGNITO_USER_POOLS

AWS AMPLIFY

  • AWS Amplify is a set of purpose-built tools and features that enables frontend web and mobile developers to quickly and easily build full-stack applications on AWS. Amplify provides two services:
    • Amplify Hosting and Amplify Studio
  • End-to-end testing: integrated with the Cypress testing framework; allows you to generate a UI report for your tests

AWS DYNAMODB

  • fully managed NoSQL database service that provides fast and predictable performance with seamless scalability
  • create database tables that can store and retrieve any amount of data and serve any level of request traffic.
  • core components:
    • A table is a collection of items, and each item is a collection of attributes.
    • DynamoDB uses primary keys to uniquely identify each item in a table and secondary indexes to provide more querying flexibility.
    • You can use DynamoDB Streams to capture data modification events in DynamoDB tables.
    • An item is a group of attributes that is uniquely identifiable among all of the other items.
    • An attribute is a fundamental data element, something that does not need to be broken down any further. For example, an item in a People table contains attributes called PersonID, LastName, FirstName, and so on.
    • Each item in the table has a unique identifier, or primary key, that distinguishes the item from all of the others in the table. In the People table, the primary key consists of one attribute (PersonID).
    • The primary key for Music consists of two attributes (Artist and SongTitle). Each item in the table must have these two attributes. The combination of Artist and SongTitle distinguishes each item in the table from all of the others.
    • Scales to millions of requests per second, trillions of rows, and hundreds of TB of storage

Partitioning

  • DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region.
  • DynamoDB stores and retrieves each item based on its partition key value.

Primary Key

  • Partition key:

    • A simple primary key, composed of one attribute known as the partition key.
    • DynamoDB uses the partition key's value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.
    • The partition key of an item is also known as its hash attribute.
  • Partition key and sort key:

    • composite primary key, this type of key is composed of two attributes. The first attribute is the partition key (same as above to determine physical storage area internal), and the second attribute is the sort key.
    • All items with the same partition key value are stored together, in sorted order by sort key value.
    • In a table that has a partition key and a sort key, it's possible for multiple items to have the same partition key value. However, those items must have different sort key values.
    • The sort key of an item is also known as its range attribute.

HOW TO Choose Partition key

  • Use high-cardinality attributes. These are attributes that have distinct values for each item, like emailid, employee_no, customerid, sessionid, orderid, and so on.
  • Use composite attributes. Try to combine more than one attribute to form a unique key, if that meets your access pattern. For example, consider an orders table with customerid#productid#countrycode as the partition key and order_date as the sort key, where the symbol # is used to split different fields.
  • A randomizing strategy can greatly improve write throughput, but it makes reading a specific item difficult because you don’t know which suffix value was used when writing the item (see the write-sharding sketch after this list).
  • Isolate frequently accessed items
    • If your application drives disproportionately high traffic to one or more items, adaptive capacity rebalances your partitions such that frequently accessed items don't reside on the same partition. This isolation of frequently accessed items reduces the likelihood of request throttling due to your workload exceeding the throughput quota on a single partition.
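
A minimal sketch of the write-sharding (random suffix) strategy mentioned above (table name, key names, and shard count are hypothetical):

```python
import random

import boto3

table = boto3.resource("dynamodb").Table("HighVolumeEvents")  # hypothetical table

NUM_SHARDS = 10

def put_event(event_id: str, payload: dict) -> None:
    # A random suffix spreads writes across NUM_SHARDS partition key
    # values, avoiding a single hot partition...
    shard = random.randint(0, NUM_SHARDS - 1)
    table.put_item(Item={"pk": f"{event_id}#{shard}", **payload})

def get_event(event_id: str) -> list:
    # ...but reads must now check every suffix, since you don't know
    # which one was used at write time.
    items = []
    for shard in range(NUM_SHARDS):
        resp = table.get_item(Key={"pk": f"{event_id}#{shard}"})
        if "Item" in resp:
            items.append(resp["Item"])
    return items
```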

Antipatterns for partition keys

  • Use sequences or unique IDs generated by the DB engine as the partition key, especially when you are migrating from relational databases.
  • Using low-cardinality attributes like Product_SKU as the partition key and Order_Date as the sort key greatly increases the likelihood of hot partition issues.
  • For example, if one product is more popular, then the reads and writes for that partition key are high, resulting in throttling issues.

How to Choose Sort Keys

  • Careful design of the sort key lets you retrieve commonly needed groups of related items using range queries with operators such as begins_with, between, >, <, and so on.

Secondary Indexes

  • A secondary index lets you query the data in the table using an alternate key, in addition to queries against the primary key.
    • Global secondary index – An index with a partition key and sort key that can be different from those on the table. A global secondary index is considered "global" because queries on the index can span all of the data in the base table, across all partitions.

    • Can be added/modified after table creation


    • Scenario: suppose that you wanted to write a leaderboard application to display top scores for each game. A query that specifies the key attributes (UserId and GameTitle) would be very efficient. However, if the application needed to retrieve data from GameScores based on GameTitle only, it would need to use a Scan operation, which is slow. To speed up queries on non-key attributes, you can create a global secondary index.


    • you could create a global secondary index named GameTitleIndex, with a partition key of GameTitle and a sort key of TopScore. The base table's primary key attributes are always projected into an index, so the UserId attribute is also present.

    • If the writes are throttled on the GSI, then the main table will be throttled!

    • Attribute projection:

    • Because the non-key attributes Wins and Losses are projected into the index, an application can determine the wins vs. losses ratio for any game, or for any combination of game and user ID.

    • When an application writes an item to a table, DynamoDB automatically copies the correct subset of attributes to any global secondary indexes in which those attributes should appear.

    • Your AWS account is charged for storage of the item in the base table and also for storage of attributes in any global secondary indexes on that table.

    • KEYS_ONLY, INCLUDE, ALL


    • A global secondary index only tracks data items where its key attributes actually exist.

    • For example, consider a table with a GSI named GameTitleIndex. When a query against this index uses the GameTitle partition key to locate the index items for “Meteor Blasters”, DynamoDB uses the index to access all of the user IDs and top scores for this game. The results are returned sorted in descending order because the ScanIndexForward parameter is set to false (see the sketch at the end of this section).

    • Local secondary index – An index that has the same partition key as the table, but a different sort key. A local secondary index is "local" in the sense that every partition of a local secondary index is scoped to a base table partition that has the same partition key value.

    • Must be defined at table creation time

    • Each table in DynamoDB has a quota of 20 global secondary indexes (default quota) and 5 local secondary indexes.


    • Given a particular ForumName, a Query operation could immediately locate all of the threads for that forum. Within a group of items with the same partition key value, the items are sorted by sort key value, and the sort key (Subject) can also be provided in the query to narrow the results.

    • Scenario: Which forum threads get the most views and replies, Which thread in a particular forum has the largest number of messages?, How many threads were posted in a particular forum within a particular time period?

    • you can specify one or more local secondary indexes on non-key attributes, such as Replies or LastPostDateTime.


    • Conditions for LSI

      • The partition key is the same as that of its base table.
      • The sort key consists of exactly one scalar attribute.
      • The sort key of the base table is projected into the index, where it acts as a non-key attribute.
    • If you need to access just a few attributes with the lowest possible latency, consider projecting only those attributes into a local secondary index.

    • If your application frequently accesses some non-key attributes, you should consider projecting those attributes into a local secondary index.

    • Example: consider an LSI named LastPostIndex and a query issued against it that requests both projected and non-projected attributes. Replies is already projected into the index, so it can be returned directly, but Tags is not projected into the LSI, so DynamoDB issues a fetch request to get it from the base table.
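
A minimal sketch of the GameTitleIndex query described above (attribute values are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("GameScores")

# Query the GSI: partition key GameTitle locates the index items for
# "Meteor Blasters"; ScanIndexForward=False returns top scores first
# (descending sort key order).
resp = table.query(
    IndexName="GameTitleIndex",
    KeyConditionExpression=Key("GameTitle").eq("Meteor Blasters"),
    ScanIndexForward=False,
    Limit=10,
)
for item in resp["Items"]:
    print(item["UserId"], item["TopScore"])
```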

Filter expression use cases

  • Reducing response payload size (DynamoDB returns at most 1 MB of data in a single request)
  • Easier application filtering (instead of client-side filtering, filter server-side in the Query request)
  • Better validation around time-to-live (TTL) expiry (DynamoDB may take up to 48 hours to remove expired TTL items)
  • Properly filter data using:
    • Partition Key
    • Sparse Index
      • Based on the fact that DynamoDB only copies data into a GSI if the index key attributes are present in that item
      • Apply business logic in the application to include that key attribute in the item only when some condition is satisfied

DynamoDB table creation (console examples): composite primary key, single partition/primary key, schemaless/flexible schema, global indexes (see the sketch below).
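
A minimal boto3 sketch covering the table-creation examples above, reusing the Music composite key from earlier (the Genre GSI key is hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Composite primary key (Artist + SongTitle, as in the Music example)
# plus a global secondary index on a different partition key.
dynamodb.create_table(
    TableName="Music",
    AttributeDefinitions=[
        {"AttributeName": "Artist", "AttributeType": "S"},
        {"AttributeName": "SongTitle", "AttributeType": "S"},
        {"AttributeName": "Genre", "AttributeType": "S"},    # hypothetical GSI key
    ],
    KeySchema=[
        {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "GenreIndex",
            "KeySchema": [{"AttributeName": "Genre", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity mode
)
```

Non-key attributes don't appear in AttributeDefinitions, which is what makes the schema flexible beyond the key attributes.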

Capacity Modes

  • read/write capacity mode controls how you are charged for read and write throughput and how you manage capacity

  • Secondary indexes inherit the read/write capacity mode from the base table.

  • On-demand

    • DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.
    • DynamoDB instantly accommodates your workloads as they ramp up or down to any previously reached traffic level. If a workload’s traffic level hits a new peak, DynamoDB adapts rapidly to accommodate the workload.
    • Scenarios: You create new tables with unknown workloads, You have unpredictable application traffic, You prefer the ease of paying for only what you use.
    • Tables can be switched to on-demand mode once every 24 hours. Creating a table as on-demand also starts this 24-hour period. Tables can be returned to provisioned capacity mode at any time.
  • Provisioned (default, free-tier eligible)

  • RCU: Read Capacity Units: throughput for reads

  • WCU: Write Capacity Units: throughput for writes


  • For example, suppose that you create a provisioned table with 6 read capacity units and 6 write capacity units (see the helper sketch below). You can then:

    • Perform strongly consistent reads of up to 24 KB per second (4 KB × 6 RCU)
    • Perform eventually consistent reads of up to 48 KB per second (twice the strongly consistent throughput)
    • Write up to 6 KB per second (1 KB × 6 WCU)
    • When DynamoDB throttles a read or write, it returns a ProvisionedThroughputExceededException to the caller
    • DynamoDB uses burst capacity to accommodate reads or writes in excess of your table's throughput settings.
      With burst capacity, unexpected read or write requests can succeed where they otherwise would be throttled
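
An illustrative helper (not an AWS API) that encodes the RCU/WCU arithmetic above:

```python
import math

def rcu_needed(item_size_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    # 1 RCU = one strongly consistent read of up to 4 KB per second;
    # eventually consistent reads cost half as much.
    units = math.ceil(item_size_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

def wcu_needed(item_size_kb: float, writes_per_sec: int) -> int:
    # 1 WCU = one write of up to 1 KB per second.
    return math.ceil(item_size_kb / 1) * writes_per_sec

print(rcu_needed(4, 6, True))    # 6 RCU -> 24 KB/s strongly consistent
print(rcu_needed(4, 12, False))  # 6 RCU -> 48 KB/s eventually consistent
print(wcu_needed(1, 6))          # 6 WCU -> 6 KB/s
```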

Basic Operations

  • CRUD operations

    • PutItem

      • Creates a new item or fully replace an old item (same Primary Key)
    • GetItem

      • Read based on Primary key.
      • Primary Key can be HASH or HASH+RANGE
      • Eventually Consistent Read (default)
      • ProjectionExpression can be specified to retrieve only certain attributes
    • UpdateItem

      • Edits an existing item’s attributes or adds a new item if it doesn’t exist
      • Can be used to implement Atomic Counters – a numeric attribute that’s unconditionally incremented
    • Conditional Writes

      • Accept a write/update/delete only if conditions are met, otherwise returns an error
      • Condition functions: attribute_exists, attribute_not_exists, attribute_type, contains (for strings), begins_with (for strings), ProductCategory IN (:cat1, :cat2), Price BETWEEN :low AND :high, size (string length)
      • Examples and caveats
        • attribute_not_exists(pk) AND attribute_not_exists(sk): the second statement is extraneous. Recall that DynamoDB will first identify an item to compare against, then run the Condition Expression. If DynamoDB finds an item, it will have both the pk and the sk (assuming that's your primary key structure).
        • Unintended effects: a poorly chosen condition can lose the unique-user constraint
        • Enforcing business rules
        • Aggregate rules
    • DeleteItem

      • Delete an individual item
      • Ability to perform a conditional delete
    • DeleteTable

      • Delete a whole table and all its items
    • Query

      • KeyConditionExpression: Partition Key value (must be = operator) – required, Sort Key value (=, <, <=, >, >=, Between, Begins with) – optional
      • FilterExpression: Additional filtering after the Query operation, Use only with non-key attributes
    • Scan

      • Scan the entire table and then filter out data (inefficient)
      • For faster performance, use Parallel Scan
    • BatchWriteItem

      • Up to 25 PutItem and/or DeleteItem in one call
      • Up to 16 MB of data written, up to 400 KB of data per item
      • Can’t update items (use UpdateItem)
      • UnprocessedItems for failed write operations (exponential backoff or add WCU)
    • BatchGetItem

      • Return items from one or more tables
      • Up to 100 items, up to 16 MB of data
      • Items are retrieved in parallel to minimize latency
      • UnprocessedKeys for failed read operations (exponential backoff or add RCU)
  • Few examples (see the sketch below):

    • To prevent overwrite: use attribute_not_exists(pk) for a simple primary key, or attribute_not_exists(pk) AND attribute_not_exists(sk) for a composite key; the condition evaluates to true or false before the write operation is attempted
    • String comparison
    • Checking whether an element is in a set
    • Complex conditions using logical operators

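
A minimal sketch of the CRUD operations and conditional writes described above (table and attribute names are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

# PutItem guarded against overwrites: the condition is evaluated against
# any existing item before the write is attempted (raises
# ConditionalCheckFailedException if the item already exists).
table.put_item(
    Item={"pk": "USER#123", "sk": "PROFILE", "name": "Alice"},
    ConditionExpression="attribute_not_exists(pk) AND attribute_not_exists(sk)",
)

# UpdateItem as an atomic counter.
table.update_item(
    Key={"pk": "USER#123", "sk": "PROFILE"},
    UpdateExpression="SET login_count = if_not_exists(login_count, :zero) + :one",
    ExpressionAttributeValues={":zero": 0, ":one": 1},
)

# Query: the partition key must use equality; sort key conditions are optional.
resp = table.query(KeyConditionExpression=Key("pk").eq("USER#123"))

# Conditional delete using a non-key attribute.
table.delete_item(
    Key={"pk": "USER#123", "sk": "PROFILE"},
    ConditionExpression=Attr("status").eq("inactive"),
)
```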

Expression attribute names and values

  • Expression attribute names:

    • placeholder that you use in an Amazon DynamoDB expression as an alternative to an actual attribute name. An expression attribute name must begin with a pound sign (#)
    • Use cases: basic aliasing, attribute names with special characters, nested attributes (see the sketch below)
  • Expression attribute values:

    • substitutes for the actual values that you want to compare—values that you might not know until runtime. An expression attribute value must begin with a colon (:) and be followed by one or more alphanumeric characters.
    • Used in key condition expressions, condition expressions, update expressions, and filter expressions.
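
A minimal sketch combining expression attribute names and values in a Query (table and attribute names are hypothetical; Size is a DynamoDB reserved word, hence the alias):

```python
import boto3

client = boto3.client("dynamodb")

# "Name" and "Size" could collide with reserved words, so they are
# aliased with #-prefixed expression attribute names; the comparison
# values are supplied as :-prefixed expression attribute values.
resp = client.query(
    TableName="Products",                                   # hypothetical
    KeyConditionExpression="#n = :name",
    FilterExpression="#s > :minsize",
    ExpressionAttributeNames={"#n": "Name", "#s": "Size"},
    ExpressionAttributeValues={
        ":name": {"S": "widget"},
        ":minsize": {"N": "10"},
    },
)
print(resp["Items"])
```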

Eventual consistency

  • eventual consistency is almost always a read problem but not a write problem.
  • Tradeoffs between latency and consistency (durability)
  • Main table consistency
    • As soon as one of the replicas responds with a successful write, the primary node will return a successful response to the client.
    • By ensuring that a majority of the nodes have committed the write, DynamoDB is increasing the durability of the write.
    • First, while a stale read is possible on your main table, it's fairly unlikely (there's a 66% chance you'll hit one of the up-to-date nodes anyway).
    • Same concept applies to LSI
  • GSI consistency
    • DynamoDB creates completely separate partition infrastructure to handle your global secondary index
    • global secondary index can have a different partition key than your main table. If DynamoDB didn't reindex your data, then a query to a global secondary index would require querying every single partition in your table to answer the query.
    • DynamoDB uses asynchronous replication to global secondary indexes. When a write comes in, it is not only committed to two of the three main table nodes, but it also adds a record of the operation to an internal queue.
    • In the background, a service is processing that queue to update the global secondary indexes.
    • They optimize heavily for write latency at the expense of consistency.
  • Global tables
    • the replication latency to Global Tables is likely to be longer than that to global secondary indexes. Regions are significantly further apart than instances in the same datacenter, and network latency starts to dominate.
    • using Global Tables introduces write-based consistency issues into your application. You can write to both regions, and writes will be replicated from one region to another.

PartiQL

  • image
  • a SQL-compatible query language, to select, insert, update, and delete data in Amazon DynamoDB. Using PartiQL, you can easily interact with DynamoDB tables and run ad hoc queries
  • image
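
A minimal PartiQL sketch via the boto3 execute_statement API; the table name and key are hypothetical.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Parameterized PartiQL select; parameters use the AttributeValue format.
response = dynamodb.execute_statement(
    Statement='SELECT * FROM "Orders" WHERE pk = ?',
    Parameters=[{"S": "user#1"}],
)
print(response["Items"])
```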

Optimistic locking

  • Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in Amazon DynamoDB. If you use this strategy, your database writes are protected from being overwritten by the writes of others
  • each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed.
  • image
  • The version number associated with the record must also be sent when clients request data.
  • Whenever the client modifies the data item, the version number present on the client side must be the same as the item's version number present in the table item.
  • If it is the same, it means that no other user has changed the record, allowing the write to go through.
  • However, if the version numbers are different, it's likely that another user has already updated the record, causing DynamoDB to reject your write by throwing the exception - ConditionalCheckFailedException. You can retrieve the item again (with newly updated data) and retry your update when this happens.
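
A sketch of the version-number pattern (Python/boto3, hypothetical "Accounts" table): the update is conditioned on the version the client read, and a ConditionalCheckFailedException signals a concurrent writer.

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

# Read the item and remember its version (assumes the attribute exists).
item = dynamodb.get_item(TableName="Accounts", Key={"pk": {"S": "acct#1"}})["Item"]
current_version = item["version"]["N"]

try:
    dynamodb.update_item(
        TableName="Accounts",
        Key={"pk": {"S": "acct#1"}},
        UpdateExpression="SET balance = :b, version = :new",
        ConditionExpression="version = :expected",  # fails if someone wrote first
        ExpressionAttributeValues={
            ":b": {"N": "100"},
            ":new": {"N": str(int(current_version) + 1)},
            ":expected": {"N": current_version},
        },
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        # Another writer updated the item first: re-read and retry.
        pass
```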

DAX (DynamoDB Accelerator)

  • image
  • DynamoDB response times can be measured in single-digit milliseconds. However, there are certain use cases that require response times in microseconds. For these use cases, DynamoDB Accelerator (DAX) delivers fast response times for accessing eventually consistent data.
  • For read-heavy or bursty workloads, DAX provides increased throughput and potential operational cost savings by reducing the need to overprovision read capacity units: beneficial for applications that require repeated reads for individual keys.
  • DAX supports server-side encryption. With encryption at rest, the data persisted by DAX on disk will be encrypted.
  • image
  • When an application sends a GetItem or BatchGetItem request, DAX tries to read the items directly from the item cache using the specified key values. If the items are found (cache hit), DAX returns them to the application immediately. If the items are not found (cache miss), DAX sends the request to DynamoDB. DynamoDB processes the requests using eventually consistent reads and returns the items to DAX. DAX stores them in the item cache and then returns them to the application.
  • The item cache has a Time to Live (TTL) setting, which is 5 minutes by default.
  • If you specify zero as the item cache TTL setting, items in the item cache will only be refreshed due to an LRU eviction or a "write-through" operation.
  • DAX also maintains a query cache to store the results from Query and Scan operations.
  • Every DAX cluster provides a cluster endpoint for use by your application. By accessing the cluster using its endpoint, your application does not need to know the hostnames and port numbers of individual nodes in the cluster.
  • Ex. dax://my-cluster.l6fzcv.dax-clusters.us-east-1.amazonaws.com
  • Amazon DynamoDB Accelerator (DAX) is a write-through/read-through caching service
  • image
  • If your application needs to write large quantities of data (such as a bulk data load), it might make sense to bypass DAX and write the data directly to DynamoDB. Such a write-around strategy reduces write latency. However, the item cache doesn't remain in sync with the data in DynamoDB.
  • This pattern of loading data into the cache only when the item is requested is often referred to as lazy loading. The advantage of this approach is that data that is populated in the cache has been requested and has a higher likelihood of being requested again.
  • The disadvantage of lazy loading is the cache miss penalty on the first read of the data, which takes more time to retrieve the data from the table instead of directly from the cache.
  • DAX handles cache evictions in three different ways:
    • First, it uses a Time-to-Live (TTL) value that denotes the absolute period of time that an item is available in the cache.
    • Second, when the cache is full, a DAX cluster uses a Least Recently Used (LRU) algorithm to decide which items to evict.
    • Third, with the write-through functionality, DAX evicts older values as new values are written through DAX.
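
A minimal read sketch using the amazon-dax-client package (pip install amazon-dax-client); the cluster endpoint and table are placeholders, and the exact constructor options may vary by package version, so treat this as an assumption to verify against the package docs.

```python
import amazondax
import botocore.session

# The DAX client mirrors the low-level DynamoDB client API, so the same
# GetItem call is served from the item cache on a hit (writes are write-through).
session = botocore.session.get_session()
dax = amazondax.AmazonDaxClient(
    session,
    region_name="us-east-1",
    endpoints=["dax://my-cluster.l6fzcv.dax-clusters.us-east-1.amazonaws.com"],
)
response = dax.get_item(TableName="Orders", Key={"pk": {"S": "user#1"}})
print(response.get("Item"))
```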

DynamoDB Streams

  • feature that captures data modification events in DynamoDB tables. The data about these events appear in the stream in near-real time, and in the order that the events occurred.
  • Each event is represented by a stream record: A new item is added to the table, An item is updated, An item is deleted from the table
  • You can use DynamoDB Streams together with AWS Lambda to create a trigger—code that runs automatically whenever an event of interest appears in a stream
  • image
  • When to choose DynamoDB Streams over Kinesis Data Streams:

| Properties | Kinesis Data Streams for DynamoDB | DynamoDB Streams |
| --- | --- | --- |
| Data retention | Up to 1 year. | 24 hours. |
| Kinesis Client Library (KCL) support | Supports KCL versions 1.X and 2.X. | Supports KCL version 1.X. |
| Number of consumers | Up to 5 simultaneous consumers per shard, or up to 20 with enhanced fan-out. | Up to 2 simultaneous consumers per shard. |
| Throughput quotas | Unlimited. | Subject to throughput quotas by DynamoDB table and AWS Region. |
| Record delivery model | Pull model over HTTP using GetRecords; with enhanced fan-out, records are pushed over HTTP/2 using SubscribeToShard. | Pull model over HTTP using GetRecords. |
| Ordering of records | The timestamp attribute on each stream record can be used to identify the actual order in which changes occurred in the DynamoDB table. | For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item. |
| Duplicate records | Duplicate records might occasionally appear in the stream. | No duplicate records appear in the stream. |
| Stream processing options | Process stream records using AWS Lambda, Kinesis Data Analytics, Kinesis Data Firehose, or AWS Glue streaming ETL. | Process stream records using AWS Lambda or the DynamoDB Streams Kinesis adapter. |
| Durability level | Availability Zones provide automatic failover without interruption. | Availability Zones provide automatic failover without interruption. |
  • Specifies the information that will be written to the stream whenever data in the table is modified:
    • KEYS_ONLY — Only the key attributes of the modified item.
    • NEW_IMAGE — The entire item, as it appears after it was modified.
    • OLD_IMAGE — The entire item, as it appeared before it was modified.
    • NEW_AND_OLD_IMAGES — Both the new and the old images of the item.
  • To read and process a stream, your application must connect to a DynamoDB Streams endpoint and issue API requests.
  • Stream records are organized into groups, or shards. Each shard acts as a container for multiple stream records, and contains information required for accessing and iterating through these records. The stream records within a shard are removed automatically after 24 hours.
  • image
  • Combining DynamoDB Time to Live (TTL), DynamoDB Streams, and AWS Lambda can help simplify archiving data, reduce DynamoDB storage costs, and reduce code complexity. Using Lambda as the stream consumer provides many advantages, most notably the cost reduction compared to other consumers such as Kinesis Client Library (KCL). You aren’t charged for GetRecords API calls on your DynamoDB stream when using Lambda to consume events, and Lambda can provide event filtering by identifying JSON patterns in a stream event.
  • image
  • The AWS Lambda service polls the stream for new records four times per second. When new stream records are available, your Lambda function is synchronously invoked
  • Records are not retroactively populated in a stream after enabling it
  • Adapter design pattern for Kinesis with DynamoDB
  • image
  • Integration Architecture
  • image
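
A sketch of a Lambda stream consumer (Python), assuming the stream view type is NEW_AND_OLD_IMAGES; the handler shape follows the standard DynamoDB Streams event format.

```python
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]  # AttributeValue map
            print("new item:", new_image)
        elif record["eventName"] == "REMOVE":
            # TTL expirations also arrive as REMOVE events; system deletes
            # carry userIdentity.principalId == "dynamodb.amazonaws.com".
            old_image = record["dynamodb"].get("OldImage")
            print("deleted item:", old_image)
```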

DynamoDB TTL (TIME TO LIVE)

  • Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed.
  • Doesn’t consume any WCUs (i.e., no extra cost) EXCEPT when the deletion is replicated to additional Regions
  • TTL is useful if you store items that lose relevance after a specific time.
  • you must identify a specific attribute name that the service will look for when determining if an item is eligible for expiration. After you enable TTL on a table, a per-partition scanner background process automatically and continuously evaluates the expiry status of items in the table.
  • A second background process scans for expired items and deletes them. Both processes take place automatically in the background, do not affect read or write traffic to the table, and do not have a monetary cost.
  • Items are removed from any local secondary index and global secondary index in the same way as a DeleteItem operation. This operation comes at no extra cost.
  • A delete operation for each item enters the DynamoDB Stream, but is tagged as a system delete and not a regular delete
  • Items that have expired, but haven’t yet been deleted by TTL, still appear in reads, queries, and scans. If you do not want expired items in the result set, you must filter them out (CLIENT-SIDE)
  • image
  • The TTL mechanism will work on items that have been inserted after the TTL has been enabled on the table
  • DynamoDB removes items with an expired TTL within up to 48 hours of the original expiration time
  • Archiving TTL deletes to S3 is a common use-case to offload cold data, reduce table storage and maintain most current data in the table.
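
A sketch (Python/boto3) of enabling TTL on a hypothetical "Sessions" table and writing an item that expires in 24 hours; the TTL attribute must hold an epoch timestamp in seconds.

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL once per table, naming the attribute the scanner looks for.
dynamodb.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expireAt"},
)

# Write an item that becomes eligible for deletion 24 hours from now.
dynamodb.put_item(
    TableName="Sessions",
    Item={
        "pk": {"S": "session#abc"},
        "expireAt": {"N": str(int(time.time()) + 86400)},  # epoch seconds
    },
)
```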

DynamoDB transactions

  • transactions simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables.
  • Transactions provide atomicity, consistency, isolation, and durability (ACID) in DynamoDB, helping you to maintain data correctness in your applications.
  • DynamoDB performs two underlying reads or writes of every item in the transaction: one to prepare the transaction and one to commit the transaction.
  • TransactWriteItems
    • is a synchronous and idempotent write operation that groups up to 100 write actions in a single all-or-nothing operation. These actions can target up to 100 distinct items in one or more DynamoDB tables
    • You can optionally include a client token when you make a TransactWriteItems call to ensure that the request is idempotent.
  • TransactGetItems
    • is a synchronous read operation that groups up to 100 Get actions together. These actions can target up to 100 distinct items in one or more DynamoDB tables
    • actions are performed atomically so that either all of them succeed or all of them fail:
  • image
  • image
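
A sketch (Python/boto3) of an all-or-nothing transfer between two hypothetical items: both updates commit or neither does, and the optional ClientRequestToken makes retries of the whole call idempotent.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.transact_write_items(
    ClientRequestToken="transfer-0001",  # retries with this token are idempotent
    TransactItems=[
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"pk": {"S": "acct#A"}},
                "UpdateExpression": "SET balance = balance - :amt",
                "ConditionExpression": "balance >= :amt",  # no overdraft
                "ExpressionAttributeValues": {":amt": {"N": "50"}},
            }
        },
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"pk": {"S": "acct#B"}},
                "UpdateExpression": "SET balance = balance + :amt",
                "ExpressionAttributeValues": {":amt": {"N": "50"}},
            }
        },
    ],
)
```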

DynamoDB CLI examples image

Dynamo DB as session state

  • image

DynamoDB write sharding

  • Use cases where we want to avoid a hot partition, e.g. when many items share the same partition key (in a GSI etc.) or most of the traffic is biased towards one particular key
  • We add a random or calculated suffix (calculated, so that it can be recomputed while querying) to the partition key, distributing the data evenly and sharding on the newly generated keys
  • For ex. image
  • Converted to
  • image
  • Scatter-Gather pattern to fetch aggregate queries
  • image
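
A write-sharding sketch (Python/boto3): a calculated suffix (hash mod N) spreads one hot logical key across N partition keys, and the same calculation drives the scatter-gather read. All names are hypothetical.

```python
import hashlib
import boto3

dynamodb = boto3.client("dynamodb")
NUM_SHARDS = 10

def shard_suffix(item_id: str) -> int:
    # Deterministic, so it can be recomputed at query time.
    return int(hashlib.md5(item_id.encode()).hexdigest(), 16) % NUM_SHARDS

vote_id = "vote#42"
dynamodb.put_item(
    TableName="HotVotes",
    Item={
        "pk": {"S": f"candidate_A#{shard_suffix(vote_id)}"},  # sharded key
        "sk": {"S": vote_id},
    },
)

# Scatter-gather: query every shard and merge the results client-side.
items = []
for shard in range(NUM_SHARDS):
    resp = dynamodb.query(
        TableName="HotVotes",
        KeyConditionExpression="pk = :pk",
        ExpressionAttributeValues={":pk": {"S": f"candidate_A#{shard}"}},
    )
    items.extend(resp["Items"])
```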

DynamoDB with S3 integration image

DYNAMODB WITH LAMBDA

  • image

  • image

  • image

DYNAMODB FAN-OUT PATTERN

  • image
  • image
  • image

DynamoDB security

  • Identity based policies

    • image
    • To implement this kind of fine-grained access control, you write an IAM permissions policy that specifies conditions for accessing security credentials and the associated permissions. You then apply the policy to users, groups, or roles that you create using the IAM console. Your IAM policy can restrict access to individual items in a table, access to the attributes in those items, or both at the same time.
    • image
    • Ex. of AWS managed policies: AmazonDynamoDBReadOnlyAccess, AmazonDynamoDBFullAccess
    • Ex. Grant all permissions on dynamodb table
    • image
    • For ex. DAX cluster IAM policy
    • image
    • Note: DAX does not enforce user-level separation on data in DynamoDB. Instead, users inherit the permissions of the DAX cluster's IAM policy when they access that cluster. Thus, when accessing DynamoDB tables via DAX, the only access controls that are in effect are the permissions in the DAX cluster's IAM policy. image
    • In addition to controlling access to DynamoDB API actions, you can also control access to individual data items and attributes
      • Using conditions in IAM policies
      • image
      • image
      • image
      • Grant permissions that limit access to items with a specific partition key value
      • image
      • Grant permissions to query only projected attributes in an index
      • image
  • For higher scale, such as gaming apps where users already have third-party OAuth identities (Google, Facebook, etc.), leverage identity federation to obtain scoped credentials, since it's difficult to scale individual IAM users and policies

  • image

  • image

Monitoring

  • Cloudwatch and Cloudtrail integrated
  • For DAX: FaultRequestCount (internal server error 500), ErrorRequestCount (client side 400), item cache miss (ItemCacheMisses), query/scan cache miss (ScanCacheMisses, QueryCacheMisses)
  • For DynamoDB: TimeToLiveDeletedItemCount, ThrottledRequests, ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits

Troubleshooting and Debugging

  • Latency issues

    • SuccessfulRequestLatency: measures latency which is internal to the DynamoDB service - client side activity and network trip times are not included.
    • Use eventually consistent if strong consistency not required
    • Use DAX caching for high read intensive workloads
    • Reuse existing DynamoDB endpoint connections instead of creating new ones (keep-alive for 30 seconds)
  • Throttling

    • ProvisionedThroughputExceededException errors
    • Retry throttled requests using exponential backoff
    • Consider switching to on-demand mode
    • Use Amazon CloudWatch Contributor Insights to find hot partitions and avoid issues

AWS S3

  • Amazon S3 allows people to store objects (files) in “buckets” (directories)
  • Buckets must have a globally unique name (across all regions all accounts)
  • Buckets are defined at the region level
  • The key is the FULL path:
    • s3://my-bucket/my_file.txt
    • s3://my-bucket/my_folder1/another_folder/my_file.txt
  • The key is composed of prefix + object name
    • s3://my-bucket/my_folder1/another_folder/my_file.txt
  • Max. Object Size is 5TB (5000GB), If uploading more than 5GB, must use “multi-part upload”
  • Buckets
    • A bucket is a container for objects stored in Amazon S3. You can store any number of objects in a bucket and can have up to 100 buckets in your account.
  • Objects
    • Objects are the fundamental entities stored in Amazon S3.
    • The metadata is a set of name-value pairs that describe the object. These pairs include some default metadata, such as the date last modified, and standard HTTP metadata, such as Content-Type. You can also specify custom metadata at the time that the object is stored.
    • An object is uniquely identified within a bucket by a key (name) and a version ID (if S3 Versioning is enabled on the bucket)
    • For example, in the URL https://DOC-EXAMPLE-BUCKET.s3.us-west-2.amazonaws.com/photos/puppy.jpg, DOC-EXAMPLE-BUCKET is the name of the bucket and photos/puppy.jpg is the key.

Security: Bucket policies

  • you can secure access to objects in your buckets, so that only users with the appropriate permissions can access them

  • How the authorizer works:

    • Converts all the relevant access policies (user policy, bucket policy, ACLs) at run time into a set of policies for evaluation.
    • Amazon S3 evaluates a subset of policies in a specific context, based on the context authority: User context, Bucket context, Object context
  • Request Authorization

  • Bucket operation requested by bucket owner

  • image

  • Bucket operation requested by an AWS account that is not the bucket owner

  • image

  • Bucket operation requested by an IAM principal whose parent AWS account is also the bucket owner

  • image

  • Bucket operation requested by an IAM principal whose parent AWS account is not the bucket owner

  • image

  • Object operation request

  • image

  • Ex. JSON policies image

  • User-Based:

    • IAM Policies – which API calls should be allowed for a specific user from IAM

    • Ex.

    • Allowing an IAM user access to one of your buckets image

    • Allowing each IAM user access to a folder in a bucket image image

    • Restricting access to Amazon S3 buckets within a specific AWS account image

    • EX. OF AWS MANAGED POLICIES: AmazonS3FullAccess, AmazonS3ReadOnlyAccess, AmazonS3ObjectLambdaExecutionRolePolicy

  • Resource-Based

    • Bucket Policies – bucket wide rules from the S3 console - allows cross account

    • The policy allows Dave, a user in account Account-ID, s3:GetObject, s3:GetBucketLocation, and s3:ListBucket Amazon S3 permissions on the awsexamplebucket1 bucket.

    • image

    • Resources

    • Ex. arn:partition:service:region:namespace:relative-id

    • Principals

      • "AWS":"account-ARN"
    • Actions

      • Object operations image
      • Bucket operations image
      • Explicit deny image
      • Account operations image
      • Condition keys image
      • Objects upload requiring SSE (SERVER SIDE ENCRYPTION) image
      • Granting access to specific version of object image
      • Allow access on the basis of tags image
    • Object Access Control List (ACL) – finer grain (can be disabled) image

    • ACLs enabled

      • Bucket owner preferred – The bucket owner owns and has full control over new objects that other accounts write to the bucket with the bucket-owner-full-control canned ACL.
      • Object writer – The AWS account that uploads an object owns the object, has full control over it, and can grant other users access to it through ACLs. image
    • Bucket Access Control List (ACL) – less common (can be disabled)
      image

    • ACL permissions are mapped to policy permissions for buckets and objects image

    • Supports canned ACLs: a predefined set of grantees and permissions. Ex. private, public-read, public-read-write, bucket-owner-read etc...

S3 BLOCK PUBLIC ACCESS

  • BlockPublicAcls: PUT Bucket acl and PUT Object acl calls fail if the specified access control list (ACL) is public.
  • IgnorePublicAcls: Setting this option to TRUE causes Amazon S3 to ignore all public ACLs on a bucket and any objects that it contains. This setting enables you to safely block public access granted by ACLs while still allowing PUT Object calls that include a public ACL
  • BlockPublicPolicy: Setting this option to TRUE for a bucket causes Amazon S3 to reject calls to PUT Bucket policy if the specified bucket policy allows public access.
  • RestrictPublicBuckets: Setting this option to TRUE restricts access to an access point or bucket with a public policy to only AWS service principals and authorized users within the bucket owner's account and access point owner's account.

Static website hosting

  • S3 can host static websites and have them accessible on the Internet
  • the website is available at the AWS Region-specific website endpoint of the bucket: http://bucket-name.s3-website-Region.amazonaws.com or http://bucket-name.s3-website.Region.amazonaws.com (returns the configured default index document, e.g. index.html)
  • need public read access permissions
  • if you don't want to grant public read permissions and still want to host the website, use an Amazon CloudFront distribution to serve your static website

S3 versioning

  • means of keeping multiple variants of an object in the same bucket
  • When you enable versioning in a bucket, all new objects are versioned and given a unique version ID.
  • Objects that are stored in your bucket before you set the versioning state have a version ID of null
  • If you delete an object, instead of removing the object permanently, Amazon S3 inserts a delete marker, which becomes the current object version. image
    image image image image

S3 REPLICATION

  • automatic, asynchronous copying of objects across Amazon S3 buckets.
  • You can replicate objects to a single destination bucket or to multiple destination buckets.
  • To automatically replicate new objects as they are written to the bucket, use live replication, such as Cross-Region Replication (CRR). To replicate existing objects to a different bucket on demand, use S3 Batch Replication.
  • Pre-requisites:
    • Both source and destination buckets must have versioning enabled
    • permissions to replicate objects from the source bucket to the destination bucket or buckets on your behalf.
  • Cross-Region Replication (CRR)
  • Same-Region Replication (SRR)
  • For DELETE operations
    • Can replicate delete markers from source to target (optional setting)
    • Deletions with a version ID are not replicated (to avoid malicious deletes)
  • There is no “chaining” of replication
    • If bucket 1 has replication into bucket 2, which has replication into bucket 3
    • Then objects created in bucket 1 are not replicated to bucket 3

S3 Storage classes, Tiers and Lifecycle management

  • High durability (99.999999999%, 11 9’s) of objects across multiple AZs
    • If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years

  • Availability:
    • Example: S3 Standard has 99.99% availability = not available 53 minutes a year

  • Storage classes for frequently accessed objects

    • S3 Standard – The default storage class. If you don't specify the storage class when you upload an object, Amazon S3 assigns the S3 Standard storage class.
  • Storage classes for infrequently accessed objects

    • S3 Standard-IA - Amazon S3 stores the object data redundantly across multiple geographically separated Availability Zones (minimum storage duration period of 30 days)
    • S3 One Zone-IA - Amazon S3 stores the object data in only one Availability Zone, which makes it less expensive than S3 Standard-IA (minimum storage duration period of 30 days)
  • Storage classes for archiving objects

    • S3 Glacier Instant Retrieval - Use for archiving data that is rarely accessed and requires milliseconds retrieval.
    • S3 Glacier Flexible Retrieval - Use for archives where portions of the data might need to be retrieved in minutes. (1-5 minutes), minimum storage duration period of 90 days
    • S3 Glacier Deep Archive - Use for archiving data that rarely needs to be accessed. has a minimum storage duration period of 180 days and a default retrieval time of 12 hours.
  • S3 Intelligent-Tiering automatically optimizing data with changing or unknown access patterns

    • Frequent Access (automatic), Infrequent Access (automatic, object is not accessed for 30 consecutive days), Archive Instant Access (automatic, object is not accessed for 90 consecutive days), Archive Access (optional, 90 - 730 days ), Deep Archive Access (optional, 180-730 days)
  • image

  • image

  • image

  • image

  • Retrieval options for archived data

    • Expedited: available within 1–5 minutes, Intelligent-Tiering Archive Access tier
    • Standard: 3–5 hours for objects, Glacier Flexible Retrieval storage class or S3 Intelligent-Tiering Archive Access tier
    • Bulk: finish within 5–12 hours for objects that are stored in the S3 Glacier Flexible Retrieval storage class or S3 Intelligent-Tiering Archive Access tier. typically finish within 48 hours for objects stored in the S3 Glacier Deep Archive storage class or S3 Intelligent-Tiering Deep Archive Access tier.
  • S3 lifecycle actions

    • An S3 Lifecycle configuration is an XML file that consists of a set of rules with predefined actions that you want Amazon S3 to perform on objects during their lifetime.

    • Amazon S3 supports a waterfall model for transitioning between storage classes image

    • Transition actions:

      • These actions define when objects transition to another storage class.
    • Expiration actions:

      • These actions define when objects expire. Amazon S3 deletes expired objects on your behalf.
    • image

    • image

  • Amazon S3 Analytics – Storage Class Analysis: Help you decide when to transition objects to the right storage class

  • S3 EVENT NOTIFICATIONS

    • receive notifications when certain events happen in your S3 bucket.
    • SQS, SNS, LAMBDA (NEEDS IAM PERMISSIONS ATTACHED TO S3 BUCKETS)
    • image
    • EVENTBRIDGE (Amazon S3 does not require any additional permissions to deliver events to Amazon EventBridge)
    • image

S3 PERFORMANCE OPTIMIZATION

  • Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.
  • Multi-Part upload: recommended for files > 100MB, must use for files > 5GB
  • S3 Transfer Acceleration: Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
  • Byte-Range Fetches: Parallelize GETs by requesting specific byte ranges
    image
  • S3 SELECT: you can use structured query language (SQL) statements to filter the contents of an Amazon S3 object and retrieve only the subset of data that you need, works on objects stored in CSV, JSON, or Apache Parquet format. works with Glacier as well, so Glacier SELECT. image
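
A sketch (Python/boto3) of two of these optimizations: TransferConfig makes upload_file switch to multi-part above a threshold, and the Range header fetches a specific byte range. Bucket and key names are placeholders.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# upload_file switches to multi-part automatically above the threshold.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # 100 MB, the recommended cutoff
    multipart_chunksize=16 * 1024 * 1024,
)
s3.upload_file("backup.bin", "my-bucket", "backups/backup.bin", Config=config)

# Byte-range fetch: request only the first 1 MB of the object; issuing
# several of these in parallel parallelizes a large GET.
resp = s3.get_object(
    Bucket="my-bucket", Key="backups/backup.bin", Range="bytes=0-1048575"
)
first_mb = resp["Body"].read()
```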

Metadata and tags

  • S3 User-Defined Object Metadata: Name-value (key-value) pairs, User-defined metadata names must begin with "x-amz-meta-”
  • S3 Object Tags: Useful for fine-grained permissions (only access specific objects with specific tags), can be used in s3 analytics
    • In bucket lifecycle configuration, you can specify a filter to select a subset of objects to which the rule applies. You can specify a filter based on the key name prefixes, object tags, or both.
  • You cannot search the object metadata or object tags. Instead, you must use an external DB as a search index such as DynamoDB

S3 ENCRYPTION

  • SSE (SERVER SIDE ENCRYPTION)

    • SSE-S3

      • Encryption type is AES-256
      • Must set header "x-amz-server-side-encryption": "AES256"
      • automatically applied to new objects stored in S3 bucket
      • Bucket policy to force to require encryption image
    • SSE-KMS

      • Must set header "x-amz-server-side-encryption": "aws:kms"
      • Encryption using keys handled and managed by AWS KMS (Key Management Service)
      • When you upload, it calls the GenerateDataKey KMS API
      • When you download, it calls the Decrypt KMS API
      • you can configure your buckets to use S3 Bucket Keys for SSE-KMS. Using a bucket-level key for SSE-KMS can reduce your AWS KMS request costs by up to 99 percent by decreasing the request traffic from Amazon S3 to AWS KMS.
      • AWS generates a short-lived bucket-level key from AWS KMS then temporarily keeps it in S3. This bucket-level key will create data keys for new objects during its lifecycle. S3 Bucket Keys are used for a limited time period within Amazon S3, reducing the need for S3 to make requests to AWS KMS to complete encryption operations image
    • SSE-C

      • keys fully managed by the customer outside of AWS
      • HTTPS must be used
      • The encryption key must be provided in HTTP headers, for every HTTP request made
  • CSE (CLIENT SIDE ENCRYPTION)

    • Use client libraries such as Amazon S3 Client-Side Encryption Library image
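
A sketch (Python/boto3) of the three server-side encryption options as put_object parameters; the bucket, keys, and KMS key ID are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3 (AES-256; the default applied to new objects)
s3.put_object(Bucket="my-bucket", Key="a.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS (GenerateDataKey on upload, Decrypt on download)
s3.put_object(Bucket="my-bucket", Key="b.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-key",   # placeholder KMS key
              BucketKeyEnabled=True)        # S3 Bucket Key to cut KMS costs

# SSE-C (customer-provided key; HTTPS required, key sent with every request)
s3.put_object(Bucket="my-bucket", Key="c.txt", Body=b"data",
              SSECustomerAlgorithm="AES256",
              SSECustomerKey=b"0" * 32)     # placeholder 256-bit key
```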

S3 CORS

  • The request's Origin header must match an AllowedOrigin element.
  • The request method (for example, GET or PUT) or the Access-Control-Request-Method header in case of a preflight OPTIONS request must be one of the AllowedMethod elements.
  • Every header listed in the request's Access-Control-Request-Headers header on the preflight request must match an AllowedHeader element. image
    image
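
A sketch (Python/boto3) of applying a CORS rule to a bucket; the origin and bucket name are placeholders.

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["https://www.example.com"],
                "AllowedMethods": ["GET", "PUT"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,  # how long browsers cache the preflight
            }
        ]
    },
)
```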

S3 MFA DELETE

  • MFA (Multi-Factor Authentication) – force users to generate a code on a device (usually a mobile phone or hardware) before doing important operations on S3
  • Permanently delete an object version, Suspend Versioning on the bucket
  • To use MFA Delete, Versioning must be enabled on the bucket
  • Only the bucket owner (root account) can enable/disable MFA Delete

S3 PRE-SIGNED URL'S

  • You can use presigned URLs to grant time-limited access to objects in Amazon S3 without updating your bucket policy.
  • The credentials used by the presigned URL are those of the AWS user who generated the URL.
  • You can also use presigned URLs to allow someone to upload a specific object to your Amazon S3 bucket. This allows an upload without requiring another party to have AWS security credentials or permissions.
  • A presigned URL remains valid for the period of time specified when the URL is generated. If you create a presigned URL with the Amazon S3 console, the expiration time can be set between 1 minute and 12 hours. If you use the AWS CLI or AWS SDKs, the expiration time can be set as high as 7 days. image
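
A sketch (Python/boto3): the URL inherits the permissions of the credentials that signed it, so the recipient only needs the URL itself.

```python
import boto3

s3 = boto3.client("s3")

# Time-limited download link (1 hour).
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "report.pdf"},
    ExpiresIn=3600,  # seconds; SDK/CLI maximum is 7 days
)

# Time-limited upload link: lets someone without AWS credentials PUT
# this exact object.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "uploads/new.pdf"},
    ExpiresIn=300,
)
```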

S3 ACCESS POINTS

  • Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations, such as GetObject and PutObject.
  • image
  • image
  • You can delegate access control for a bucket to the bucket's access points.
  • image

S3 OBJECT LAMBDA

  • Use AWS Lambda Functions to change the object before it is retrieved by the caller application image

API GATEWAY

  • AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs at any scale

  • Create a single interface for all the microservices in your company image

  • API Gateway creates RESTful APIs that:

    • Are HTTP-based.
    • Enable stateless client-server communication.
    • Implement standard HTTP methods such as GET, POST, PUT, PATCH, and DELETE.
    • For example, /incomes could be the path of a resource representing the income of the app user.
    • In API Gateway REST APIs, the frontend is encapsulated by method requests and method responses. The API interfaces with the backend by means of integration requests and integration responses.
    • API Gateway enables you to define a schema or model for the payload to facilitate setting up the body mapping template.
  • API Gateway to create HTTP APIs

    • HTTP APIs to send requests to AWS Lambda functions or to any publicly routable HTTP endpoint.
    • you can create an HTTP API that integrates with a Lambda function on the backend. When a client calls your API, API Gateway sends the request to the Lambda function and returns the function's response to the client.
    • HTTP APIs support OpenID Connect and OAuth 2.0 authorization
  • API Gateway creates WebSocket APIs that:

    • Adhere to the WebSocket protocol, which enables stateful, full-duplex communication between client and server.
    • Route incoming messages based on message content.
  • image

  • Together with AWS Lambda, API Gateway forms the app-facing part of the AWS serverless infrastructure.

  • Backend servers can easily push data to connected users and devices, avoiding the need to implement complex polling mechanisms. Ex. Chat applications, Real-time dashboards such as stock tickers, Real-time alerts and notifications.

  • image

  • Client to server connection image

  • Server to client connection
    image

  • Routing for websocket API's

    • Incoming JSON messages are routed to different backends
    • If no route matches => sent to $default
    • You define a route selection expression to select the JSON field to route on
    • Sample expression: $request.body.action
    • The result is evaluated against the route keys available in your API Gateway
    • The route is then connected to the backend you’ve set up through API Gateway
  • Ex. Integration with Kinesis data streams

  • image

Endpoint types

  • Edge-optimized API endpoints
    • An edge-optimized API endpoint is best for geographically distributed clients. API requests are routed to the nearest CloudFront Point of Presence (POP). This is the default endpoint type for API Gateway REST APIs.
  • Regional API endpoints
    • A regional API endpoint is intended for clients in the same region. When a client running on an EC2 instance calls an API in the same region, or when an API is intended to serve a small number of clients with high demands, a regional API reduces connection overhead.
  • Private API endpoints
    • A private API endpoint is an API endpoint that can only be accessed from your Amazon Virtual Private Cloud (VPC) using an interface VPC endpoint, which is an endpoint network interface (ENI) that you create in your VPC.

Creating REST API image image
image
image
image
image
image
tempsnip

Deploying REST API

  • After creating your API, you must deploy it to make it callable by your users.
  • To deploy an API, you create an API deployment and associate it with a stage. A stage is a logical reference to a lifecycle state of your API (for example, dev, prod, beta, v2).
  • Ex. https://{restapi-id}.execute-api.{region}.amazonaws.com/{stageName}
  • Stage variables are name-value pairs that you can define as configuration attributes associated with a deployment stage of a REST API. They act like environment variables and can be used in your API setup and mapping templates.
  • To use a stage variable to customize the HTTP integration endpoint, you must first configure a stage variable of a specified name (for example, url), and then assign it a value, (for example, example.com). Next, from your method configuration, set up an HTTP proxy integration. Instead of entering the endpoint's URL, you can tell API Gateway to use the stage variable value, http://${stageVariables.url}. This value tells API Gateway to substitute your stage variable ${} at runtime, depending on which stage your API is running.
  • image
  • image
  • image
  • image
  • image
  • Ex. stageVariables.<variable_name>, { "name" : "$stageVariables.<variable_name>"}, http://${stageVariables.<variable_name>}, arn:aws:apigateway:<region>:<service>:${stageVariables.<variable_name>}

Canary release deployment

  • software development strategy in which a new version of an API (as well as other software) is deployed for testing purposes, and the base version remains deployed as a production release for normal operations on the same stage.
  • total API traffic is separated at random into a production release and a canary release with a pre-configured ratio. Typically, the canary release receives a small percentage of API traffic and the production release takes up the rest. The updated API features are only visible to API traffic through the canary. You can adjust the canary traffic percentage to optimize test coverage or performance.
  • After the test metrics pass your requirements, you can promote the canary release to the production release and disable the canary from the deployment.
  • image
  • image

Integration type

  • AWS/HTTP

    • you must configure both the integration request and integration response and set up necessary data mappings from the method request to the integration request, and from the integration response to the method response.
    • Setup data mapping using mapping templates for the request & response
    • image
  • AWS_PROXY

    • This type of integration lets an API method be integrated with the Lambda function invocation action
    • you do not set the integration request or the integration response.
    • API Gateway passes the incoming request from the client as the input to the backend Lambda function.
    • This is the preferred integration type to call a Lambda function through API Gateway
    • image
  • HTTP_PROXY

    • The HTTP proxy integration allows a client to access the backend HTTP endpoints with a streamlined integration setup on a single API method.
    • You do not set the integration request or the integration response.
    • Possibility to add HTTP Headers if need be (ex: API key)
    • image
  • MOCK

    • return a response without sending the request further to the backend. This is useful for API testing because it can be used to test the integration set up without incurring charges for using the backend and to enable collaborative development of an API.

Integration Request

  • An integration request is an HTTP request that API Gateway submits to the backend, passing along the client-submitted request data, and transforming the data, if necessary.
  • image
  • image

Integration Response

  • You can choose to pass through the result as-is or to transform the integration response data to the method response data if the two have different formats.
  • you can map the endpoint response data to the method response data. The response data that can be mapped includes the response status code, response header parameters, and response body.
  • If no method response is defined for the returned status code, API Gateway returns a 500 error.
  • image
  • image
  • image

Mapping templates

  • Mapping template examples integration
  • image
  • A mapping template is a script expressed in Velocity Template Language (VTL) and applied to the payload using JSONPath expressions.
  • image
  • image
  • image

Request Validation via OpenAPI spec

  • You can use API Gateway to import a REST API from an external definition file into API Gateway. Currently, API Gateway supports OpenAPI v2.0 and OpenAPI v3.0 definition files
  • configure API Gateway to perform basic validation of an API request before proceeding with the integration request. When the validation fails, API Gateway immediately fails the request, returns a 400 error response to the caller, and publishes the validation results in CloudWatch Logs.
  • The required request parameters in the URI, query string, and headers of an incoming request are included and not blank.
  • The applicable request payload adheres to the configured JSON schema request of the method.
  • In API Gateway, a model defines the data structure of a payload. In API Gateway, models are defined using the JSON schema draft 4. The following JSON object is sample data in the Pet Store example.
  • image
  • There are two request validators declared in the x-amazon-apigateway-request-validators map at the API level.
  • The params-only validator is enabled on the API and inherited by the GET method.
  • This validator allows API Gateway to verify that the required query parameter (q1) is included and not blank in the incoming request.
  • The all validator is enabled on the POST method.
  • This validator verifies that the required header parameter (h1) is set and not blank. It also verifies that the payload format adheres to the specified RequestBodyModel. If no matching content type is found, request validation is not performed.
  • image

API Caching

  • With caching, you can reduce the number of calls made to your endpoint and also improve the latency of requests to your API.
  • API Gateway caches responses from your endpoint for a specified time-to-live (TTL) period, in seconds. Ex. 300 seconds
  • This is a HIPAA Eligible Service. For more information about AWS, U.S. Health Insurance Portability and Accountability Act of 1996 (HIPAA), and using AWS services to process, store, and transmit protected health information (PHI)
  • Clients can invalidate the cache with header: Cache-Control: max-age=0

API KEYS and Usage plans

  • alphanumeric string values to distribute to your customers
  • Can use with usage plans to control access
  • Throttling limits are applied to the API keys
  • Quota limits set the overall maximum number of requests
  • Associate API stages and API keys with the usage plan.
  • Callers of the API must supply an assigned API key in the x-api-key header in requests to the API
  • If there is a match, API Gateway throttles the requests based on the plan's request limit and quota
  • A throttling limit sets the target point at which request throttling should start. This can be set at the API or API method level.
  • A quota limit sets the target maximum number of requests with a given API key that can be submitted within a specified time interval

API gateway throttling

  • AWS throttling limits are applied across all accounts and clients in a region. These limit settings exist to prevent your API—and your account—from being overwhelmed by too many requests. These limits are set by AWS and can't be changed by a customer.
  • Per-account limits are applied to all APIs in an account in a specified Region. The default rate limit is 10,000 requests per second, and the default burst limit is 5,000 requests.
  • Per-API, per-stage throttling limits are applied at the API method level for a stage.
  • Per-client throttling limits are applied to clients that use API keys associated with your usage plan as client identifier
  • In API Gateway, the burst limit represents the target maximum number of concurrent request submissions that API Gateway will fulfill before returning 429 Too Many Requests error responses.

CORS

  • A cross-origin HTTP request is one that is made to:
    • A different domain (for example, from example.com to amazondomains.com)
    • A different subdomain (for example, from example.com to petstore.example.com)
    • A different port (for example, from example.com to example.com:10777)
    • A different protocol (for example, from https://example.com to http://example.com)
  • The response to the OPTIONS pre-flight request must contain the following headers:
    • Access-Control-Allow-Methods
    • Access-Control-Allow-Headers
    • Access-Control-Allow-Origin
  • image
  • For the proxy: lambda should return response headers itself in the response
  • For the non-proxy: it can be configured from the console.

Security (IAM PERMISSIONS)

  • image
  • With IAM identity-based policies, you can specify which actions and resources are allowed or denied as well as the conditions under which actions are allowed or denied.
  • The following example shows an identity-based policy that allows a user to create or update only private REST APIs. image
  • Authentication = IAM | Authorization = IAM Policy
  • Resource based policies
    • Ex. resource policy grants API access in one AWS account to two roles in a different AWS account via Signature Version 4 (SigV4) protocols image
    • Ex. resource policy denies (blocks) incoming traffic to an API from two specified source IP address blocks. image
    • Ex. resource policies allow incoming traffic to a private API only from a specified virtual private cloud (VPC) or VPC endpoint. image
  • Service-linked roles: AWSServiceRoleForAPIGateway – Allows API Gateway to access Elastic Load Balancing, Amazon Kinesis Data Firehose, and other service resources on your behalf.

Security (Cognito User pools)

  • Cognito fully manages user lifecycle, token expires automatically
  • API gateway verifies identity automatically from AWS Cognito
  • No custom implementation required
  • Authentication = Cognito User Pools | Authorization = API Gateway Methods
  • image

Security (LAMBDA AUTHORIZER)

  • image
  • image
  • Authentication = External | Authorization = Lambda function
  • Token-based authorizer (bearer token) – ex JWT (JSON Web Token) or Oauth
  • A request parameter-based Lambda authorizer (headers, query string, stage var)
  • Lambda must return an IAM policy for the user; the resulting policy is cached (see the sketch after this list)
  • V1 example:
    image
  • V2 example: image
  • JWT Authorizer:
    • Decode the token, Check the token's algorithm and signature by using the public key that is fetched from the issuer's jwks_uri, Validate claims,
      • kid – The token must have a header claim that matches the key in the jwks_uri that signed the token.
      • iss – Must match the issuer that is configured for the authorizer.
      • aud or client_id – Must match one of the audience entries that is configured for the authorizer.
      • exp – Must be after the current time in UTC.
      • nbf – Must be before the current time in UTC.
      • iat – Must be before the current time in UTC.
      • scope or scp – The token must include at least one of the scopes in the route's authorizationScopes.
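
A minimal token-based Lambda authorizer sketch (Python, REST API v1 payload): token validation is stubbed here, where a real implementation would verify a JWT or look the token up. The output shape, a principalId plus an IAM policy, is what API Gateway caches.

```python
def handler(event, context):
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "allow-me" else "Deny"  # stub validation

    return {
        "principalId": "user|a1b2c3",  # unique identifier for the caller
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],  # the invoked method's ARN
                }
            ],
        },
        "context": {"tier": "free"},  # optional values passed to the backend
    }
```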

Monitoring and Logging

  • Amazon API Gateway is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, a role, or an AWS service in API Gateway.
  • AWS Config to record configuration changes made to your API Gateway API resources and send notifications based on resource changes. Maintaining a configuration change history for API Gateway resources is useful for operational troubleshooting, audit, and compliance use cases.
  • Use CloudWatch Logs or Amazon Kinesis Data Firehose to log requests to your APIs.
  • Using CloudWatch alarms, you watch a single metric over a time period that you specify.
  • X-RAY enabled tracing
  • CacheHitCount & CacheMissCount: efficiency of the cache
  • Count: The total number of API requests in a given period.
  • IntegrationLatency: The time between when API Gateway relays a request to the backend and when it receives a response from the backend.
  • Latency: The time between when API Gateway receives a request from a client and when it returns a response to the client. The latency includes the integration latency and other API Gateway overhead.
  • 4XXError (client-side) & 5XXError (server-side)

Troubleshooting and Debugging

  • Lambda integration errors
    • Permission errors (can be viewed $context.integrationErrorMessage logging variable to your log format)
  • JWT Authorizers
    • For ex. Unauthorized
    • Check the www-authenticate header in the response from the API.

SERVERLESS APPLICATION MODEL (SAM)

  • The AWS Serverless Application Model (AWS SAM) is a toolkit that improves the developer experience of building and running serverless applications on AWS. AWS SAM consists of two primary parts:

    • AWS SAM template specification: used to define your serverless application infrastructure on AWS.
    • Use the AWS CloudFormation syntax directly within your AWS SAM template, taking advantage of its extensive support of resource and property configurations
    • AWS SAM does the complex work of transforming your template into the code necessary to provision your infrastructure through AWS CloudFormation
    • AWS SAM command line interface (AWS SAM CLI): A command line tool that you can use with AWS SAM templates and supported third-party integrations to build and run your serverless applications. Perform local debugging and testing.
  • Example image image

  • SAM process in brief

  • image

  • sam init what-is-sam-01

  • sam build what-is-sam-02

  • sam local invoke: Invoke Lambda function with payload once and quit after invocation completes what-is-sam-04

  • sam local start-lambda: Starts a local endpoint that emulates AWS Lambda

  • sam local start-api: Starts a local HTTP server that hosts all your functions

  • sam local generate-event: Generate sample payloads for event sources

  • sam deploy what-is-sam-03

  • image

  • sam build

    • creates a .aws-sam build directory containing the AWS SAM template, the build.toml file, and your Lambda functions and layers structured independently of each other.
    • The --use-container option downloads a container image and uses it to build your Lambda functions. The local container is then referenced in your .aws-sam/build.toml file.
    • Use the --container-env-var to pass environment variables to the build container.
  • sam deploy

    • image
    • image
  • AWS::Serverless::Function

    • Creates an AWS Lambda function, an AWS Identity and Access Management (IAM) execution role, and event source mappings that trigger the function.
    • image
    • DeploymentPreference: enable gradual Lambda deployments.
    • image
    • Hooks: Validation Lambda functions that are run before and after traffic shifting.
    • image
  • AWS::Serverless::Api

    • Creates a collection of Amazon API Gateway resources and methods that can be invoked through HTTPS endpoints.
    • image
  • AWS::Serverless::SimpleTable

    • Creates a DynamoDB table with a single attribute primary key. It is useful when data only needs to be accessed via a primary key.
    • image

AWS SAM policy templates

  • The AWS Serverless Application Model (AWS SAM) allows you to choose from a list of policy templates to scope the permissions of your Lambda functions and AWS Step Functions state machines to the resources that are used by your application.
  • S3ReadPolicy: Gives read only permissions to objects in S3
  • SQSPollerPolicy: Allows to poll an SQS queue
  • DynamoDBCrudPolicy: CRUD = create read update delete

image

SAR (Serverless Application Repository)

  • managed repository for serverless applications. It enables teams, organizations, and individual developers to store and share reusable applications, and easily assemble and deploy serverless architectures
  • you can use pre-built applications from the Serverless Application Repository in your serverless architectures, helping you and your teams reduce duplicated work, ensure organizational best practices, and get to market faster
  • When you publish a serverless application to the AWS Serverless Application Repository, you make it available for others to find and deploy.
  • Before you can deploy an application, the AWS Serverless Application Repository checks the application’s template for IAM roles, AWS resource policies, and nested applications that the template specifies that it should create. Applications can contain any of the following four capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_RESOURCE_POLICY, and CAPABILITY_AUTO_EXPAND.
    image
  • sam publish

Messaging systems and patterns + Integration

  • The following lists out a few common ways AWS customers are using a combination of services.

    • Routing Amazon EventBridge or Amazon Simple Notification Service (Amazon SNS) events to an Amazon Simple Queue Service (Amazon SQS) queue as a buffer for downstream consumers.
    • Pulling events directly from a stream (Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (Amazon MSK)) or a queue (SQS or Amazon MQ) with EventBridge Pipes and sending events to an
      EventBridge bus to push out to consumers.
    • Routing EventBridge or SNS events to a Kinesis Data Streams or Amazon MSK for gathering and viewing analytics.

SQS (SIMPLE QUEUE SERVICE)

  • image

  • image

  • offers hosted queues that integrate and decouple distributed software systems and components

  • Messages in the queue are typically processed by a single subscriber.

  • Can have duplicate messages (at least once delivery, occasionally)

  • Can have out of order messages (best effort ordering)

  • Producer

    • Produced to SQS using the SDK (SendMessage API)
    • The message is persisted in SQS until a consumer deletes it
    • Message retention: default 4 days, up to 14 days
  • Consumer

    • Poll SQS for messages (receive up to 10 messages at a time)
    • Delete the messages using the DeleteMessage API image
  • Increasing throughput of processing of consumers

    • image
  • SQS with auto scaling group

    • image
  • SQS decoupling

    • image
  • Standard vs FIFO queues

  • image

  • Short polling:

    • Amazon SQS sends the response right away, even if the query found no messages.
  • Long polling:

    • Amazon SQS sends a response after it collects at least one available message, up to the maximum number of messages specified in the request; it sends an empty response only if the polling wait time expires
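
A minimal produce/consume sketch (Python/boto3) with long polling; the queue URL is a placeholder.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

sqs.send_message(QueueUrl=queue_url, MessageBody="hello")

resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # receive up to 10 messages at a time
    WaitTimeSeconds=20,      # long polling: wait up to 20s for a message
)
for msg in resp.get("Messages", []):
    print("processing", msg["Body"])  # stand-in for real work
    # Delete only after successful processing; otherwise the message
    # reappears once the visibility timeout expires.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```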

SQS MESSAGE VISIBILITY

  • image
  • Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents all consumers from receiving and processing the message. The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours.
  • If you don't know how long it takes to process a message, create a heartbeat for your consumer process: Specify the initial visibility timeout (for example, 2 minutes) and then—as long as your consumer still works on the message—keep extending the visibility timeout by 2 minutes every minute.
  • You can shorten or extend a message's visibility by specifying a new timeout value using the ChangeMessageVisibility action.
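
A sketch of the heartbeat idea (Python/boto3): the worker calls this periodically while still processing, pushing the visibility timeout out by another two minutes each time.

```python
import boto3

sqs = boto3.client("sqs")

def extend_visibility(queue_url: str, receipt_handle: str) -> None:
    # Call this (e.g., every minute) while the consumer is still working.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=120,  # message stays hidden for 2 more minutes
    )
```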

SQS WITH (DLQ)

  • Amazon SQS supports dead-letter queues (DLQ), which other queues (source queues) can target for messages that can't be processed (consumed) successfully.
  • image
  • The maxReceiveCount is the number of times a consumer can receive a message from the source queue without deleting it before the message is moved to the dead-letter queue (see the sketch below)
  • The redrive allow policy specifies which source queues can access the dead-letter queue (Good to set a retention of 14 days in the DLQ)
  • You can use dead-letter queue redrive to manage the lifecycle of unconsumed messages. After you have investigated the attributes and related metadata available for standard unconsumed messages in a dead-letter queue, you can redrive the messages back to their source queues.
  • DLQ of a FIFO queue must also be a FIFO queue
  • DLQ of a Standard queue must also be a Standard queue
  • image
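
A sketch (Python/boto3) of wiring a source queue to a DLQ via RedrivePolicy; the DLQ ARN is a placeholder and maxReceiveCount is set to 5.

```python
import json
import boto3

sqs = boto3.client("sqs")

sqs.create_queue(
    QueueName="orders",
    Attributes={
        # After 5 failed receives, SQS moves the message to the DLQ.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
            "maxReceiveCount": "5",
        })
    },
)
```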

SQS DELAY QUEUE

  • Delay queues let you postpone the delivery of new messages to consumers for a number of seconds, for example, when your consumer application needs additional time to process messages.

  • image (0-15MINS)

  • SQS JAVA EXTENDED CLIENT

  • image

SQS FIFO DEDUPLICATION

  • image
  • Message deduplication ID is the token used for deduplication of sent messages. If a message with a particular message deduplication ID is sent successfully, any messages sent with the same message deduplication ID are accepted successfully but aren't delivered during the 5-minute deduplication interval.
  • Content-based deduplication: computes a SHA-256 hash of the message body (see the send_message sketch below)

SQS GROUPING MESSAGES

  • image
  • MessageGroupId is the tag that specifies that a message belongs to a specific message group. Messages that belong to the same message group are always processed one by one, in a strict order relative to the message group (however, messages that belong to different message groups might be processed out of order).
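
A sketch (Python/boto3) of a FIFO send combining both controls: MessageGroupId sets the ordering boundary, and MessageDeduplicationId suppresses duplicates inside the 5-minute window. The queue URL is a placeholder.

```python
import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",
    MessageBody='{"orderId": 42}',
    MessageGroupId="customer#1",        # strict ordering within this group
    MessageDeduplicationId="order-42",  # duplicates within 5 min are dropped
)
```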

SQS SECURITY

  • IAM POLICY SYSTEM (identity based policies)
    • Attach a permission policy to a user or a group in your account, Attach a permission policy to a user in another AWS account, Attach a permission policy to a role (grant cross-account permissions)
  • SQS ACCESS POLICIES
    • Grant one permission to one AWS account
    • image
    • Grant all permissions to two AWS accounts
    • image
    • Grant a permission to all users
    • image
    • S3 bucket notifications to SQS
    • image

IMPORTANT APIS

• CreateQueue (MessageRetentionPeriod), DeleteQueue
• PurgeQueue: delete all the messages in queue
• SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
• MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
• ReceiveMessageWaitTimeSeconds: Long Polling
• ChangeMessageVisibility: change the message timeout
• Batch APIs for SendMessage, DeleteMessage, ChangeMessageVisibility helps decrease your costs

CREATING SQS image
image
image
image
image

Troubleshooting, Monitoring and Debugging

  • Amazon SQS is integrated with AWS CloudTrail, a service that provides a record of the Amazon SQS calls that a user, role, or AWS service makes.
  • CloudTrail captures API calls related to Amazon SQS queues as events, including calls from the Amazon SQS console and code calls from Amazon SQS APIs.
  • CloudWatch considers a queue to be active for up to six hours if it contains any messages or if any action accesses it.
  • ex. ApproximateNumberOfMessagesDelayed, ApproximateNumberOfMessagesNotVisible, ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage etc...
  • EventBridge lets you set a variety of targets—such as Amazon SQS standard and FIFO queues—which receive events in JSON format
  • X-Ray tracing header X-Amzn-Trace-Id
  • Issues
    • permission issues (can be fixed by updating IAM policies)
    • image

SNS (SIMPLE NOTIFICATION SERVICE)

  • publish-subscribe service that provides message delivery from publishers (also known as producers) to multiple subscriber endpoints(also known as consumers)
  • Publishers communicate asynchronously with subscribers by sending messages to a topic
  • Subscribers can subscribe to an Amazon SNS topic and receive published messages using a supported endpoint type
  • image

SQS-SNS FANOUT PATTERN

  • Push once in SNS, receive in all SQS queues that are subscribers
  • S3 Events to multiple queues image
  • SNS to Amazon S3 through Kinesis Data Firehose image
  • FIFO FANOUT image
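
A minimal boto3 sketch of the fan-out wiring (names are hypothetical). Note that the queue's access policy must also allow sns.amazonaws.com to send messages from this topic ARN (omitted here for brevity):

import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="orders")["TopicArn"]
queue_url = sqs.create_queue(QueueName="orders-analytics")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Subscribe the queue; RawMessageDelivery strips the SNS JSON envelope
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    Attributes={"RawMessageDelivery": "true"},
)

# One publish, delivered to every subscribed queue
sns.publish(TopicArn=topic_arn, Message="order created")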

SNS FIFO

  • provide strict message ordering and message deduplication
  • image
  • An Amazon SNS FIFO topic always delivers messages to subscribed Amazon SQS queues in the exact order in which the messages are published to the topic, and only once.
  • With an Amazon SQS FIFO queue subscribed, the consumer of the queue receives the messages in the exact order in which the messages are delivered to the queue, and no duplicates
  • image
  • You can have multiple applications (or multiple threads within the same application) publishing messages to an SNS FIFO topic in parallel. To determine the established sequence of messages, you can check the sequence number (ASSIGNED BY AWS SNS)
  • image
  • Messages that belong to the same group are processed one by one, in a strict order relative to the group.
  • image

MESSAGE FILTERING

  • A filter policy is a JSON object containing properties that define which messages the subscriber receives
  • Amazon SNS FIFO topics support message filtering.
  • image
  • Ex. for messageAttributes policy
  • image
  • For messageBody policy
  • image
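
A minimal boto3 sketch (subscription ARN and attribute names are hypothetical) of attaching a filter policy to a subscription:

import json
import boto3

sns = boto3.client("sns")
subscription_arn = "arn:aws:sns:us-east-1:123456789012:orders:1a2b3c4d"

# Only messages whose eventType message attribute matches will be delivered
sns.set_subscription_attributes(
    SubscriptionArn=subscription_arn,
    AttributeName="FilterPolicy",
    AttributeValue=json.dumps({"eventType": ["order_placed", "order_cancelled"]}),
)
# To filter on the payload instead, also set FilterPolicyScope to "MessageBody"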

DLQ

  • For any type of error, Amazon SNS can sideline messages to Amazon SQS dead-letter queues so data isn't lost.
  • image

SNS SECURITY

  • IAM policies (Identity based policies)
  • SNS policies
  • The topic ARN is included in policies to restrict actions to specific topics
  • RAW MESSAGE DELIVERY: delivers the published message body as-is, without the SNS JSON envelope
    • image
  • CROSS ACCOUNT DELIVERY
  • image
  • You can use these temporary security credentials (obtained from AWS STS) in making requests to Amazon SNS.
  • The API libraries compute the necessary signature value using those credentials to authenticate your request. If you send requests using expired credentials Amazon SNS denies the request.

Troubleshooting, Monitoring and Debugging

  • CloudTrail captures API calls for Amazon SNS as events. The calls captured include calls from the Amazon SNS console and code calls to the Amazon SNS API operations.
  • CloudWatch and alarms/notifications
  • ex. NumberOfNotificationsFailed, NumberOfNotificationsDelivered etc...
  • X-Ray tracing can be enabled
  • Issues
    • permission issues (can be fixed by updating IAM policies)
    • image

KINESIS

  • Use Kinesis Data Streams to collect and process large streams of data records in real time.
  • A Kinesis Data Streams application reads data from a data stream as data records.
  • image
  • image
  • A Kinesis data stream is a set of shards. Each shard has a sequence of data records. Each data record has a sequence number that is assigned by Kinesis Data Streams.
  • Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes.
  • The retention period is the length of time that data records are accessible after they are added to the stream (1-365 days), Ability to reprocess (replay) data
  • Provisioned mode
    • you must specify the number of shards for the data stream. The total capacity of a data stream is the sum of the capacities of its shards. You can increase or decrease the number of shards in a data stream as needed and you are charged for the number of shards at an hourly rate.
  • On-demand mode:
    • Kinesis Data Streams automatically manages the shards in order to provide the necessary throughput. You are charged only for the actual throughput that you use and Kinesis Data Streams automatically accommodates your workloads’ throughput needs as they ramp up or down
  • Producers put records into Amazon Kinesis Data Streams
  • Consumers get records from Amazon Kinesis Data Streams and process them.
  • A shard is a uniquely identified sequence of data records in a stream
  • A partition key is used to group data by shard within a stream
  • Each data record has a sequence number that is unique per partition-key within its shard.
  • The Kinesis Client Library (KCL) is compiled into your application to enable fault-tolerant consumption of data from the stream.

Producers

  • Puts data records into data streams
  • A data record consists of:
    • Sequence number (unique per partition key within its shard)
    • Partition key (must be specified when putting records into the stream)
    • Data blob (up to 1 MB)
  • Producers:
    • AWS SDK: simple producer
    • Kinesis Producer Library (KPL): C++, Java; batching, compression, retries
    • Kinesis Agent: monitors log files
  • Write throughput: 1 MB/sec or 1,000 records/sec per shard
  • PutRecord API: writes a single record
  • Use batching with the PutRecords API to reduce costs & increase throughput
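
A minimal boto3 producer sketch (stream name and payloads are hypothetical) using the batch PutRecords API and retrying partial failures:

import json
import boto3

kinesis = boto3.client("kinesis")

records = [
    {
        "Data": json.dumps({"sensor": i, "temp": 21.5}).encode(),
        "PartitionKey": f"sensor-{i}",  # determines the destination shard
    }
    for i in range(100)
]

# PutRecords accepts up to 500 records (5 MB total) per call
resp = kinesis.put_records(StreamName="telemetry", Records=records)

# Partial failures are possible: retry only the failed records
if resp["FailedRecordCount"]:
    failed = [r for r, out in zip(records, resp["Records"]) if "ErrorCode" in out]
    kinesis.put_records(StreamName="telemetry", Records=failed)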

image
image

Consumers

  • Custom Consumer (AWS SDK) – Classic or Enhanced Fan-Out
    • Classic
      • consumers pull records (GetRecords)
      • shared throughput (2 MB/sec per shard across all consumers)
    • Enhanced
      • records are pushed to subscribed consumers over HTTP/2 (SubscribeToShard)
      • the same dedicated throughput for every consumer (2 MB/sec per shard, per consumer)
  • Kinesis Client Library (KCL): a library to simplify reading from a data stream image
    image
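
A minimal boto3 sketch of a classic (polling) consumer reading one shard (stream name is hypothetical; a real application would typically use the KCL instead):

import time
import boto3

kinesis = boto3.client("kinesis")
stream = "telemetry"

shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
)["ShardIterator"]

while iterator:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in out["Records"]:
        print(record["SequenceNumber"], record["Data"])
    iterator = out.get("NextShardIterator")
    time.sleep(1)  # a shard supports at most 5 GetRecords calls per second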

KCL

  • A Java library that helps read records from a Kinesis Data Stream with distributed applications sharing the read workload

  • Each shard is read by only one KCL instance:
    • 4 shards = max. 4 KCL instances
    • 6 shards = max. 6 KCL instances

  • Records are read in order at the shard level

  • Versions:
    • KCL 1.x (supports shared consumers)
    • KCL 2.x (supports shared & enhanced fan-out consumers)

  • For each Amazon Kinesis Data Streams application, KCL uses a unique lease table (stored in an Amazon DynamoDB table) to keep track of the shards in a KDS data stream that are being leased and processed by the workers of the KCL consumer application.

  • Task split by KCL workers via DynamoDB

  • The number of KCL workers can be at most the number of shards; otherwise, the extra KCL workers wouldn't do anything (idle)

  • image

  • image

  • image

  • image

Kinesis Shard Split and Merge

  • Shard split:
    • Used to divide a “hot shard”
    • The old shard is closed and will be deleted once its data expires
    • Can’t split into more than two shards in a single operation
    • image
  • Shard merge:
    • Can be used to group two shards with low traffic (“cold shards”)
    • Old shards are closed and will be deleted once their data expires
    • Can’t merge more than two shards in a single operation image
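
A minimal boto3 sketch of both operations (stream name and shard IDs are hypothetical):

import boto3

kinesis = boto3.client("kinesis")

# Split a hot shard: choose a new starting hash key inside the parent's
# range (here, hypothetically, the midpoint of the 128-bit key space)
kinesis.split_shard(
    StreamName="telemetry",
    ShardToSplit="shardId-000000000000",
    NewStartingHashKey=str(2**127),
)

# Merge two cold shards: they must be adjacent (contiguous hash key ranges)
kinesis.merge_shards(
    StreamName="telemetry",
    ShardToMerge="shardId-000000000001",
    AdjacentShardToMerge="shardId-000000000002",
)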

Kinesis Data Firehose

  • fully managed service for delivering real-time streaming data to multiple destinations, such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service, and third-party HTTP endpoints
  • image
  • image
  • image
  • image
  • image

image

Kinesis Data Analytics (SQL applications)

  • Real-time analytics on Kinesis Data Streams & Firehose using SQL
  • image

Kinesis Data Analytics (for Apache Flink applications)

  • Use Flink (Java, Scala or SQL) to process and analyze streaming data
  • image

SQS VS SNS VS KINESIS image

Security

  • IAM policies to control access
  • Ex. image

Security

IAM

  • AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources.
  • You use IAM to control who is authenticated (signed in) and authorized (has permissions) to use resources.
  • IAM doesn't require region selection (it is a global service)
  • IAM, like many other AWS services, is eventually consistent. IAM achieves high availability by replicating data across multiple servers within Amazon's data centers around the world. If a request to change some data is successful, the change is committed and safely stored. However, the change must be replicated across IAM, which can take some time.

image

  • When a principal makes a request to AWS for authentication and authorization, AWS gathers the request information into a request context, which is used to evaluate and authorize the request.
  • AWS checks each policy that applies to the context of your request. If a single permissions policy includes a denied action, AWS denies the entire request and stops evaluating. This is called an explicit deny. Because requests are denied by default, AWS authorizes your request only if every part of your request is allowed by the applicable permissions policies.
  • IAM users are granted long-term credentials to your AWS resources. In contrast, users in AWS IAM Identity Center (successor to AWS Single Sign-On) are granted short-term credentials to your AWS resources.
  • User federation is used in use cases like:
    • Your users already exist in a corporate directory: corporate directory, Microsoft Azure AD, external IdPs like Okta, etc.
    • Your users already have Internet identities: an Internet identity provider like Login with Amazon, Facebook, Google, or any OpenID Connect (OIDC) compatible identity provider

ACCESS CONTROL METHODS/WAYS:

  • Single sign-on access for human users
  • Federated access for human users
  • Cross-account access between AWS accounts
  • Long-term credentials for designated IAM users in your AWS account

#######################TASKS REQUIRING ROOT USER CREDENTIALS#######################

  • Change your account settings. This includes the account name, email address, root user password, and root user access keys.
  • Restore IAM user permissions. If the only IAM administrator accidentally revokes their own permissions, you can sign in as the root user to edit policies and restore those permissions.
  • Activate IAM access to the Billing and Cost Management console.
  • View certain tax invoices. An IAM user with the aws-portal:ViewBilling permission can view and download VAT invoices from AWS Europe, but not AWS Inc. or Amazon Internet Services Private Limited (AISPL).
  • Close your AWS account.
  • Register as a seller in the Reserved Instance Marketplace.
  • Configure an Amazon S3 bucket to enable MFA (multi-factor authentication).
  • Edit or delete an Amazon Simple Queue Service (Amazon SQS) resource policy that denies all principals.
  • Edit or delete an Amazon Simple Storage Service (Amazon S3) bucket policy that denies all principals.
  • Sign up for AWS GovCloud (US).
  • Request AWS GovCloud (US) account root user access keys from AWS Support.

#######################IAM USER#######################

  • An IAM user is an identity within your AWS account that has specific permissions for a single person or application
  • An IAM group is an identity that specifies a collection of IAM users. You can't sign in as a group. You can use groups to specify permissions for multiple users at a time.
  • Every User can be either alone or put into a group i.e. policies can be individually assigned to IAM user (single) or to a group.

While creating an IAM user, there are two options:

  • Specify a user in Identity Center (recommended): We recommend that you use Identity Center to provide console access to a person. With Identity Center, you can centrally manage user access to their AWS accounts and cloud applications.

  • I want to create an IAM user: We recommend that you create IAM users only if you need to enable programmatic access through access keys, service-specific credentials for AWS CodeCommit or Amazon Keyspaces, or a backup credential for emergency account access.

  • Control maximum permissions using IAM boundary:

    • An entity's permissions boundary allows it to perform only the actions that are allowed by both its identity-based policies and its permissions boundaries.
    • The permissions boundary for an IAM entity (user or role) sets the maximum permissions that the entity can have.
    • If any one of these policy types explicitly denies access for an operation, then the request is denied.
  • Administrative access to IAM user group (group name is admin here) image

  • Now, the account can also have an alias set, after which a custom sign-in URL will be generated for logon.

#######################IAM POLICIES#######################

Identity-based and resource-based policies

  • Identity-based policies are permissions policies that you attach to an IAM identity, such as an IAM user, group, or role.

    • AWS Managed policies, Customer managed policies, Inline policies
  • Resource-based policies are permissions policies that you attach to a resource such as an Amazon S3 bucket or an IAM role trust policy.

    • Resource-based policies are inline policies, and there are no managed resource-based policies.
    • The IAM service supports only one type of resource-based policy called a role trust policy, which is attached to an IAM role.
    • Trust policies define which principal entities (accounts, users, roles, and federated users) can assume the role.
  • For example, below: identity-based policies given to users define what actions they can perform on specific resources, whereas resource-based policies define which principals (users) have access to them. image

  • For requests made from one account to another, the requester in Account A must have an identity-based policy that allows them to make a request to the resource in Account B. Also, the resource-based policy in Account B must allow the requester in Account A to access the resource. There must be policies in both accounts that allow the operation, otherwise the request fails.

  • In addition, Amazon S3 supports a permission mechanism known as an access control list (ACL) that is independent of IAM policies and permissions. You can use IAM policies in combination with Amazon S3 ACLs.

  • ACL(ACCESS CONTROL LIST):

    • enable you to manage access to buckets and objects.
    • When you create a bucket or an object, Amazon S3 creates a default ACL that grants the resource owner full control over the resource.
    • image

AWS MANAGED POLICY VS CUSTOMER MANAGED POLICY

  • Managed policies that are created and managed by AWS (AWS MANAGED POLICIES)

  • AWS managed policies make it convenient for you to assign appropriate permissions to users, groups, and roles. It is faster than writing the policies yourself

  • Managed policies that you create and manage in your AWS account.

  • You cannot change the permissions defined in AWS managed policies. AWS occasionally updates the permissions defined in an AWS managed policy.
    image

  • Customer managed policies provide more precise control over your policies than AWS managed policies. You can create, edit, and validate an IAM policy in the visual editor or by creating the JSON policy document directly.

  • IAM Policy inheritance
    image

  • IAM policy structure and meaning of JSON documents
    image
    image

  • IAM policy can be created using policy visual editor
    image

  • Overall for ex. Below shows User which has policies attached via different mechanisms
    image

  • Permission evaluation image

  • When evaluating if an IAM Principal can perform an operation X on a bucket, the union of its assigned IAM Policies and S3 Bucket Policies will be evaluated

  • Few examples of how permission are evaluated image
    image image image

#######################PASSWORD POLICY IAM#######################

  • The IAM password policy can be managed, and users can be required to change their password at certain intervals or at their first logon, etc.
  • Password policy can be either chosen as AWS default or it can be new custom policy as defined by admin user.

image

#######################MFA#######################

  • Multi factor authentication

  • Types of MFA devices

    • Virtual MFA device (google authenticator, Authy)
    • Universal 2nd Factor (U2F) security key, e.g. YubiKey by Yubico (3rd party)
    • Hardware Key Fob MFA device (Gemalto 3rd party)
    • Hardware Key Fob MFA device for AWS GovCloud (surePassID 3rd party)
  • You can register up to 8 MFA devices of any combination with your AWS account root user and IAM users.

image

#######################LOGON WITH ACCESS KEYS#######################

  • ACCESS KEY ID (USERNAME)

  • SECRET ACCESS KEYS (PASSWORD)

  • Depending on the purpose of the access keys, AWS recommends alternatives like AWS CLI v2, CloudShell, etc.

#######################IAM ROLES#######################

  • A role is an IAM identity that you can create in your account that has specific permissions.
  • However, instead of being uniquely associated with one person, a role can be assumed by anyone who needs it. A role does not have standard long-term credentials such as a password or access keys associated with it. Instead, when you assume a role, it provides you with temporary security credentials for your role session.
  • Roles are meant for various use cases, e.g. being assigned directly to AWS services instead of users.
  • Few examples shown below:
    image
  • When you create a role, you create two policies:
    • A role trust policy that specifies who can assume the role and a permissions policy that specifies what can be done with the role.
    • You specify the trusted principal who is allowed to assume the role in the role trust policy.
  • For ex. Role creation for EC2 instance
    image
  • Role chaining:
    • Role chaining is when you use a role to assume a second role through the AWS CLI or API
    • AWS does not treat using roles to grant permissions to applications that run on EC2 instances as role chaining.

SCENARIOS FOR ROLES

  • Providing access to an IAM user in another AWS account that you own
    image
  • Providing access for non-AWS workloads
  • Providing access to AWS accounts owned by third parties
  • Providing access to an AWS service
  • Providing access to externally authenticated users (identity federation)
    image

Using IAM Roles

  • AWS Management Console (by switching roles)
  • assume-role CLI or AssumeRole API operation
  • assume-role-with-saml CLI or AssumeRoleWithSAML API operation
  • assume-role-with-web-identity CLI or AssumeRoleWithWebIdentity API operation
  • Console URL constructed with AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity (broker constructs the URL and calls STS for assuming the roles)

The confused deputy problem

  • The confused deputy problem is a security issue where an entity that doesn't have permission to perform an action can coerce a more-privileged entity to perform the action.
  • Prevention: aws:SourceArn and aws:SourceAccount global condition context keys in resource-based policies to limit the permissions that a service has to a specific resource.

#######################IAM POLICY SIMULATOR#######################

  • One can choose and evaluate the roles and policies attached in real time using this simulator.

image

#######################ABAC#######################

  • Attribute-based access control (ABAC) is an authorization strategy that defines permissions based on attributes. In AWS, these attributes are called tags. You can attach tags to IAM resources, including IAM entities (users or roles) and to AWS resources.
  • These ABAC policies can be designed to allow operations when the principal's tag matches the resource tag.
  • ABAC is helpful in environments that are growing rapidly and helps with situations where policy management becomes cumbersome.
  • The disadvantage to using the traditional RBAC model is that when employees add new resources, you must update policies to allow access to those resources.
  • RBAC image
  • ABAC image
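
A minimal sketch of an ABAC-style policy (the tag key "project", policy name, and action are illustrative): access is allowed only when the caller's principal tag matches the resource's tag, so adding new resources needs no policy updates:

import json
import boto3

iam = boto3.client("iam")

abac_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": "*",
        "Condition": {
            # Matches the resource's "project" tag against the principal's
            "StringEquals": {"aws:ResourceTag/project": "${aws:PrincipalTag/project}"}
        },
    }],
}

iam.create_policy(PolicyName="abac-project-access", PolicyDocument=json.dumps(abac_policy))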

#######################SECURITY TOKEN SERVICE (STS)#######################

  • You can use the AWS Security Token Service (AWS STS) to create and provide trusted users with temporary security credentials that can control access to your AWS resources.

  • Temporary security credentials are not stored with the user but are generated dynamically and provided to the user when requested.

  • By default, AWS STS is a global service with a single endpoint at https://sts.amazonaws.com. However, you can also choose to make AWS STS API calls to endpoints in any other supported Region.

  • This is used in Identity federation usecases, cross account accesses

  • The AWS STS API operations create a new session with temporary security credentials that include an access key pair and a session token. The access key pair consists of an access key ID and a secret key. Users (or an application that the user runs) can use these credentials to access your resources.

  • To have a user call STS, the user should be assigned an inline policy with sts:AssumeRole (depending on the use case, there are 3 options) image
    image

  • Bearer tokens:

    • AWS STS can help in getting bearer tokens which are required for accessing some services programmatically. For ex. AWS CodeArtifact etc..
    • The token's access key ID begins with the ABIA prefix (which helps identify it in CloudTrail logs)
    • The bearer token can be used only for calls to the service that generates it and in the Region where it was generated.
  • All STS calls can be traced in CloudTrail logs, for example as shown below:
    image

  • Calls made from another account to access cross-account services by assuming a role: use the sts:SourceIdentity condition key in the role trust policy to require users to specify an identity when they assume a role. For example, as shown below:
    image
    Now, for this same request, one entry will also appear in the other account's log (the assumed-role account).

  • When performing role chaining, tags are passed on from first assumed role as "transitive tags" as shown below: image

  • Web Identity provider - includes the tags passed through identity provider as shown below : image

  • Sign-in events are logged as well, except that an incorrectly entered username is masked as HIDDEN_DUE_TO_SECURITY_REASONS

EXAMPLES OF STS APIs:

  • AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, GetFederationToken, GetSessionToken

  • The permissions policy of the role that is being assumed determines the permissions for the temporary security credentials that are returned by AssumeRole, AssumeRoleWithSAML, and AssumeRoleWithWebIdentity

  • Optionally, one can pass inline or managed session policies as parameters which filters only the passed permissions returned.

  • AWS STS WITH MFA
  • aws:MultiFactorAuthPresent:true
    image
  • GetSessionToken returns:
    • Access key ID
    • Secret access key
    • Session token
    • Expiration date
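
A minimal boto3 sketch of AssumeRole (the role ARN and session name are hypothetical); for MFA-protected access, SerialNumber and TokenCode can also be passed:

import boto3

sts = boto3.client("sts")

creds = sts.assume_role(
    RoleArn="arn:aws:iam::222233334444:role/ExampleUserRole",
    RoleSessionName="cross-account-session",
    DurationSeconds=3600,
)["Credentials"]

# Use the temporary credentials for subsequent calls
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_buckets()["Buckets"])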

#######################IAM SECURITY TOOLS#######################

  • IAM CREDENTIALS REPORT (ACCOUNT LEVEL)
    • report that lists all your account users and the status of their various credentials

image

  • IAM ACCESS ADVISOR (USER LEVEL)
    • shows service permissions granted to a user and when were they last accessed
    • you can use this information to revise the policies, which can help in reducing permissions towards least privilege.

image

#######################IAM BEST PRACTICES#######################

  • DON'T ever use the root user except for AWS account setup
  • One physical user = AWS IAM user
  • Assign users to groups and assign permissions to groups
  • create a strong password policy
  • use and enforce MFA
  • create and use Roles for giving permissions to AWS services
  • Use Access keys for Programmatic access (CLI/SDK)
  • Audit permission of your account using IAM creds report and access advisor
  • Never share IAM user and access keys

#######################AWS SHARED RESPONSIBILITY MODEL FOR IAM#######################

  • AWS

    • INFRASTRUCTURE OF GLOBAL NETWORK SECURITY
    • CONFIGURATION AND VULN ANALYSIS
    • COMPLIANCE VALIDATION
  • YOU

    • USERS, GROUPS, ROLES AND POLICIES MANAGEMENT AND MONITORING
    • ENABLE MFA ON ALL ACCOUNTS
    • ROTATE ALL YOUR KEYS OFTEN
    • USE IAM TOOLS TO APPLY APPROPRIATE PERMISSIONS
    • ANALYZE ACCESS PATTERNS AND REVIEW PERMISSIONS

#######################Identity providers and federation#######################

  • In case users are already managed outside of AWS like in corporate directory or any other identity providers, you can use IAM identity providers instead of creating IAM users in your AWS account. Ex. well-known IdP, such as Login with Amazon, Facebook, or Google.
  • IAM supports IdPs that are compatible with OpenID Connect (OIDC) or SAML 2.0 (Security Assertion Markup Language 2.0)

OIDC

  • OpenID Connect is an interoperable authentication protocol based on the OAuth 2.0 framework of specifications (IETF RFC 6749 and 6750). It simplifies the way to verify the identity of users based on the authentication performed by an Authorization Server and to obtain user profile information in an interoperable and REST-like manner.

image

SAML

  • Security Assertion Markup Language is an open standard for exchanging authentication and authorization data between parties, in particular, between an identity provider and a service provider using SOAP over HTTP. image
    image
    image
    image

  • A SAML Request, also known as an authentication request, is generated by the Service Provider to "request" an authentication.

  • A SAML Response is generated by the Identity Provider. It contains the actual assertion of the authenticated user. In addition, a SAML Response may contain additional information, such as user profile information and group/role information, depending on what the Service Provider can support.

  • The Service Provider never directly interacts with the Identity Provider. A browser acts as the agent to carry out all the redirections.

  • The preferred way to use web identity federation is to use Amazon Cognito. image

  • Using SAML-based federation for API access to AWS image


  • IAM OIDC identity providers are entities in IAM that describe an external identity provider (IdP) service that supports the OpenID Connect (OIDC) standard, such as Google or Salesforce. You use an IAM OIDC identity provider when you want to establish trust between an OIDC-compatible IdP and your AWS account.

  • If you are using an OIDC identity provider from either Google, Facebook, or Amazon Cognito, do not create a separate IAM identity provider using this procedure. These OIDC identity providers are already built-in to AWS and are available for your use.
    image

  • Assigning roles to federated identity : image

image

Note:

  • Github OIDC provider: best practice is to limit the trust to a specific GitHub organization, repository, or branch using the Condition section of the role's trust policy.

  • For trusting the external IDP, you must supply a thumbprint. IAM requires the thumbprint for the top intermediate certificate authority (CA) that signed the certificate used by the external identity provider (IdP).

  • SAML identity provider configuration requires a metadata document to be uploaded. This document includes the issuer's name, expiration information, and keys that can be used to validate the SAML authentication response (assertions) received from the IdP.

  • After you have verified a user's identity in your organization, the external identity provider (IdP) sends an authentication response to the AWS SAML endpoint at https://region-code.signin.aws.amazon.com/saml

  • Custom IDP broker: You can write and run code to create a URL that lets users who sign in to your organization's network securely access the AWS Management Console. The URL includes a sign-in token that you get from AWS and that authenticates the user to AWS.

#######################Tagging IAM resources#######################

  • A tag is a custom attribute label that you can assign to an AWS resource. Each tag has two parts: A tag key, An optional field known as a tag value
  • Names for AWS tags are case sensitive so ensure that they are used consistently.

When to use IAM policies vs S3 policies

  • If the question is "What can this user do in AWS?" -> choose IAM policies
  • If the question is "Who can access this S3 bucket?" -> choose S3 bucket policies

#######################DIRECTORY SERVICES#######################
LDAP

  • Lightweight Directory Access Protocol (LDAP) is a vendor-neutral software protocol used to lookup information or devices within a network image
  • users will go through one of two possible user authentication methods: simple authentication, like SSO with login credentials, or SASL authentication, which binds the LDAP server to a program like Kerberos.

ACTIVE DIRECTORY, AWS DIRECTORY SERVICES

  • LDAP is the core protocol used in–but not exclusive to–Microsoft’s Active Directory (AD) directory service, a large directory service database that contains information spanning every user account in a network
  • ADDS: Active Directory stores information about objects on the network and makes this information easy for administrators and users to find and use.
  • ADFS: Active Directory Federation Service (AD FS) enables Federated Identity and Access Management by securely sharing digital identity and entitlements rights across security and enterprise boundaries. AD FS extends the ability to use single sign-on functionality that is available within a single security or enterprise boundary to Internet-facing applications to enable customers, partners, and suppliers a streamlined user experience while accessing the web-based applications of an organization.
  • Four options in AWS:
    • AWS MANAGED AD: AWS Directory Service for Microsoft Active Directory is powered by an actual Microsoft Windows Server Active Directory (AD), managed by AWS in the AWS Cloud.
    • AD Connector: AD Connector is a proxy service that provides an easy way to connect compatible AWS applications. When you add users to AWS applications such as Amazon QuickSight, AD Connector reads your existing Active Directory to create lists of users and groups to select from
    • Simple AD is a Microsoft Active Directory–compatible directory from AWS Directory Service that is powered by Samba 4.
    • Amazon Cognito is a user directory that adds sign-up and sign-in to your mobile app or web application using Amazon Cognito User Pools.

#######################IAM IDENTITY CENTER#######################

  • IAM Identity Center provides one place where you can create or connect workforce users and centrally manage their access across all their AWS accounts and applications.
  • With application assignments, you can grant your workforce users in IAM Identity Center single sign-on access to SAML 2.0 applications, such as Salesforce and Microsoft 365
  • With multi-account permissions you can plan for and centrally implement IAM permissions across multiple AWS accounts at one time without needing to configure each of your accounts manually.
  • STEPS TO START USING IDENTITY CENTER:
    • Enable identity center (also should be using AWS organizations otherwise it will create one in the process)
    • Choose your identity source:
      • Identity Center directory default
      • Active Directory
      • External identity provider
    • After you enable IAM Identity Center, you must choose your identity source. The identity source that you choose determines where IAM Identity Center searches for users and groups that need single sign-on access.
    • Create an administrative permission set: Permission sets are stored in IAM Identity Center and define the level of access that users and groups have to an AWS account.
    • To set up AWS account access for an administrative user in IAM Identity Center, you must assign the user to the AdministratorAccess permission set.
    • Similarly, for other uses, least-privilege permission sets can be created and assigned accordingly when new users/groups are synced from other directories.
  • When working in IAM Identity Center, users must be uniquely identifiable. IAM Identity Center implements a user name that is the primary identifier for your users. For most SAML-based integrations, it's the user's email address. IAM Identity Center allows you to specify something other than an email address for user sign-in.
  • Identity Center enabled applications can work with users and groups for which IAM Identity Center is aware. Provisioning is the process of making user and group information available for use by IAM Identity Center and Identity Center enabled applications.
  • There are two types of authentication sessions maintained by IAM Identity Center: one to represent the users’ sign-in to IAM Identity Center, and another to represent the users’ access to IAM Identity Center enabled applications, such as Amazon SageMaker Studio or Amazon Managed Grafana. Sessions are cached for 1 hour (refreshable), so if a user is disabled or deleted, they can still log in or create new sessions for up to 1 hour.
  • A permission set is a template that you create and maintain that defines a collection of one or more IAM policies. One can use predefined permission sets or custom permission sets.
  • Although IAM Identity Center determines access from the Region in which you enable the service, AWS accounts are global. This means that after users sign in to IAM Identity Center, they can operate in any Region when they access AWS accounts through IAM Identity Center
  • IAM service-linked roles: A service-linked role is a unique type of IAM role that is linked directly to IAM Identity Center. It is predefined by IAM Identity Center and includes all the permissions that the service requires to call other AWS services on your behalf

#######################AMAZON COGNITO#######################

  • Amazon Cognito is an identity platform for web and mobile apps. It’s a user directory, an authentication server, and an authorization service for OAuth 2.0 access tokens and AWS credentials.
  • With Amazon Cognito, you can authenticate and authorize users from the built-in user directory, from your enterprise directory, and from consumer identity providers like Google and Facebook.

User pools
image

  • User pools are a user directory with both self-service and administrator-driven user creation, management, and authentication.
  • Your organization's SAML 2.0 and OIDC IdPs bring workforce identities into Cognito and your app. The public OAuth 2.0 identity providers Amazon, Google, Apple, and Facebook bring customer identities.
  • From a user pool, you can issue authenticated JSON web tokens (JWTs) directly to an app, a web server, or an API.

Feature | Description
--- | ---
OIDC IdP | Issue ID tokens to authenticate users
Authorization server | Issue access tokens to authorize user access to APIs
SAML 2.0 SP | Transform SAML assertions into ID and access tokens
OIDC SP | Transform OIDC tokens into ID and access tokens
OAuth 2.0 SP | Transform ID tokens from Apple, Facebook, Amazon, or Google to your own ID and access tokens
Authentication frontend service | Sign up, manage, and authenticate users with the hosted UI
API support for your own UI | Create, manage and authenticate users through API requests in supported AWS SDKs¹
MFA | Use SMS messages, TOTPs, or your user's device as an additional authentication factor¹
Security monitoring & response | Secure against malicious activity and insecure passwords¹
Customize authentication flows | Build your own authentication mechanism, or add custom steps to existing flows¹
Groups | Create logical groupings of users, and a hierarchy of IAM role claims when you pass tokens to identity pools
Customize ID tokens | Customize your ID tokens with new, modified, and suppressed claims
Customize user attributes | Assign values to user attributes and add your own custom attributes

Identity pools
image

  • Set up an Amazon Cognito identity pool when you want to authorize authenticated or anonymous users to access your AWS resources.
  • Identity pools use both role-based (RBAC) and attribute-based access control (ABAC) to manage your users’ authorization to access your AWS resources.
  • An identity pool can accept authenticated claims directly from both workforce and consumer identity providers.
  • The token that your identity pool creates for the identity can retrieve temporary session credentials from AWS Security Token Service (AWS STS).

Feature | Description
--- | ---
Amazon Cognito user pool SP | Exchange an ID token from your user pool for web identity credentials from AWS STS
SAML 2.0 SP | Exchange SAML assertions for web identity credentials from AWS STS
OIDC SP | Exchange OIDC tokens for web identity credentials from AWS STS
OAuth 2.0 SP | Exchange OAuth tokens from Amazon, Facebook, Google, Apple, and Twitter for web identity credentials from AWS STS
Custom SP | With AWS credentials, exchange claims in any format for web identity credentials from AWS STS
Unauthenticated access | Issue limited-access web identity credentials from AWS STS without authentication
Role-based access control | Choose an IAM role for your authenticated user based on their claims, and configure your roles to only be assumed in the context of your identity pool
Attribute-based access control | Convert claims into principal tags for your AWS STS temporary session, and use IAM policies to filter resource access based on principal tags

  • An Amazon Cognito user pool can also fulfill a dual role as a service provider (SP) to your IdPs, and an IdP to your app.
  • When to choose what?

Usecases

  • Authenticate with a user pool
    image
  • Access your server-side resources with a user pool image
  • Access resources with API Gateway and Lambda with a user pool image
  • Access AWS services with a user pool and an identity pool image
  • Authenticate with a third party and access AWS services with an identity pool image
  • Access AWS AppSync resources with Amazon Cognito image

Configuring User pools in Cognito image
image
image
image
image
image

Configuring Identity pools in Cognito image
image
image
image
image
image

Lambda Triggers

  • You can create a Lambda function and then activate that function during user pool operations such as user sign-up, confirmation, and sign-in (authentication) with a Lambda trigger. You can add authentication challenges, migrate users, and customize verification messages.
  • When you have a Lambda trigger assigned to your user pool, Amazon Cognito interrupts its default flow to request information from your function. Amazon Cognito generates a JSON event and passes it to your function. The event contains information about your user's request to create a user account, sign in, reset a password, or update an attribute. Your function then has an opportunity to take action, or to send the event back unmodified. image
  • Except for Custom Sender Lambda triggers, Amazon Cognito invokes Lambda functions synchronously. When Amazon Cognito calls your Lambda function, it must respond within 5 seconds. If it doesn't and if the call can be retried, Amazon Cognito retries the call. After three unsuccessful attempts, the function times out. You can't change this five-second timeout value.

Cognito Hosted UI

  • Cognito has a hosted authentication UI that you can add to your app to handle sign-up and sign-in workflows

  • Using the hosted UI, you have a foundation for integration with social logins, OIDC or SAML

  • Can customize with a custom logo and custom CSS

  • The hosted UI sign-in webpage uses the following URL format. Note the response_type. In this case, response_type=code for the authorization code grant.

  • When you navigate to the /oauth2/authorize endpoint with your custom parameters, Amazon Cognito either redirects you to the /oauth2/login endpoint or, if you have an identity_provider or idp_identifier parameter, silently redirects you to your IdP sign-in page.
    https://<your_domain>/oauth2/authorize?response_type=code&client_id=<your_app_client_id>&redirect_uri=<your_callback_url>

  • You can view the hosted UI sign-in webpage with the following URL for the implicit grant, where response_type=token. After a successful sign-in, Amazon Cognito returns user pool tokens to your web browser's address bar.
    https://<your_domain>/login?response_type=token&client_id=<your_app_client_id>&redirect_uri=<your_callback_url>

  • You can find the JSON web token (JWT) identity token after the #id_token= parameter in the response.
    Here's a sample response from an implicit grant request. Your identity token string will be much longer.
    https://www.example.com/#id_token=123456789tokens123456789&expires_in=3600&token_type=Bearer

  • The Amazon Cognito hosted UI doesn't support custom cross-origin resource sharing (CORS) origin policies. A CORS policy in the hosted UI would prevent users from passing authentication parameters in their requests. Instead, implement a CORS policy in the web frontend of your app.

Tokens

  • After your app user successfully signs in, Amazon Cognito creates a session and returns an ID, access, and refresh token for the authenticated user.

  • The ID token is a JSON Web Token (JWT) that contains claims about the identity of the authenticated user, such as name, email, and phone_number image image

  • The signature of the ID token is calculated based on the header and payload of the JWT token. Before you accept the claims in any ID token that your app receives, verify the signature of the token.

  • The user pool access token contains claims about the authenticated user, a list of the user's groups, and a list of scopes. The purpose of the access token is to authorize API operations.

  • You can use the refresh token to retrieve new ID and access tokens. By default, the refresh token expires 30 days after your application user signs into your user pool. When you create an application for your user pool, you can set the application's refresh token expiration to any value between 60 minutes and 10 years.

  • With refresh tokens, you can persist users' sessions in your app for a long time. Over time, your users might want to deauthorize some devices where they have signed in, continually refreshing their session. To sign your user out from a single device, revoke their refresh token.

  • GlobalSignOut accepts a user's valid (unaltered, unexpired, not revoked) access token. Because this API is token-authorized, one user can't use it to initiate sign-out for another user.

  • You can, however, generate an AdminUserGlobalSignOut API request that you authorize with your AWS credentials to sign out any user from all of their devices.

  • Before you can revoke a token for an existing user pool client, you must enable token revocation.

  • For verification of JWTs: verify the token signature (using the public keys published at the user pool's JWKS URL) and verify the token claims: exp, aud, client_id, iss, etc. (see the sketch after this list)

  • You can cache the access tokens so that your app only requests a new access token if a cached token is expired. Otherwise, your caching endpoint returns a token from the cache. This prevents an additional call to an Amazon Cognito API endpoint.

  • caching proxy with API Gateway: The cache key is a combination of the OAuth scopes that you request in the scope URL parameter and the Authorization header in the request. The Authorization header contains your app client ID and client secret.

  • ALB Flow with Cognito image

  • ALB Flow with any OIDC ID provider image

  • Cognito Identity pool with Social providers (user pool) image

  • Authentication + Authorization image
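
A minimal verification sketch using the PyJWT library (one of several possible choices; the region, pool ID, and client ID are hypothetical). The public keys come from the user pool's well-known JWKS URL:

import jwt                      # pip install "pyjwt[crypto]"
from jwt import PyJWKClient

REGION = "us-east-1"
USER_POOL_ID = "us-east-1_Example"
APP_CLIENT_ID = "1234example"

issuer = f"https://cognito-idp.{REGION}.amazonaws.com/{USER_POOL_ID}"
jwks_client = PyJWKClient(f"{issuer}/.well-known/jwks.json")

def verify_id_token(token: str) -> dict:
    # 1) Verify the signature against the user pool's published public keys
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    # 2) Verify the claims: exp, aud (app client id), iss (user pool)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=APP_CLIENT_ID,
        issuer=issuer,
    )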

Cognito Sync/AppSync

  • Amazon Cognito Sync is an AWS service and client library that makes it possible to sync application-related user data across devices. Amazon Cognito Sync can synchronize user profile data across mobile devices and the web without using your own backend.

KMS

How SSL/TLS works and why encryption?

SSL/TLS

  • An SSL/TLS certificate is a digital object that allows systems to verify the identity & subsequently establish an encrypted network connection to another system using the Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol.
  • PKI provides a way for one party to establish the identity of another party using certificates if they both trust a third-party - known as a certificate authority.
  • A certificate authority (CA) is an organization that sells SSL/TLS certificates to web owners, web hosting companies, or businesses. The CA validates the domain and owner details before issuing the SSL/TLS certificate. EX. Amazon Trust Services
  • An SSL/TLS certificate has a maximum validity period of 13 months.
  • A session key maintains encrypted communication between the browser and web server after the initial SSL/TLS authentication is completed. The session key is a cipher key for symmetric cryptography. Symmetric cryptography uses the same key for both encryption and decryption.
  • Encryption in flight ensures no MITM (man in the middle attack) can happen

image

Client-side encryption

  • Data is encrypted by the client and never decrypted by the server
  • Data will be decrypted by a receiving client
  • The server should not be able to decrypt the data
  • Could leverage Envelope Encryption image

Server-side encryption

  • Data is encrypted after being received by the server
  • Data is decrypted before being sent
  • It is stored in an encrypted form thanks to a key (usually a data key)
  • The encryption / decryption keys must be managed somewhere and the server must have access to it image
  • Able to audit KMS Key usage using CloudTrail
  • Never ever store your secrets in plaintext, especially in your code! Encrypted secrets can be stored in code / environment variables

AWS KMS KEYS

  • An AWS KMS key is a logical representation of a cryptographic key. A KMS key contains metadata, such as the key ID, key spec, key usage, creation date, description, and key state. Most importantly, it contains a reference to the key material that is used when you perform cryptographic operations with the KMS key.
  • Key material is the string of bits used in a cryptographic algorithm. Secret key material must be kept secret to protect the cryptographic operations that use it. Public key material is designed to be shared.
  • AWS KEY MATERIAL TYPES: AWS_KMS (managed by AWS), EXTERNAL (key material imported from outside of AWS), AWS_CLOUDHSM (managed by AWS in an AWS CloudHSM cluster), EXTERNAL_KEY_STORE (external key managed outside of AWS)

Customer Managed KMS

  • The KMS keys that you create are customer managed keys. Customer managed keys (CMK) are KMS keys in your AWS account that you create, own, and manage. You have full control over these KMS keys, including establishing and maintaining their key policies, IAM policies, and grants, enabling and disabling them, rotating their cryptographic material, adding tags, creating aliases that refer to the KMS keys, and scheduling the KMS keys for deletion.
  • For customer managed keys, the value of the KeyManager field of the DescribeKey response is CUSTOMER.
  • Customer managed keys incur a monthly fee and a fee for use in excess of the free tier.

AWS Managed KMS

  • AWS managed keys are KMS keys in your account that are created, managed, and used on your behalf by an AWS service
  • You don't have to create or maintain the key or its key policy, and there's never a monthly fee for an AWS managed key.
  • you cannot change any properties of AWS managed keys, rotate them, change their key policies, or schedule them for deletion.
  • You can also identify AWS managed keys by their aliases, which have the format aws/service-name, such as aws/redshift
  • For AWS managed keys, the value of the KeyManager field of the DescribeKey response is AWS.
  • All AWS managed keys are automatically rotated every year. You cannot change this rotation schedule.

Identifying AWS KMS TYPES FROM AWS CONSOLE

image

Symmetric encryption KMS keys

  • When you create an AWS KMS key, by default, you get a KMS key for symmetric encryption
  • a symmetric encryption KMS key represents a 256-bit AES-GCM encryption key, except in China Regions, where it represents a 128-bit SM4 encryption key.
  • Symmetric encryption keys are used in symmetric encryption, where the same key is used for encryption and decryption.

Asymmetric KMS keys

  • An asymmetric KMS key represents a mathematically related public key and private key pair.
  • The private key never leaves AWS KMS unencrypted. To use the private key, you must call AWS KMS. You can use the public key within AWS KMS by calling the AWS KMS API operations, or you can download the public key and use it outside of AWS KMS
  • You can create asymmetric KMS keys that represent RSA key pairs or SM2 key pairs (China Regions only) for public key encryption or signing and verification, or elliptic curve key pairs for signing and verification.

HMAC KMS key (symmetric)

  • Represents a symmetric key of varying length that is used to generate and verify hash-based message authentication codes. The key material in an HMAC KMS key never leaves AWS KMS unencrypted. To use your HMAC KMS key, you must call AWS KMS.

Identifying types of keys from AWS console

image
image

image
image
image
image
image

image
image
image

  • To determine whether a KMS key is symmetric or asymmetric, use the DescribeKey operation. The KeySpec field in the response contains the key spec of the KMS key. For a symmetric encryption KMS key, the value of KeySpec is SYMMETRIC_DEFAULT. Other values indicate an asymmetric KMS key or an HMAC KMS key.

Use Cases for choosing types of Keys

  • Encrypt and decrypt data: If your use case requires encryption outside of AWS by users who cannot call AWS KMS, asymmetric KMS keys are a good choice. Otherwise Symmetric keys are good (fast, efficient, and assures the confidentiality and authenticity of data).
  • Sign messages and verify signatures: To sign messages and verify signatures, you must use an asymmetric KMS key.
  • Perform public key encryption: To perform public key encryption, you must use an asymmetric KMS key with an RSA key spec or an SM2 key spec (China Regions only). To encrypt data in AWS KMS with the public key of a KMS key pair, use the Encrypt operation. You can also download the public key and share it with the parties that need to encrypt data outside of AWS KMS.
  • Generate and verify HMAC codes: To generate and verify hash-based message authentication codes, use an HMAC KMS key.
  • Use with AWS services: AWS services that encrypt your data require a symmetric encryption KMS key.

Rotating Keys

  • image
  • To create new cryptographic material for your customer managed keys, you can create new KMS keys, and then change your applications or aliases to use the new KMS keys. Or, you can enable automatic key rotation for an existing KMS key.
  • However, automatic key rotation has no effect on the data that the KMS key protects. It does not rotate the data keys that the KMS key generated or re-encrypt any data protected by the KMS key, and it will not mitigate the effect of a compromised data key.
  • AWS KMS supports automatic key rotation only for symmetric encryption KMS keys with key material that AWS KMS creates. image

Policies and Access control for AWS KMS

  • Default KMS Key Policy: Complete access to the key to the root user = entire AWS account
    image

  • No AWS principal has any permissions to a KMS key unless that permission is provided explicitly and never denied.

  • AWS KMS resource policies for KMS keys are called key policies. All KMS keys have a key policy.

  • Combination of all can be used: Key policy, IAM policy, grants

  • EX. image
    image

  • Above ex. description:

  • Allows the example AWS account, 111122223333, full access to the KMS key. It allows the account and its administrators, including the account root user (for emergencies), to use IAM policies in the account to allow access to the KMS key.

  • Allows the ExampleAdminRole IAM role to administer the KMS key.

  • Allows the ExampleUserRole IAM role to use the KMS key.

  • You can allow users or roles in a different AWS account to use a KMS key in your account. Cross-account access requires permission in the key policy of the KMS key and in an IAM policy in the external user's account.

  • The key policy for the KMS key must give the external account (or users and roles in the external account) permission to use the KMS key. The key policy is in the account that owns the KMS key. image

  • IAM policies in the external account must delegate the key policy permissions to its users and roles. These policies are set in the external account and give permissions to users and roles in that account. image

  • Use case example
    image

Using KMS with AWS services

  • Amazon S3 integrates with AWS Key Management Service (AWS KMS) to provide server-side encryption of Amazon S3 objects. Amazon S3 uses AWS KMS keys to encrypt your Amazon S3 objects.
  • Secrets Manager integrates with AWS Key Management Service (AWS KMS) to encrypt every version of every secret value with a unique data key that is protected by an AWS KMS key. This integration protects your secrets under encryption keys that never leave AWS KMS unencrypted.
  • With encryption at rest, DynamoDB transparently encrypts all customer data in a DynamoDB table, including its primary key and local and global secondary indexes, whenever the table is persisted to disk.

Region Snapshots in KMS

  • KMS keys are region-scoped. For example, if snapshots of EBS volumes encrypted with a KMS key in one region need to be copied to another region, the following needs to be done:
    • Copy the encrypted EBS snapshot to the other region; AWS will re-encrypt the new snapshot with a different key, since the same key can't be used in two different regions.
      image

ENCRYPT AND DECRYPT API image

aws kms encrypt --key-id alias/tutorial --plaintext fileb://ExampleSecretFile.txt --output text --query CiphertextBlob --region eu-west-2 > ExampleSecretFileEncrypted.base64
cat ExampleSecretFileEncrypted.base64 | base64 --decode > ExampleSecretFileEncrypted
aws kms decrypt --ciphertext-blob fileb://ExampleSecretFileEncrypted --output text --query Plaintext --region eu-west-2 > ExampleFileDecrypted.base64
cat ExampleFileDecrypted.base64 | base64 --decode > ExampleFileDecrypted.txt

ENVELOPE ENCRYPTION

  • Since the KMS Encrypt API has an upper limit of 4 KB of data per call, anything over 4 KB that needs to be encrypted must use Envelope Encryption
  • The key used to encrypt the data itself is called a data encryption key (DEK).
  • The DEK is encrypted (also known as wrapped) by a key encryption key (KEK). The process of encrypting a key with another key is known as envelope encryption.
  • Encryption and decryption are done on the client side with the help of the DEK and KEK (see the sketch after this list).
  • Flow of this encryption with the relevant APIs:
  • GenerateDataKey API: This operation returns a plaintext copy of the data key and a copy that is encrypted under a symmetric encryption KMS key that you specify. image
  • Decrypt API: image
  • GenerateDataKeyWithoutPlaintext API: GenerateDataKeyWithoutPlaintext is identical to the GenerateDataKey operation except that it does not return a plaintext copy of the data key. This operation is useful for systems that need to encrypt data at some point, but not immediately.
  • GenerateRandom API : Returns a random byte string that is cryptographically secure. You must use the NumberOfBytes parameter to specify the length of the random byte string. There is no default value for string length.
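
A minimal client-side sketch of this flow (the key alias is hypothetical; the AES-GCM primitive comes from the third-party cryptography package):

import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

kms = boto3.client("kms")

# 1) GenerateDataKey returns the plaintext DEK and a copy encrypted under the KMS key (KEK)
resp = kms.generate_data_key(KeyId="alias/tutorial", KeySpec="AES_256")
plaintext_key, encrypted_key = resp["Plaintext"], resp["CiphertextBlob"]

# 2) Encrypt the (possibly >4 KB) payload locally with the DEK
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, b"large secret payload ...", None)
del plaintext_key  # discard the plaintext DEK; store encrypted_key next to the data

# 3) To decrypt later: unwrap the DEK via the KMS Decrypt API, then decrypt locally
data_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
plaintext = AESGCM(data_key).decrypt(nonce, ciphertext, None)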

Encryption SDK

  • The AWS Encryption SDK is a client-side encryption library designed to make it easy for everyone to encrypt and decrypt data
  • Data key caching stores data keys and related cryptographic material in a cache. When you encrypt or decrypt data, the AWS Encryption SDK looks for a matching data key in the cache. If it finds a match, it uses the cached data key rather than generating a new one. Data key caching can improve performance, reduce cost, and help you stay within service limits as your application scales.
  • Trade-off between security and cost/usage
  • Uses LocalCryptoMaterialsCache (max age, max bytes, max number of messages)
  • Once the SDK is installed, the aws-encryption-cli command can be used to perform encryption and decryption on the client side.

How to manage KMS Request Quotas

  • They vary based on the Region and the type of KMS key (symmetric or asymmetric) used in the request. The quota is shared across all cryptographic operations for an account within a Region.
  • Use exponential backoff
  • Use envelope encryption To reduce calls
  • Request a quota increase via the API or a support request to AWS when experiencing ThrottlingException (HTTP 400) errors.
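
A sketch of the exponential-backoff idea for throttled KMS calls (boto3 already retries internally; this just makes the pattern explicit):

import random
import time
import boto3
from botocore.exceptions import ClientError

kms = boto3.client("kms")

def encrypt_with_backoff(key_id: str, plaintext: bytes, max_retries: int = 5) -> bytes:
    for attempt in range(max_retries):
        try:
            return kms.encrypt(KeyId=key_id, Plaintext=plaintext)["CiphertextBlob"]
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise
            # Wait 1s, 2s, 4s, ... plus jitter before retrying
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("KMS request kept being throttled")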

How to Encrypt secrets used in the code

  • For example, DB passwords injected via environment variables can be encrypted
  • image
  • LAMBDA FUNCTION WITH KMS

S3 Bucket SSE-KMS

  • Example of envelope encryption and how to avoid a lot of KMS calls and high bills at scale: S3 Bucket Keys. A bucket-level key from KMS is used to generate the data keys that encrypt objects, reducing the direct calls to KMS
  • image
  • fewer KMS events in CloudTrail
  • applies to server-side encryption (SSE-KMS)
  • reduces request costs by up to 99 percent

CloudWatch logs encryption with KMS

  • Encryption can be enabled at the log-group level by associating a CMK with the log group via the CloudWatch Logs API (can't be done from the console).

CloudHSM (hardware security modules)

  • Generate and use cryptographic keys on dedicated FIPS 140-2 Level 3 single-tenant HSM instances.
  • can be Integrated with KMS with option of custom key store
  • Highly available via CloudHSM clusters with HSMs spread across multiple Availability Zones in a Region
  • You do not use AWS Identity and Access Management (IAM) users or IAM policies to access resources within your cluster. Instead, you use HSM users directly on HSMs in your AWS CloudHSM cluster.
  • image
  • image
  • image

image image

SSM (AWS Systems Manager) Parameter Store

  • provides secure, hierarchical storage for configuration data management and secrets management

  • You can store data such as passwords, database strings, Amazon Machine Image (AMI) IDs, and license codes as parameter values. You can store values as plain text or encrypted data

  • You can reference Systems Manager parameters in your scripts, commands, SSM documents, and configuration and automation workflows by using the unique name that you specified when you created the parameter.

  • You can configure change notifications and invoke automated actions for both parameters and parameter policies. These events are received by EventBridge.

  • Parameter Store is integrated with AWS Secrets Manager so that you can retrieve Secrets Manager secrets

  • Parameter Store provides support for three types of parameters: String (any text data), StringList (comma-separated list), and SecureString (encrypted confidential data such as passwords) image

  • You restrict access to AWS Systems Manager parameters by using AWS Identity and Access Management (IAM). More specifically, you create IAM policies that restrict access

  • You can change a standard parameter to an advanced parameter at any time, but you can’t revert an advanced parameter to a standard parameter. This is because reverting an advanced parameter to a standard parameter would cause the system to truncate the size of the parameter from 8 KB to 4 KB, resulting in data loss.

  • image

  • EventBridge can be configured via EventBridge rule that invokes a target based on events that happen to one or more parameters in your AWS account

  • image

  • For example, when you create an advanced parameter, parameter policies let you specify when the parameter expires, when to receive a notification before it expires, and how long to wait before being notified that the parameter hasn't changed

  • image

  • In Lambda functions, we can use the boto3 client library to call SSM and read secrets directly from the Parameter Store, avoiding hardcoded credentials. Note: the Lambda function will also require IAM permissions to read/update/delete parameters in Parameter Store.

  • For the SecureString type, we can pass the flag WithDecryption=True; with permissions to access the KMS key, the Lambda function will also be able to decrypt the value stored in Parameter Store (see the sketch below).
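
A minimal sketch, assuming a SecureString parameter named /my-app/db-password:

```python
import boto3

ssm = boto3.client("ssm")
value = ssm.get_parameter(
    Name="/my-app/db-password",
    WithDecryption=True,  # required for SecureString; needs kms:Decrypt on the key
)["Parameter"]["Value"]
```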

Secrets manager

  • The best place to store secrets like DB credentials, OAuth tokens, certificates, etc. (similar to HashiCorp Vault).

  • Force rotation of secrets

  • Generation of secrets can be done on rotation (via lambda)

  • Ex. rotate AWS RDS DB credentials

  • A secret has versions which hold copies of the encrypted secret value. When you change the secret value, or the secret is rotated, Secrets Manager creates a new version

  • Secrets are encrypted with KMS (mandatory)

  • Multi region replication for secrets for disaster recovery

  • Since the credentials are no longer stored with the application, rotating credentials no longer requires updating your applications and deploying changes to application clients.

  • A secret contains JSON key value pairs + metadata about the secret like ARN, a description, a resource policy, and tags etc...

  • image

  • image

  • Code can directly access the secrets from Secrets Manager by assuming the IAM role RoleToRetrieveSecretAtRuntime (see the retrieval sketch at the end of this list)

  • image

  • image

  • image

  • Can be integrated with CloudFormation templates, where the secret can be entirely managed (for example for an RDS DB, including the rotation mechanism), or generated by us in the template and dynamically referenced in the RDS resource (very similar to Helm charts).

    • image
    • image
  • From a CodeBuild environment, we can configure environment variables, including secrets, and specify that they be fetched from Secrets Manager, SSM Parameter Store, etc.
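
A minimal retrieval sketch with boto3, assuming a secret named prod/my-app/db that stores JSON key/value pairs:

```python
import json

import boto3

sm = boto3.client("secretsmanager")
secret = json.loads(
    sm.get_secret_value(SecretId="prod/my-app/db")["SecretString"]
)
username, password = secret["username"], secret["password"]
```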

AWS Nitro Enclaves

  • AWS Nitro Enclaves is an Amazon EC2 feature that allows you to create isolated execution environments, called enclaves, from Amazon EC2 instances. Enclaves are separate, hardened, and highly-constrained virtual machines.
  • They provide only secure local socket connectivity with their parent instance. They have no persistent storage, interactive access, or external networking
  • Process highly sensitive data in an isolated compute environment: Personally Identifiable Information (PII), healthcare, financial, secure multi-party computations, etc.
  • image
  • Nitro Enclaves also supports an attestation feature, which allows you to verify an enclave's identity and ensure that only authorized code is running inside it. Nitro Enclaves is integrated with the AWS Key Management Service

Sanitizing Sensitive data

  • Data containing any PII (Personally Identifiable Information) should be sanitized, i.e. encrypted, which can be done at various levels.
  • For example, as requests enter via CloudFront edge servers, a Lambda function can intercept the requests and encrypt sensitive fields (or the entire payload), then decrypt them when returning the same information (field-level encryption).
  • image
  • image

AWS Certificate Manager

  • AWS Certificate Manager (ACM) handles the complexity of creating, storing, and renewing public and private SSL/TLS X.509 certificates and keys that protect your AWS websites and applications.
  • A certificate authority (CA) is an entity that issues digital certificates. The CA issues signed digital certificates that affirm the identity of the certificate subject and bind that identity to the public key contained in the certificate. A CA also typically manages certificate revocation.
  • A public key infrastructure (PKI) consists of hardware, software, people, policies, documents, and procedures that are needed to create, issue, manage, distribute, use, store, and revoke digital certificates.
  • A certificate authority (CA) typically exists within a hierarchical structure that contains multiple other CAs with clearly defined parent-child relationships between them. Child or subordinate CAs are certified by their parent CAs, creating a certificate chain. The CA at the top of the hierarchy is referred to as the root CA, and its certificate is called the root certificate. This certificate is typically self-signed.

AWS Public certificates

  • This service is for enterprise customers who need a secure web presence using TLS. ACM certificates are deployed through Elastic Load Balancing, Amazon CloudFront, Amazon API Gateway
  • ACM certificates are X.509 SSL/TLS certificates that bind the identity of your website and the details of your organization to the public key that is contained in the certificate. ACM uses your AWS KMS key to encrypt the private key.

AWS PRIVATE CA

  • This service is for enterprise customers building a public key infrastructure (PKI) inside the AWS cloud and intended for private use within an organization.
  • you can create your own certificate authority (CA) hierarchy and issue certificates with it for authenticating users, computers, applications, services, servers, and other devices.
  • Certificates issued by a private CA cannot be used on the internet.

image

To generate an SSH key pair, run the command ssh-keygen.

  • Generate a CSR: openssl req -out certificatesigningrequest.csr -new -newkey rsa:2048 -nodes -keyout privatekey.key
  • Decode a CSR: openssl req -in server.csr -noout -text
  • Generate a CSR for an existing private key: openssl req -out CSR.csr -key privateKey.key -new
  • Generate a CSR for an existing certificate and private key: openssl x509 -x509toreq -in certificate.crt -out CSR.csr -signkey privateKey.key
  • Generate a self-signed certificate: openssl req -newkey rsa:2048 -nodes -keyout domain.key -x509 -days 365 -out domain.crt

Deployment

ELASTIC BEANSTALK

  • With Elastic Beanstalk, you can quickly deploy and manage applications in the AWS Cloud without having to learn about the infrastructure that runs those applications
  • Elastic Beanstalk supports applications developed in Go, Java, .NET, Node.js, PHP, Python, and Ruby. When you deploy your application, Elastic Beanstalk builds the selected supported platform version and provisions one or more AWS resources, such as Amazon EC2 instances, to run your application.
  • image
  • There is no additional charge for Elastic Beanstalk. You pay only for the underlying AWS resources that your application consumes.
  • image
  • image
  • image
  • In addition to the Elastic Beanstalk console, you can use the following tools to create and manage Elastic Beanstalk environments: EB CLI, SDK IN programming languages like JAVA, JS ETC...
  • Under the hood, Elastic Beanstalk relies on CloudFormation

Environments

  • image
  • Web server env / worker env
  • Every environment has a CNAME (URL) that points to a load balancer. The environment has a URL, such as myapp.us-west-2.elasticbeanstalk.com.
  • This URL is aliased in Amazon Route 53 to an Elastic Load Balancing URL—something like abcdef-123456.us-west-2.elb.amazonaws.com—by using a CNAME record.

Worker Environment

  • If your AWS Elastic Beanstalk application performs operations or workflows that take a long time to complete, you can offload those tasks to a dedicated worker environment.
  • Decoupling your web application front end from a process that performs blocking operations is a common way to ensure that your application stays responsive under load.
  • image
  • With periodic tasks, you can also configure the worker daemon to queue messages based on a cron schedule.
  • Supports DLQ with SQS queues

Web environment

Environment Types

  • Single-instance environment
    • A single-instance environment contains one Amazon EC2 instance with an Elastic IP address.
  • Load-balanced, scalable environment
    • Elastic Load Balancing and Amazon EC2 Auto Scaling services to provision the Amazon EC2 instances that are required for your deployed application. image
      image

Environment configuration image

Deployment options

  • All at once
    • Suitable if you can accept a short loss of service, and if quick deployments are important to you.
    • With this method, Elastic Beanstalk deploys the new application version to each instance
    • image
  • Rolling
    • Avoids downtime and minimizes reduced availability, at a cost of a longer deployment time.
    • With this method, your application is deployed to your environment one batch of instances at a time.
    • Suitable if you can't accept any period of completely lost service.
    • image
  • Rolling with additional batch
    • Avoids any reduced availability, at a cost of an even longer deployment time
    • Elastic Beanstalk launches an extra batch of instances, then performs a rolling deployment. Launching the extra batch takes time, and ensures that full capacity is maintained throughout the deployment.
    • image
  • Immutable
    • A slower deployment method, that ensures your new application version is always deployed to new instances, instead of updating existing instances.
    • a second Auto Scaling group is launched in your environment and the new version serves traffic alongside the old version until the new instances pass health checks.
    • image
  • Traffic splitting
    • A canary testing deployment method.
    • Suitable if you want to test the health of your new application version using a portion of incoming traffic, while keeping the rest of the traffic served by the old application version.
    • image
  • Blue/Green
    • Zero downtime and release facility
    • Create a new “stage” environment and deploy v2 there
    • The new environment (green) can be validated independently and roll back if issues
    • Using Beanstalk, “swap URLs” when done with the environment test
    • image
      image

Beanstalk Lifecycle Policy

  • Elastic Beanstalk can store at most 1000 application versions
  • By default, Elastic Beanstalk leaves the application version's source bundle in Amazon S3 to prevent loss of data.
  • image
  • RDS with Elastic Beanstalk:
    • This is not great for prod as the database lifecycle is tied to the Beanstalk environment lifecycle
    • The best for prod is to separately create an RDS database and provide our EB application with the connection string
    • You can choose what you want to happen to the database after you decouple it from your Elastic Beanstalk environment. Snapshot, Delete, Retain. image

EB extensions

  • You can add AWS Elastic Beanstalk configuration files (.ebextensions) to your web application's source code to configure your environment and customize the AWS resources that it contains.
  • YAML / JSON format
  • image
  • Resources managed by .ebextensions get deleted if the environment goes away
  • You can use the option_settings key to modify the Elastic Beanstalk configuration and define variables that can be retrieved from your application using environment variables.
  • image
  • You can use the Resources key in a configuration file to create and customize AWS resources in your environment.
  • The resources that Elastic Beanstalk creates for your environment have names. You can use these names to get information about the resources with a function, or modify properties on the resources to customize their behavior.
  • image

EB cloning

  • You can use an existing Elastic Beanstalk environment as the basis for a new environment by cloning the existing environment.
  • Useful for deploying a “test” version of your application
  • during the cloning process, Elastic Beanstalk doesn't copy data from Amazon RDS to the clone
  • environment variables are preserved
  • load balancer configuration is preserved
  • image
  • Migrating load balancer
    • after deployment to the new environment, perform a CNAME swap or Route 53 update image

CLOUDFORMATION

  • Infrastructure as Code
  • AWS CloudFormation is a service that helps you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS
  • Reuse your CloudFormation template to create your resources in a consistent and repeatable manner (multi region deployments etc..)
  • Because these templates are text files, you simply track differences in your templates to track changes to your infrastructure, similar to the way developers control revisions to source code.
  • Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
  • image
  • Use the AWS CloudFormation Designer or your own text editor to create or modify a CloudFormation template in JSON or YAML format.
  • image
  • image
  • Custom resources enable you to write custom provisioning logic in templates that AWS CloudFormation runs anytime you create, update (if you changed the custom resource), or delete stacks. For example, you might want to include resources that aren't available as AWS CloudFormation resource types.

Templates

  • A CloudFormation template is a JSON or YAML formatted text file.

  • image

  • you can add input parameters whose values are specified when you create a CloudFormation stack.

  • Resources are the core of your CloudFormation template

  • They represent the different AWS Components that will be created and configured

  • image

  • image

  • image

  • Properties: Resource declarations use a Properties attribute to specify the information used to create a resource.

  • image

  • image

  • Intrinsic functions: CloudFormation has a number of intrinsic functions that you can use to refer to other resources and their properties.

    • You can use the Ref function to refer to an identifying property of a resource. Frequently, this is the physical name of the resource; however, sometimes it can be an identifier, such as the IP address for an AWS::EC2::EIP resource or an Amazon Resource Name (ARN) for an Amazon SNS topic.

    • image

    • Parameters: The Ref function can refer to input parameters that are specified at stack creation time.

    • image

    • image

    • You specify a Systems Manager parameter key as the value of the SSM parameter, and AWS CloudFormation fetches the latest value from Parameter Store to use for the stack.

    • image

    • A number of resources have additional attributes whose values you can use in your template. To get these attributes, you use the Fn::GetAtt function.

    • The Fn::GetAtt function takes two parameters, the logical name of the resource and the name of the attribute to be retrieved.

    • image

    • There may be settings that are region dependent or are somewhat complex for users to figure out because of other conditions or dependencies.

    • In these cases, you would want to put some logic in the template itself so that users can specify simpler values (or none at all) to get the results that they want.

    • There are two template features that can help, the Mappings object and the AWS::Region pseudo parameter.

    • To use a map to return a value, you use the Fn::FindInMap function, passing the name of the map, the value used to find the mapped value, and the label of the mapped value you want to return

    • image

    • Multiple values

    • image

    • You can use an input parameter with the Fn::FindInMap function to refer to a specific value in a map

    • image

    • There can be situations where a value from a parameter or other resource attribute is only part of the value you need.

    • The Fn::Join function takes two parameters, a delimiter that separates the values you want to concatenate and an array of values in the order that you want them to appear.

    • image

    • image

  • The Outputs object in the template contains declarations for the values that you want to have available after the stack is created.

  • image

  • image

  • image

  • Fn::ImportValue: cross-stack reference (imports values exported by another stack)

  • image

  • Conditions: Conditions are used to control the creation of resources or outputs based on a condition

  • image

  • image

  • Fn::And, Fn::Equals, Fn::If, Fn::Not, Fn::Or

  • image

  • image

  • Fn::Sub (or !Sub as shorthand) is used to substitute variables within a text string; see the combined sketch after this list

  • image
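
A combined sketch of the intrinsic functions above: an inline YAML template using Parameters, !Ref, !GetAtt, !Sub, and !Join, deployed with boto3 (all names and values are illustrative):

```python
import boto3

TEMPLATE = """
Parameters:
  EnvName:
    Type: String
    Default: dev
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "${EnvName}-my-app-assets"   # variable substitution
Outputs:
  BucketArn:
    Value: !GetAtt MyBucket.Arn                      # resource attribute
  BucketUrl:
    Value: !Join ["", ["https://", !Ref MyBucket, ".s3.amazonaws.com"]]
"""

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="intrinsics-demo",
    TemplateBody=TEMPLATE,
    Parameters=[{"ParameterKey": "EnvName", "ParameterValue": "dev"}],
)
```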

ChangeSets

  • Before making changes to your resources, you can generate a change set, which is a summary of your proposed changes.
  • Change sets allow you to see how your changes might impact your running resources, especially for critical resources, before implementing them. image

Stacks

  • you manage related resources as a single unit called a stack.

  • To create those resources, you create a stack by submitting the template that you created, and CloudFormation provisions all those resources for you.

  • With change sets, you can preview the changes AWS CloudFormation will make to your stack, and then decide whether to apply those changes. Change sets are JSON-formatted documents that summarize the changes AWS CloudFormation will make to a stack.

  • Update behaviours

    • Update with No Interruption
    • Updates with Some Interruption
    • Replacement
  • Stack notifications image

  • Stack policies

    • A Stack Policy is a JSON document that defines the update actions that are allowed on specific resources during Stack updates
    • Protect resources from unintentional updates
    • When you set a Stack Policy, all resources in the Stack are protected by default
    • Specify an explicit ALLOW for the resources you want to be allowed to be updated
    • image
  • Stack failure options

    • You can provision failure options for all stack deployments and change set operations.
    • Default: everything rolls back (gets deleted)
    • Preserve successfully provisioned resources preserves the state of successful resources, while failed resources will stay in a failed state until the next update operation is performed.
  • Detect stack drift

    • Drift detection enables you to detect whether a stack's actual configuration differs, or has drifted, from its expected configuration.
    • Use CloudFormation to detect drift on an entire stack, or on individual resources within the stack
    • You can perform drift detection on stacks with the following statuses: CREATE_COMPLETE, UPDATE_COMPLETE, UPDATE_ROLLBACK_COMPLETE, and UPDATE_ROLLBACK_FAILED
    • image
    • image
    • image
  • Nested stacks

    • Nested stacks are stacks created as part of other stacks. You create a nested stack within another stack by using the AWS::CloudFormation::Stack resource.
    • For example, assume that you have a load balancer configuration that you use for most of your stacks. Instead of copying and pasting the same configurations into your templates, you can create a dedicated template for the load balancer. Then, you just use the resource to reference that template from within other templates.
    • image
  • StackSets

    • AWS CloudFormation StackSets extends the capability of stacks by enabling you to create, update, or delete stacks across multiple accounts and AWS Regions with a single operation
    • image
    • An administrator account is the AWS account in which you create stack sets.
    • A stack set lets you create stacks in AWS accounts across regions by using a single CloudFormation template
    • image
  • Cross stacks vs nested stacks

  • image

CDK (Cloud Development Kit)

  • The AWS Cloud Development Kit (AWS CDK) lets you define your cloud infrastructure as code in one of its supported programming languages.
  • An AWS CDK app is an application written in TypeScript, JavaScript, Python, Java, C# or Go that uses the AWS CDK to define AWS infrastructure.
  • An app defines one or more stacks. Stacks (equivalent to AWS CloudFormation stacks) contain constructs. Each construct defines one or more concrete AWS resources, such as Amazon S3 buckets, Lambda functions, or Amazon DynamoDB tables.
  • The AWS CDK includes the CDK Toolkit (also called the CLI), a command line tool for working with your AWS CDK apps and stacks. Among other functions, the Toolkit provides the ability to do the following:
    • Convert one or more AWS CDK stacks to AWS CloudFormation templates and related assets (a process called synthesis)
    • Deploy your stacks to an AWS account and Region
  • image

Deploying infrastructure via CDK

  • Install aws-cdk-lib (a minimal Python app sketch follows this list of steps)
  • image
  • image
  • image
  • image
  • image
  • image
  • image
  • image
  • image
  • image
  • image
  • image
  • image
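
A minimal sketch of a CDK app in Python, assuming aws-cdk-lib v2; the stack and bucket names are illustrative:

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class MyStorageStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # An L2 construct: sensible defaults, only intent-level properties needed.
        s3.Bucket(self, "MyBucket",
                  versioned=True,
                  removal_policy=RemovalPolicy.DESTROY)

app = App()
MyStorageStack(app, "MyStorageStack")
app.synth()  # emit the cloud assembly (CloudFormation template + assets)
```

Running cdk synth on this app prints the generated CloudFormation template; cdk deploy provisions it.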

Constructs

  • A construct represents a "cloud component" and encapsulates everything AWS CloudFormation needs to create the component.
  • A construct can represent a single AWS resource, such as an Amazon Simple Storage Service (Amazon S3) bucket. A construct can also be a higher-level abstraction consisting of multiple related AWS resources. Examples of such components include a worker queue with its associated compute capacity, or a scheduled job with monitoring resources and a dashboard.

AWS Construct library

  • L1 constructs
    • CFN Resources
    • They are named CfnXyz, where Xyz is the name of the resource.
    • For example, CfnBucket represents the AWS::S3::Bucket AWS CloudFormation resource.
    • When you use Cfn resources, you must explicitly configure all resource properties. This requires a complete understanding of the details of the underlying AWS CloudFormation resource model. image
  • L2 constructs
    • higher-level, intent-based API.
    • defaults, boilerplate, and glue logic you'd be writing yourself with a CFN Resource construct
    • AWS constructs offer convenient defaults and reduce the need to know all the details about the AWS resources they represent.
    • For example, the s3.Bucket class represents an Amazon S3 bucket with additional properties and methods, such as bucket.addLifeCycleRule(), which adds a lifecycle rule to the bucket. image
  • L3 constructs
    • patterns
    • For example, the aws-ecs-patterns.ApplicationLoadBalancedFargateService construct represents an architecture that includes an AWS Fargate container cluster employing an Application Load Balancer. The aws-apigateway.LambdaRestApi construct represents an Amazon API Gateway API that's backed by an AWS Lambda function. image
  • Composition is the key pattern for defining higher-level abstractions through constructs.

Apps

  • An App is a container for one or more stacks: it serves as each stack's scope. Stacks within a single App can easily refer to each other's resources (and attributes of those resources)
  • image
  • The call to app.synth() is what tells the AWS CDK to synthesize a cloud assembly from an app. Typically you don't interact directly with cloud assemblies.
  • cdk.json file

Stacks

  • The unit of deployment in the AWS CDK is called a stack. All AWS resources defined within the scope of a stack are provisioned or destroyed as a single unit

Environment

  • AWS Environment = account & region
  • Each Stack instance in your AWS CDK app is explicitly or implicitly associated with an environment (env)
  • image

Bootstrapping

  • Bootstrapping is the process of provisioning resources for the AWS CDK before you can deploy AWS CDK apps into an AWS environment.
  • These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments.
  • image
  • image

Unit testing

  • Supports Jest (JS), Pytest (python)
  • Fine-grained assertions test specific aspects of the generated AWS CloudFormation template, such as "this resource has this property with this value."
  • Snapshot tests test the synthesized AWS CloudFormation template against a previously stored baseline template. Snapshot tests let you refactor freely, since you can be sure that the refactored code works exactly the same way as the original. image
  • To import a template (see the test sketch below):
    • Template.fromStack(MyStack): for a stack built in CDK
    • Template.fromString(mystring): for a stack built outside CDK
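
A fine-grained assertion sketch with Pytest and aws_cdk.assertions; the stack under test is defined inline so the example is self-contained:

```python
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from aws_cdk.assertions import Template

def test_bucket_is_versioned():
    app = cdk.App()
    stack = cdk.Stack(app, "TestStack")
    s3.Bucket(stack, "Assets", versioned=True)

    # Synthesize the stack and assert on the generated CloudFormation.
    template = Template.from_stack(stack)
    template.has_resource_properties(
        "AWS::S3::Bucket",
        {"VersioningConfiguration": {"Status": "Enabled"}},
    )
```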

Best practices

  • image
  • image

AWS CI/CD

image
image

AWS Cloud9

  • A cloud-based integrated development environment (IDE).
  • image
  • Working with code in several programming languages and the AWS Cloud Development Kit (AWS CDK), pair programming
  • Fully integrated with AWS SAM & Lambda to easily build serverless applications

AWS CodeCommit

  • AWS CodeCommit is a version control service hosted by Amazon Web Services that you can use to privately store and manage assets such as source code
  • image
  • Connection
    • Setup for HTTPS users using Git credentials
    • image
    • image
    • Setup for SSH users not using the AWS CLI
    • image
  • image
  • image
  • image
  • image
  • Share code repo
    • git-remote-codecommit: It is the recommended method for supporting connections made with federated access, identity providers, and temporary credentials. To assign permissions to a federated identity, you create a role and define permissions for the role. When a federated identity authenticates, the identity is associated with the role and is granted the permissions that are defined by the role.
    • You cannot use Git credentials or SSH key pairs with federated access or identity providers
    • use Git credentials or SSH key pairs with IAM users (important while selecting protocol to share cloning URL)
    • Create IAM policies, IAM groups and add users to IAM groups
  • You can set up notification rules for a repository so that repository users receive emails about the repository event types you specify.
  • You can create an Amazon SNS topic to use for notifications
  • image
  • You can configure a CodeCommit repository so that code pushes or other events trigger actions: could be to SNS, Lambda etc.
  • Configure Cross account access
  • Ex. Account B group users needs to access Account A repository
  • Account A: Create a policy in AccountA that grants access to the repository, Create a role in AccountA that can be assumed by IAM users and groups in AccountB, Attach the policy to the role.
  • Account B: Create an IAM group for repository access for AccountB users, Create a policy and add users to the IAM group
  • Repository users: Configure the AWS CLI and Git for an AccountB user to access the repository in AccountA, Clone and access the CodeCommit repository in AccountA

AWS CodeBuild

  • A fully managed continuous integration (CI) service
  • CodeBuild compiles your source code, runs unit tests, and produces artifacts that are ready to deploy.
  • image
  • image
  • image
  • buildspec.yml: A buildspec is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build.
  • Output logs can be stored in Amazon S3 & CloudWatch Logs
  • Use EventBridge to detect failed builds and trigger notifications
  • image
  • image
  • artifacts represents the set of build output artifacts that CodeBuild uploads to the output bucket.
  • For this, the directory structure should look like this: image
  • A build project includes information about how to run a build, including where to get the source code, which build environment to use, which build commands to run, and where to store the build output.
  • A build environment represents a combination of operating system, programming language runtime, and tools that CodeBuild uses to run a build.
  • Phases:
    • install: installing packages in the build environment.
    • pre_build: you might use this phase to sign in to Amazon ECR, or you might install npm dependencies.
    • build: CodeBuild runs during the build. For example, you might use this phase to run Mocha, RSpec, or sbt.
    • post_build: you might use Maven to package the build artifacts into a JAR or WAR file, or you might push a Docker image into Amazon ECR. Then you might send a build notification through Amazon SNS.
  • Cache: Represents information about where CodeBuild can prepare the files for uploading cache to an S3 cache bucket.
  • Reports: Test reports, coverage reports, cucumber, Junit etc...
  • You can use the AWS CodeBuild agent to run CodeBuild builds on a local machine. There are agents available for x86_64 and ARM platforms.
  • Typically, AWS CodeBuild cannot access resources in a VPC. To enable access, you must provide additional VPC-specific configuration information in your CodeBuild project configuration.
  • This includes the VPC ID, the VPC subnet IDs, and the VPC security group IDs. VPC-enabled builds can then access resources inside your VPC. image
  • image

AWS CodeDeploy

  • CodeDeploy is a deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, serverless Lambda functions, or Amazon ECS services.

  • EC2/On-Premises, AWS Lambda, Amazon ECS

  • A deployment group is a set of individual instances. A deployment group contains individually tagged instances, Amazon EC2 instances in Amazon EC2 Auto Scaling groups, or both

  • Deployment configuration

    • EC2/On-Premises compute platform
      • you can specify the minimum number of healthy instances for the deployment
      • All at once, Half at a time, One at a time (In-place/Blue-green)
    • Lambda function or ECS image image
      • Canary: You can choose from predefined canary options that specify the percentage of traffic shifted to your updated Lambda function or ECS task set in the first increment and the interval, in minutes, before the remaining traffic is shifted in the second increment.
      • Linear: Traffic is shifted in equal increments with an equal number of minutes between each increment
      • All-at-once: All traffic is shifted from the original Lambda function or ECS task set to the updated function or task set all at once.
  • Deployment types

    • In-place deployment: Only deployments that use the EC2/On-Premises compute platform can use in-place deployments. The application on each instance in the deployment group is stopped, the latest application revision is installed, and the new version of the application is started and validated. image

    • Blue/green deployment: Instances are provisioned for the replacement environment. The latest application revision is installed on the replacement instances. Instances in the replacement environment are registered with one or more Elastic Load Balancing load balancers, causing traffic to be rerouted to them image

  • Workflow lambda

    • image
  • Workflow ECS

    • image
  • Workflow EC2

    • image
  • appspec.yaml

    • An application specification file (AppSpec file), which is unique to CodeDeploy, is a YAML-formatted or JSON-formatted file. The AppSpec file is used to manage each deployment as a series of lifecycle event hooks, which are defined in the file.
    • The AWS CodeDeploy agent is a software package that, when installed and configured on an instance, makes it possible for that instance to be used in CodeDeploy deployments.
    • During deployment, the CodeDeploy agent looks up the name of the current event in the hooks section of the AppSpec file.
    • The CodeDeploy agent is not used in an AWS Lambda or an Amazon ECS deployment.
    • image
  • Tagging instances for deployment groups

    • Tags enable you to categorize your instances in different ways (for example, by purpose, owner, or environment).
    • The criteria for instances in a deployment group can be as simple as a single tag in a single tag group.
  • Rollbacks

    • Deployments can be rolled back:
      • Automatically – rollback when a deployment fails or rollback when a CloudWatch Alarm thresholds are met
      • Manually
    • If a rollback happens, CodeDeploy redeploys the last known good revision as a new deployment (not a restored version) image

AWS CodePipeline

  • AWS CodePipeline is a continuous delivery service you can use to model, visualize, and automate the steps required to release your software.
  • image
  • A pipeline is a workflow construct that describes how software changes go through a release process. Each pipeline is made up of a series of stages.
  • A stage is a logical unit you can use to isolate an environment and to limit the number of concurrent changes in that environment
  • image
  • image
  • Use CloudWatch Events (Amazon EventBridge). Examples:
    • You can create events for failed pipelines
    • You can create events for cancelled stages
  • Events emitted for CodePipeline
  • image
  • If CodePipeline fails a stage, your pipeline stops, and you can get information in the console
  • image
  • Stopping executions
  • image
  • Execution process
    • Pipelines can process multiple executions at the same time. Each execution is run through the pipeline separately.
    • The pipeline processes each execution in order and might supersede an earlier execution with a later one
    • image
    • image
    • image
  • CodePipeline takes care of inputs and outputs of each stages
    • From CodeCommit, an output artifact (any files to be built) is produced by the Source stage.
    • The output artifact (any files to be built) from the previous step is ingested as an input artifact to the Build stage. An output artifact (the built application) from the Build stage can be an updated application or an updated Docker image built to a container.
    • The output artifact from the previous step (the built application) is ingested as an input artifact to the Deploy stage, such as staging or production environments in the AWS Cloud
  • Actions
    • an action is part of the sequence in a stage of a pipeline. It is a task performed on the artifact in that stage.
    • source, build, test, deploy, approval, and invoke
    • Approval:
      • you can add an approval action to a stage in a pipeline at the point where you want the pipeline execution to stop so that someone with the required AWS Identity and Access Management permissions can approve or reject the action.
      • image
    • CloudFormation integration
    • image

AWS CodeStar

  • AWS CodeStar is a cloud-based service for creating, managing, and working with software development projects on AWS.
  • AWS CodeStar also manages the permissions required for project users (called team members).
  • AWS CodeStar project templates allow you to start with a sample application and deploy it using AWS resources created to support your development project. When you choose an AWS CodeStar project template, the application type, programming language, and compute platform are provisioned for you.

AWS CodeArtifact

  • AWS CodeArtifact is a secure, highly scalable, managed artifact repository service

  • Every CodeArtifact repository is a member of a single CodeArtifact domain.

  • To add packages to a repository, configure a package manager such as npm or Maven to use the repository endpoint (URL). You can then use the package manager to publish packages to the repository.

  • Ex. Maven, Npm, NuGet etc...

  • image

  • Upstream repositories:

    • A repository can have other AWS CodeArtifact repositories as upstream repositories. This enables a package manager client to access the packages that are contained in more than one repository using a single repository endpoint.
    • If an upstream repository has an external connection to a public repository, the repositories that are downstream from it can pull packages from that public repository. For example, suppose that the repository my_repo has an upstream repository named upstream, and upstream has an external connection to a public npm repository. In this case, a package manager that is connected to my_repo can pull packages from the npm public repository.
    • You can add up to 10 upstream repositories to a CodeArtifact repository. You can only add one external connection.
  • CodeArtifact behavior when an external repository is not available: packages already stored in the CodeArtifact repository continue to be available for download from CodeArtifact.

  • For a package version in a public repository such as npmjs.com to be available through a CodeArtifact repository, it must first be added to a Regional package metadata cache (so there can be a delay before it's available)

  • image

  • Package retention

    • CodeArtifact allows chaining upstream repositories. For example, repo-A can have repo-B as an upstream and repo-B can have repo-C as an upstream. This configuration makes the package versions in repo-B and repo-C available from repo-A.
    • image
    • image
    • If a package manager connected to repo-A requests a package version, lodash 4.17.20 for example, and the package version is not present in any of the three repositories, it will be fetched from npmjs.com. When lodash 4.17.20 is fetched, it will be retained in repo-A as that is the most-downstream repository and repo-C as it has the external connection to npmjs.com attached. lodash 4.17.20 will not be retained in repo-B as that is an intermediate repository.
  • Events integration

  • CodeArtifact is integrated with Amazon EventBridge, a service that automates and responds to events, including changes in a CodeArtifact repository.

  • image

  • Cross account access

    • image
  • Domains

    • image
    • You can use a domain to apply permissions across many repositories owned by different AWS accounts. An asset is stored only once in a domain, even if it's available from multiple repositories.
    • You cannot create a repository without a domain

AWS CodeGuru

  • Amazon CodeGuru Reviewer is a service that uses program analysis and machine learning to detect potential defects that are difficult for developers to find and offers suggestions for improving your Java and Python code.
  • resource leak prevention or security analysis.
  • image
  • Reviewer and profiler
  • image
  • Helps understand the runtime behavior of your application
    • Example: identify if your application is consuming excessive CPU capacity on a logging routine
  • image
  • Agent Configuration:
  • image

Troubleshooting and Optimization

CLOUDWATCH

  • Metrics
    • Amazon CloudWatch is basically a metrics repository. An AWS service—such as Amazon EC2—puts metrics into the repository, and you retrieve statistics based on those metrics
    • image
    • A namespace is a container for CloudWatch metrics. Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics.
    • The AWS namespaces typically use the following naming convention: AWS/service
    • A metric represents a time-ordered set of data points that are published to CloudWatch.
    • Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time
    • Each metric data point must be associated with a time stamp
    • A dimension is a name/value pair that is part of the identity of a metric
    • For example, you can get statistics for a specific EC2 instance by specifying the InstanceId dimension when you search for metrics.
    • image
    • Amazon CloudWatch aggregates statistics according to the period length that you specify when retrieving statistics
    • You can use an alarm to automatically initiate actions on your behalf. An alarm watches a single metric over a specified time period, and performs one or more specified actions, based on the value of the metric relative to a threshold over time. image
    • By default, your instance is enabled for basic monitoring. You can optionally enable detailed monitoring. After you enable detailed monitoring, the Amazon EC2 console displays monitoring graphs with a 1-minute period for the instance.
    • Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)
    • Custom Metrics
      • You can publish your own metrics to CloudWatch using the AWS CLI or an API

      • Use API call PutMetricData

      • Use dimensions: for example, the following command publishes a Buffers metric with two dimensions named InstanceId and InstanceType (a boto3 sketch appears at the end of this sub-list).

      • image

      • Metric resolution (StorageResolution API parameter, two possible values): Standard: 1 minute (60 seconds); High Resolution: 1/5/10/30 second(s) at a higher cost

      • Note: Accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)
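
A boto3 sketch of the PutMetricData call described above; the namespace, dimension values, and metric value are illustrative:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "Buffers",
        "Dimensions": [
            {"Name": "InstanceId", "Value": "i-1234567890abcdef0"},
            {"Name": "InstanceType", "Value": "m5.large"},
        ],
        "Unit": "Bytes",
        "Value": 231434333.0,
        "StorageResolution": 60,  # 60 = standard resolution, 1 = high resolution
    }],
)
```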

    • Anomaly detection can be enabled on metrics
    • image
  • Logs
    • log groups: a log group is a collection of log streams that share the same retention, monitoring, and access control settings.
    • log streams: a log stream is a sequence of log events that share the same source; each separate source of logs in CloudWatch Logs makes up a separate log stream.
    • Can define log expiration policies (never expire, 1 day to 10 years…)
    • log sources: SDK, CloudWatch Logs Agent, CloudWatch Unified Agent, Elastic Beanstalk, ECS, LAMBDA, VPC FLOW LOGS, API GATEWAY, ROUTE 53 DNS QUERIES, CLOUDTRAIL ETC...
    • CloudWatch Logs Insights
      • image
      • CloudWatch Logs Insights enables you to interactively search and analyze your log data in Amazon CloudWatch Logs. You can perform queries to help you more efficiently and effectively respond to operational issues.
      • Can query multiple Log Groups in different AWS accounts
      • purpose-built query language
    • Exporting log data to Amazon S3
      • Export log data from your log groups to an Amazon S3 bucket and use this data in custom processing and analysis, or to load onto other systems
      • Log data can take up to 12 hours to become available for export
      • The API call is CreateExportTask
      • Not real-time
    • Subscriptions to CloudWatch log events
      • You can use subscriptions to get access to a real-time feed of log events from CloudWatch Logs and have it delivered to other services such as an Amazon Kinesis stream, an Amazon Kinesis Data Firehose stream, or AWS Lambda for custom processing, analysis, or loading to other systems
      • A subscription filter defines the filter pattern to use for filtering which log events get delivered to your AWS resource, as well as information about where to send matching log events to.
      • image
      • Cross-account data subscriptions
        • image
    • Log aggregation
    • image
    • CloudWatch Unified Agent
      • By default, no logs from your EC2 machine will go to CloudWatch
      • You need to run a CloudWatch agent on EC2 to push the log files you want
      • Collect additional system-level metrics such as CPU, Disk memory, RAM, processes, etc…
    • Metric filter
      • You can search and filter the log data coming into CloudWatch Logs by creating one or more metric filters. Metric filters define the terms and patterns to look for in log data as it is sent to CloudWatch Logs
      • For example, you can create a metric filter that counts the number of times the word ERROR occurs in your log events (see the sketch after this sub-list).
      • Filters do not retroactively filter data. Filters only publish the metric data points for events that happen after the filter was created.
      • Ability to specify up to 3 Dimensions for the Metric Filter
      • image
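
A boto3 sketch of such a metric filter; the log group and namespace names are assumptions:

```python
import boto3

logs = boto3.client("logs")
logs.put_metric_filter(
    logGroupName="/my-app/prod",
    filterName="count-errors",
    filterPattern="ERROR",  # matches log events containing the term ERROR
    metricTransformations=[{
        "metricName": "ErrorCount",
        "metricNamespace": "MyApp",
        "metricValue": "1",   # increment by 1 per matching event
        "defaultValue": 0.0,
    }],
)
```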
    • Alarms
      • Alarms are used to trigger notifications for any metric
      • Alarm States: OK, INSUFFICIENT_DATA, ALARM
      • Period: Length of time in seconds to evaluate the metric
    • image
    • Composite alarms:
      • Composite alarms monitor the states of multiple other alarms
      • image
    • Alarms can be created based on CloudWatch Logs Metrics Filters
  • CloudWatch Synthetics Canary
    • use Amazon CloudWatch Synthetics to create canaries, configurable scripts that run on a schedule, to monitor your endpoints and APIs. Canaries follow the same routes and perform the same actions as a customer, which makes it possible for you to continually verify your customer experience even when you don't have any customer traffic on your applications.
    • Scripts written in Node.js or Python
    • Programmatic access to a headless Google Chrome browser
    • image
    • Canaries check the availability and latency of your endpoints and can store load time data and screenshots of the UI. They monitor your REST APIs, URLs, and website content, and they can check for unauthorized changes from phishing, code injection and cross-site scripting.
    • Some blueprints provided:
      • Heartbeat Monitor: load URL, store screenshot and an HTTP archive file
      • API Canary: basic Read and Write functions of a REST API.
      • Broken Link Checker: check all links inside the URL that you are testing
      • Visual Monitoring: compare a screenshot taken during a canary run with a baseline screenshot
      • Canary Recorder: used with CloudWatch Synthetics Recorder (record your actions on a website and automatically generates a script for that)
      • GUI Workflow Builder: verifies that actions can be taken on your webpage. For example, if you have a webpage with a login form, the canary can populate the user and password fields and submit the form to verify that the webpage is working correctly.

CloudWatch Events / EventBridge

  • Schedule: Cron jobs (scheduled scripts)
  • Event Pattern: Event rules to react to a service doing something
  • Trigger Lambda functions, send SQS/SNS messages… image

How an event bus works

  • An event bus is a router that receives events and delivers them to zero or more destinations, or targets.
  • image
  • At its simplest, an EventBridge event is a JSON object sent to an event bus or pipe (a boto3 sketch of sending one follows this list).
  • EventBridge then evaluates the event against each rule defined for that event bus.
  • For each event that matches a rule, EventBridge then sends the event to the targets specified for that rule. Optionally, as part of the rule, you can also specify how EventBridge should transform the event prior to sending it to the target(s).
  • An event might match multiple rules, and each rule can specify up to five targets.
  • Event structure image
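
A boto3 sketch of sending a custom event to a bus; the bus name, source, and detail values are illustrative:

```python
import json

import boto3

events = boto3.client("events")
events.put_events(
    Entries=[{
        "EventBusName": "my-app-bus",
        "Source": "my.app.orders",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": "1234", "amount": 42}),
    }]
)
```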

Eventbridge rules

  • An event pattern, which contains one or more filters to match events. Event patterns can include filters that match on: event metadata, event data, and event content (a rule sketch follows this list)

  • A schedule to invoke the target(s) at regular intervals (regular rate/specific times)

  • By default, you can configure up to 300 rules per event bus.

  • You can archive events (all or filtered) sent to an event bus, either indefinitely or for a set period

  • Ability to replay archived events
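
A sketch of a rule with an event pattern and a Lambda target, matching the custom event sent above; all ARNs and names are placeholders:

```python
import json

import boto3

events = boto3.client("events")

# Rule: match OrderPlaced events from our custom source on the bus.
events.put_rule(
    Name="order-placed",
    EventBusName="my-app-bus",
    EventPattern=json.dumps({
        "source": ["my.app.orders"],
        "detail-type": ["OrderPlaced"],
    }),
)

# Target: route matching events to a Lambda function (up to 5 targets per rule).
events.put_targets(
    Rule="order-placed",
    EventBusName="my-app-bus",
    Targets=[{
        "Id": "handler",
        "Arn": "arn:aws:lambda:eu-west-2:123456789012:function:handle-order",
    }],
)
```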

Security

  • Event buses can be accessed by other AWS accounts using Resource-based Policies
  • image
  • The Schema Registry allows you to generate code for your application, that will know in advance how data is structured in the event bus (versioned)
  • Multi account aggregation
  • image

X-RAY

  • For any traced request to your application, you can see detailed information not only about the request and response, but also about calls that your application makes to downstream AWS resources, microservices, databases, and web APIs.

  • image

  • Instrumenting your application involves sending trace data for incoming and outbound requests and other events within your application, along with metadata about each request

  • image

  • Instead of sending trace data directly to X-Ray, each client SDK sends JSON segment documents to a daemon process listening for UDP traffic on port 2000.

  • The X-Ray daemon buffers segments in a queue and uploads them to X-Ray in batches.

  • X-Ray uses trace data from the AWS resources that power your cloud applications to generate a detailed service map.

  • image

  • Enabling it

  • image

  • Segments:

    • image
  • Subsegments:

    • image
  • Sampling

    • To ensure efficient tracing and provide a representative sample of the requests that your application serves, the X-Ray SDK applies a sampling algorithm to determine which requests get traced. By default, the X-Ray SDK records the first request each second, and five percent of any additional requests.
    • You can modify sampling rules without changing your code
  • Trace

    • A trace ID tracks the path of a request through your application. A trace collects all the segments generated by a single request.
  • Annotations

    • simple key-value pairs that are indexed for use with filter expressions.
  • Metadata

    • key-value pairs with values of any type, including objects and lists, but that are not indexed (see the sketch below).
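
A sketch with the X-Ray SDK for Python (aws_xray_sdk), assuming the surrounding segment is opened by Lambda or instrumented middleware:

```python
from aws_xray_sdk.core import patch_all, xray_recorder

patch_all()  # auto-instrument supported libraries (boto3, requests, ...)

@xray_recorder.capture("process_order")   # creates a subsegment
def process_order(order_id: str) -> None:
    sub = xray_recorder.current_subsegment()
    sub.put_annotation("order_id", order_id)   # indexed: usable in filter expressions
    sub.put_metadata("debug", {"retries": 0})  # not indexed: any JSON-serializable value
```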
  • X-Ray APIS

    • PutTraceSegments: Uploads segment documents to AWS X-Ray
    • PutTelemetryRecords: Used by the AWS X-Ray daemon to upload telemetry.
    • GetSamplingRules: Retrieve all sampling rules (to know what/when to send)
  • The X-Ray daemon needs an IAM policy authorizing the correct API calls (for example the managed policy arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess) to function correctly

    • GetServiceGraph: main graph
    • BatchGetTraces: Retrieves a list of traces specified by ID. Each trace is a collection of segment documents that originates from a single request
    • GetTraceSummaries: Retrieves IDs and annotations for traces available for a specified time frame using an optional filter.
    • GetTraceGraph: Retrieves a service graph for one or more specific trace IDs
  • X-RAY WITH ELASTIC BEANSTALK

    • image
    • (in .ebextensions/xray-daemon.config)
  • X-RAY WITH ECS

    • create a Docker image that runs the X-Ray daemon, upload it to a Docker image repository, and then deploy it to your Amazon ECS cluster. You can use port mappings and network mode settings in your task definition file to allow your application to communicate with the daemon container.
    • image
    • image
  • X-RAY WITH EC2

    • Use a user data script to run the daemon automatically when you launch the instance.
  • AWS DISTRO FOR OPENTELEMETRY

    • Migrate from X-Ray to AWS Distro for OpenTelemetry if you want to standardize with open-source OpenTelemetry APIs or send traces to multiple destinations simultaneously

CLOUDTRAIL

  • Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail.

  • image

  • Event history, CloudTrail trails, CloudTrail Lake

  • Management events:

    • Management events provide information about management operations that are performed on resources in your AWS account. These are also known as control plane operations.
    • These are logged by default
    • Configuring security (IAM AttachRolePolicy), Configuring rules for routing data (Amazon EC2 CreateSubnet), Setting up logging (AWS CloudTrail CreateTrail)
  • Data events:

    • By default, data events are not logged (because high volume operations)
    • Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
    • AWS Lambda function execution activity (the Invoke API)
  • Enable CloudTrail Insights to detect unusual activity in your account; it continuously analyzes write events to detect unusual patterns

  • image

  • CloudTrail Events Retention

    • 90 days retention period
    • To keep events beyond this period, log them to S3 and use Athena
  • image

  • image

Other services

  • AWS SES (SIMPLE EMAIL SERVICE)

    • image
    • Amazon Simple Email Service (SES) is an email platform that provides an easy, cost-effective way for you to send and receive email using your own email addresses and domains.
  • Amazon OpenSearch Service

    • OpenSearch is a fully open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis
    • image
  • AWS ATHENA

    • Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL (SERVERLESS)
  • AMAZON MSK

    • Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data
    • image
  • AMAZON MACIE

    • Amazon Macie is a data security service that discovers sensitive data by using machine learning and pattern matching, provides visibility into data security risks, and enables automated protection against those risks.
  • APPCONFIG

    • AWS AppConfig feature flags and dynamic configurations help software builders quickly and securely adjust application behavior in production environments without full code deployments.
    • VALIDATORS, DEPLOYMENT STRATEGIES, ROLLBACKS
  • CLOUDWATCH EVIDENTLY

    • You can use Amazon CloudWatch Evidently to safely validate new features by serving them to a specified percentage of your users while you roll out the feature
    • You can also conduct A/B experiments to make feature design decisions based on evidence and data.

######################################################################################################################

Patterns and Whitepapers

Eventing and stateless systems

  • Event driven

    • CQRS
    • Event sourcing
  • Loose coupling

  • Point-to-point messaging (message queues) - SQS

  • Pub/Sub messaging - SNS

  • Event Bus - SNS + AMAZON EVENTBRIDGE

  • Orchestration - AWS Step Functions

  • Choreography - EventBridge

  • Event streaming - Kinesis data streams, AMAZON MSK

  • FAN OUT

  • ORDERING - SQS FIFO

  • SCHEMA REGISTRY

  • IDEMPOTENCY

  • EVENTUAL CONSISTENCY

Development of microservices on AWS

image
image
image
image
image
image
image

Deployment options and Blue Green deployments

  • image
  • To achieve Blue/Green deployment:
    • Auto Scaling Groups behind ELB
    • image
    • DNS route 53
    • image
    • Swap the environment of an Elastic Beanstalk application
    • image

Running containerized services

  • The 12 factors of microservices:
    I. Codebase: one codebase tracked in revision control, many deploys
    II. Dependencies: explicitly declare and isolate dependencies
    III. Config: store config in the environment
    IV. Backing services: treat backing services as attached resources
    V. Build, release, run: strictly separate build and run stages
    VI. Processes: execute the app as one or more stateless processes
    VII. Port binding: export services via port binding
    VIII. Concurrency: scale out via the process model
    IX. Disposability: maximize robustness with fast startup and graceful shutdown
    X. Dev/prod parity: keep development, staging, and production as similar as possible
    XI. Logs: treat logs as event streams
    XII. Admin processes: run admin/management tasks as one-off processes

AWS WELL ARCHITECTED FRAMEWORK - SERVERLESS LENS

  • image
  • image
  • image
  • image
  • image
  • image
  • Operational pillar
    • metrics, logging, distributed tracing, CI/CD deployments
    • AWS Systems Manager Parameter Store, AWS Serverless Application Model, CloudWatch, AWS CodePipeline, AWS X-Ray, Lambda, and API Gateway.
  • Security pillar
    • AWS IAM, cognito user pools, Lambda authorizer, Cloudwatch logs, Cloudtrail events, incident response, VPC
  • Reliability pillar
    • Throttling, concurrency, Failure management, retries
    • AWS Marketplace, Trusted Advisor, CloudWatch Logs, CloudWatch, API Gateway, Lambda, X-Ray, Step Functions, Amazon SQS, and Amazon SNS
  • Performance pillar
    • Performance tests, load tests, Amazon DynamoDB Accelerator, Amazon API Gateway, AWS Step Functions, Amazon VPC, NAT gateway and AWS Lambda.
  • Cost optimization pillar
  • Sustainability pillar

About

AWS Certified Developer - Associate