
aws-solutions-architect-associate-notes

Contents

Compute services

High availability and scalability

Storage

Database

Decoupling applications

Data & Analytics

Migration & Transfer

Disaster Recovery

Machine Learning

Networking

Content delivery

Monitoring & Audit

Access Management

Parameters & Encryption

Cloud Security

HPC

Docs

AWS Well Architected

AWS reference architectures

AWS architecture solution

AWS Disaster Recovery

Other resources & tips

Compute services

EC2

  • EC2 (Elastic Compute Cloud) is an Infrastructure as a Service (IaaS)
  • Storage space:
    • Network attached (EBS & EFS)
    • Directly attached hardware (EC2 Instance Store)
  • Firewall rules: security group
  • Static IPv4 addresses known as Elastic IP addresses
  • Bootstrap script (executed only at first launch): EC2 User Data
  • Metadata is data about your EC2 instance
  • This can include information such as private IP address, public IP address, hostname, security groups, etc.
  • URL to fetch metadata about the instance: http://169.254.169.254/latest/meta-data
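
For example, you can query the metadata endpoint directly from inside an instance. A minimal sketch in Python using the `requests` library and the IMDSv2 token flow; it only works when run on an EC2 instance:

```python
import requests

BASE = "http://169.254.169.254/latest"

# IMDSv2: obtain a short-lived session token first
token = requests.put(
    f"{BASE}/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
).text

# Use the token to fetch individual metadata attributes
headers = {"X-aws-ec2-metadata-token": token}
instance_id = requests.get(f"{BASE}/meta-data/instance-id", headers=headers).text
private_ip = requests.get(f"{BASE}/meta-data/local-ipv4", headers=headers).text
print(instance_id, private_ip)
```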

IAM Roles for EC2 instances

  • Never store AWS credentials on an EC2 instance; instead, attach IAM Roles to the instance

EC2 Instance Types

You can use different types of EC2 instances that are optimised for different use cases

General Purpose

  • Great for a diversity of workloads such as web servers or code repositories
  • Balance between Compute, Memory, Networking

Compute Optimized

  • Great for compute-intensive tasks that require high performance processors:
    • Batch processing workloads
    • Media transcoding
    • High performance web servers
    • High performance computing (HPC)
    • Scientific modeling & machine learning
    • Dedicated gaming servers

Memory Optimized

  • Fast performance for workloads that process large data sets in memory
  • Distributed web scale cache stores

Storage Optimized

  • Great for storage-intensive tasks that require high, sequential read and write access to large data sets on local storage
  • High frequency online transaction processing (OLTP) systems
  • Cache for in-memory databases (for example, Redis)
  • Data warehousing applications

Security group

  • Security groups act as a “firewall” for EC2 instances
  • They regulate:
    • Access to Ports
    • Authorised IP ranges – IPv4 and IPv6
    • Control of inbound network (from other to the instance)
    • Control of outbound network (from the instance to other)
  • All inbound traffic is blocked by default
  • All outbound traffic is allowed.
  • It’s good to maintain one separate security group for SSH access
  • If your application is not accessible (time out), then it’s a security group issue
  • If your application gives a “connection refused“ error, then it’s an application error or it’s not launched
  • You can't delete the default security group, however, you can change the default SG rules
  • You can assign up to five security groups to the instance.

EC2 Instances Purchasing Options

On Demand

  • Pay per use (no upfront payment)
  • Has the highest cost but no upfront payment
  • No long-term commitment
  • Recommended for short-term and un-interrupted workloads, where you can't predict how the application will behave

Reserved Instances

  • Predictable Usage – applications with steady state or predictable usage
  • Specific Capacity Requirements – applications that require reserved capacity
  • Pay Up Front – you can make upfront payments to reduce the total computing costs even further
  • Standard RIs – up to 72% off the On-Demand price
  • Convertible RIs – up to 54% off the On-Demand price; can change the instance type
  • Scheduled RIs – launch within the time window you define. Match your capacity reservation to a predictable recurring schedule that
    only requires a fraction of a day, week, or month.

EC2 Savings Plans

  • Get a discount based on long-term usage (up to 72% - same as RIs)
  • Commit to a certain type of usage ($10/hour for 1 or 3 years)
  • Usage beyond EC2 Savings Plans is billed at the On-Demand price
  • Flexible across: instance size, OS, tenancy (Host, Dedicated, Default)

EC2 Spot Instances

  • Can get a discount of up to 90% compared to On-demand
  • Instances that you can “lose” at any point of time if your max price is less than the current spot price
  • The MOST cost-efficient instances in AWS
  • Useful for workloads that are resilient to failure
  • Not suitable for critical jobs or databases
  • Spot Block: “block” a spot instance during a specified time frame (1 to 6 hours) without interruptions
  • You can only cancel Spot Instance requests that are open, active, or disabled.
  • To terminate a spot instance, you must first cancel the Spot Request, and then terminate the associated Spot Instances.

EC2 Dedicated Hosts

  • A physical server with EC2 instance capacity fully dedicated to your use
  • Allows you to address compliance requirements and use your existing server-bound software licenses
  • The most expensive option
  • Useful for software that has a complicated licensing model
  • Or for companies that have strong regulatory or compliance needs

EC2 Capacity Reservations

  • Reserve On-Demand instances capacity in a specific AZ for any duration
  • You always have access to EC2 capacity when you need it
  • You’re charged at On-Demand rate whether you run instances or not

Elastic IP

  • If you need to have a fixed public IP for your instance, you need an Elastic IP
  • An Elastic IP is a public IPv4 address you own as long as you don’t delete it
  • You can attach it to one instance at a time
  • With an Elastic IP address, you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account.
  • You can only have 5 Elastic IPs in your account (soft limit)

Placement Groups

3 types of placement groups: Cluster, Spread, Partition

Cluster

  • Grouping of instances within a single Availability Zone, same hardware
  • Recommended for applications that need low network latency, high network throughput, or both.

Spread

  • A spread placement group is a group of instances that are each placed on distinct underlying hardware.
  • Recommended for applications that have a small number of critical instances that should be kept separate from each other
  • Multi AZ, same region
  • Max 7 instances per group per AZ
  • Reduces the risk of simultaneous failure

Partition

  • Each partition placement group has its own set of racks. Each rack has its own network and power source.
  • Multiple AZs in the same region
  • Up to 7 partitions per AZ
  • Up to 100s of EC2 instances
  • Isolate from failure

Networking with EC2

You can attach 3 different types of virtual networking cards to your EC2 instances.

ENI(Elastic Network Interface)

  • An ENI is simply a virtual network card that allows:
    • Private IPv4 addresses
    • Public IPv4 addresses
    • Many IPv6 addresses
    • One Elastic IP (IPv4) per private IPv4
    • MAC address
    • One or more security groups (SG)
  • You can create ENIs independently and attach them on the fly (move them) to EC2 instances for failover
  • Bound to a specific Availability Zone (AZ)

EN (Enhanced Networking) – for high-performance networking between 10 Gbps and 100 Gbps

  • Single Root I/O Virtualization (SR-IOV) provides higher I/O performance and lower CPU utilization
  • Depending on your instance type, enhanced networking (EN) can be enabled using:

       1. ENA (Elastic Network Adapter): supports network speeds of up to 100 Gbps for supported instance types
       2. VF (Virtual Function) interface: supports network speeds of up to 10 Gbps for supported instance types; typically used on older instances

EFA (Elastic Fabric Adapter)

  • For when you need to accelerate High Performance Computing (HPC) and machine learning applications
  • Or if you need to do an OS-bypass
  • OS-bypass enables HPC and machine learning applications to bypass the operating system kernel and communicate directly with the EFA device
  • Not currently supported with Windows — only Linux.

Hibernation

  • The in-memory (RAM) state is preserved
  • The instance boot is much faster! (the OS is not stopped / restarted)
  • Under the hood: the RAM state is written to a file in the root EBS volume
  • The root EBS volume must be encrypted
  • Instance RAM Size – must be less than 150 GB.

Lambda

  • Virtual functions – no servers to manage!
  • Limited by time - short executions
  • Run on-demand
  • Scaling is automated
  • Not good for running containerized applications

Lambda Limits

  • Execution
    • Memory allocation: 128 MB – 10GB
    • Maximum execution time: 900 seconds (15 minutes)
    • Environment variables: 4KB
    • Disk capacity in function container (/tmp): 512 MB to 10GB
    • Concurrency executions: 1000 (can be increased)
  • Deployment
    • Lambda function deployment size (compressed .zip): 50 MB
    • Size of uncompressed deployment (code + dependencies): 250 MB
    • Size of environment variables: 4 KB
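
To make these limits concrete, here is a minimal sketch of a Python Lambda handler (the function and file names are illustrative; the handler is configured as `<file>.handler` in Lambda):

```python
import json

def handler(event, context):
    # `event` carries the invocation payload; `context` exposes runtime
    # info such as the function name and remaining execution time.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }
```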

Lambda@Edge

  • Deploy Lambda functions alongside your CloudFront CDN for computing at edge locations
  • Customize the CDN content using Lambda at the edge location (responsive)
  • No server management (Lambda is deployed globally)
  • Can be used to modify CloudFront requests & responses


Networking

  • By default, your Lambda function is launched outside your own VPC (in an AWS owned VPC)
  • Therefore, it cannot access resources in your VPC (RDS, ElastiCache, internal ELB…)
  • To enable your Lambda function to access resources inside your private VPC, you must define the VPC ID, the Subnets and the Security Groups
  • Lambda will create an ENI (Elastic Network Interface) in your subnets


Elastic Beanstalk

  • Used to deploy applications on AWS infrastructure

  • Platform as a Service (PaaS)

  • Automatically handles capacity provisioning, load balancing, scaling, application health monitoring, instance configuration, etc. but we have full control over the configuration

  • Free (pay for the underlying resources)

  • Supports versioning of application code

  • Can create multiple environments (dev, test, prod)

  • Web & Worker Environments

    • Web Environment (Web Server Tier): client requests are directly handled by EC2 instances through a load balancer.
    • Worker Environment (Worker Tier): clients’ requests are put in an SQS queue and the EC2 instances will pull the messages to process them. Scaling depends on the number of SQS messages in the queue.


Elastic Container Service

  • AWS managed container orchestration platform
  • Launch Docker containers on AWS = Launch ECS Tasks on ECS Clusters
  • EFS is used as persistent multi-AZ shared storage for ECS tasks

ECS Components

  • Clusters
    • An Amazon ECS cluster is a logical grouping of Tasks or services.
    • You can use clusters to isolate your applications so that they don't use the same underlying infrastructure
    • When your tasks are run on Fargate, your cluster resources are also managed by Fargate
  • Task definitions
    • A task definition is a text file that describes one or more containers that form your application
    • It's in JSON format
    • You can use it to describe up to a maximum of ten containers
    • The task definition functions as a blueprint for your application
    • AWS recommends spanning your application across multiple task definitions
    • parameters
      • Docker image
      • CPU and memory
      • The command that the container runs when it's started
      • Data volumes that are used with the containers in the task
      • The IAM role that your tasks use
  • Services
    • You can use an Amazon ECS service to run and maintain your desired number of tasks simultaneously in an Amazon ECS cluster
    • If any of your tasks fail or stop for any reason, the Amazon ECS service scheduler launches another instance based on your task definition.
    • Parameters
      • Cluster
      • Task definition
      • Capacity provider
      • Client token

Launch Types

EC2 Launch Type

  • Not Serverless
  • you must provision & maintain the infrastructure (the EC2 instances)
  • EC2 instances have ECS agent to register in the ECS Cluster
  • AWS takes care of starting / stopping containers
  • Use case: long-running processes, cost optimisation (possible to reserve EC2 capacity or use Spot instances)

Fargate Launch Type

  • Serverless
  • You do not provision the infrastructure (no EC2 instances to manage)
  • You just create task definitions
  • AWS just runs ECS Tasks for you based on the CPU / RAM you need
  • To scale, just increase the number of tasks
  • Use case: when you want to run containers for short-lived workloads

IAM Roles for ECS

  • EC2 Instance Profile (EC2 Launch Type only):
    • Used by the ECS agent
    • Makes API calls to ECS service
    • Pull Docker image from ECR
    • Reference sensitive data in Secrets Manager or SSM Parameter Store
  • ECS Task Role(Both EC2 launch type and Fargate):
    • Allows ECS tasks to access AWS resources
    • Each task can have a separate role
    • Use different roles for the different ECS Services
    • Task Role is defined in the task definition

Data Volumes (EFS)

  • Mount EFS file systems onto ECS tasks
  • Works for both EC2 and Fargate launch types
  • Tasks running in any AZ will share the same data in the EFS file system
  • Fargate + EFS = Serverless

ECS Service Auto Scaling

  • Automatically increase/decrease the desired number of ECS tasks

  • Amazon ECS Auto Scaling uses AWS Application Auto Scaling

    • Metric :
      • ECS Service Average CPU Utilization
      • ECS Service Average Memory Utilization - Scale on RAM
      • ALB Request Count Per Target – metric coming from the ALB
  • Scaling type:

    • Target Tracking – scale based on target value for a specific CloudWatch metric
    • Step Scaling – scale based on a specified CloudWatch Alarm
    • Scheduled Scaling – scale based on a specified date/time (predictable changes)
  • ECS Service Auto Scaling (task level) ≠ EC2 Auto Scaling (EC2 instance level)

  • Fargate Auto Scaling is much easier to setup (because Serverless)

EC2 Launch Type – Auto Scaling EC2 Instances

  • Accommodate ECS Service Scaling by adding underlying EC2 Instances
  • 2 types:
    • Auto Scaling Group Scaling
      • Scale your ASG based on CPU Utilization
      • Add EC2 instances over time
    • ECS Cluster Capacity Provider (newer and more advanced)
      • Used to automatically provision and scale the infrastructure for your ECS Tasks
      • Capacity Provider paired with an Auto Scaling Group
      • Add EC2 Instances when you’re missing capacity (CPU, RAM…)


Elastic Container Registry

  • Store and manage Docker images on AWS
  • Private and Public repository (Amazon ECR Public Gallery)
  • Fully integrated with ECS, backed by Amazon S3
  • Access is controlled through IAM policy
  • Lifecycle Rules to expire and remove unused or older images
  • Caching public repos privately (ECR periodically reaches out to check the current caching status)
  • Tag Mutability – prevent image tags from being overwritten

Elastic Kubernetes Service

  • Used to launch Kubernetes (open-source) clusters on AWS
  • Supports both EC2 and Fargate launch types
  • Inside the EKS cluster, we have EKS nodes (EC2 instances) and EKS pods (tasks) within them. We can use a private or public load balancer to access these EKS pods.
  • EKS is an alternative to ECS
  • Node Types
    • Managed Node Groups
      • Creates and manages Nodes (EC2 instances) for you
      • Nodes are part of an ASG managed by EKS
    • Self-Managed Nodes
      • Nodes created by you and registered to the EKS cluster and managed by an ASG
      • You can use prebuilt AMI - Amazon EKS Optimized AMI
    • AWS Fargate
      • No maintenance required; no nodes managed


EKS Anywhere

  • An on-premises way to manage Kubernetes (K8s) clusters with the same practices used for Amazon EKS
  • The key difference is that you run these clusters on premises
  • Based on EKS Distro
  • Offers full lifecycle management of multiple K8s clusters
  • Operates independently of AWS
  • Control plane: K8s control plane management is operated completely by the customer
  • Location: the K8s control plane is located entirely within a customer data center or operations center

ECS Anywhere

  • Feature of Amazon ECS allowing the management of container-based apps on-premises
  • No need to install and operate local container orchestration software, meaning more operational efficiency

High availability and scalability

  • Vertical Scaling: Increase instance size (= scale up / down)

    • From: t2.nano - 0.5G of RAM, 1 vCPU
    • To: u-12tb1.metal – 12.3 TB of RAM, 448 vCPUs
    • Hardware limit
    • Use case: non-distributed systems, like a database
  • Horizontal Scaling: Increase number of instances (= scale out / in)

    • Auto Scaling Group
    • Load Balancer
  • High Availability

    • Run instances for the same application across multi AZ
    • Auto Scaling Group multi AZ
    • Load Balancer multi AZ

Elastic Load Balancer

  • Spread load across multiple EC2 instances
  • Supports Multi AZ
  • Expose a single point of access (DNS) to your application
  • Do regular health checks to your instances
  • Enforce stickiness with cookies
  • High availability across zones
  • Separate public traffic from private traffic

Types

  • Classic Load Balancer (CLB) - deprecated

    • Load Balancing to a single application
    • Supports HTTP, HTTPS (layer 7) & TCP (layer 4), SSL
    • Health checks are HTTP or TCP based
    • Provides a fixed hostname (xxx.region.elb.amazonaws.com)
  • Application Load Balancer (ALB)

    • Load balancing to multiple applications (target groups) based on the request parameters
    • Operates at Layer 7 (HTTP, HTTPS and WebSocket)
    • Provides a fixed hostname (xxx.region.elb.amazonaws.com)
    • Security Groups can be attached to ALBs to filter requests
    • Great for micro services & container-based applications (Docker & ECS)
    • Client info is passed in the request headers
      • Client IP => X-Forwarded-For
      • Client Port => X-Forwarded-Port
      • Protocol => X-Forwarded-Proto
    • Target Groups
      • Health checks are done at the target group level
      • Target Groups could be
        • EC2 instances - HTTP
        • ECS tasks - HTTP
        • Lambda functions - HTTP request is translated into a JSON event
        • Private IP Addresses
    • Listener Rules can be configured to route traffic to different target groups based on
      • Path (example.com/users & example.com/posts)
      • Hostname (one.example.com & other.example.com)
      • Query String (example.com/users?id=123&order=false)
      • Request Headers
      • Source IP address
  • Network Load Balancer (NLB)

    • Operates at Layer 4 (TCP, UDP, TLS over TCP)
    • Can handle millions of requests per second (extreme performance)
    • Lower latency ~ 100 ms (vs ~400 ms for ALB)
    • 1 static public IP per AZ
    • Health Checks support the TCP, HTTP and HTTPS Protocols
    • No security groups can be attached to NLBs. Since they operate on layer 4, they cannot see the data available at layer 7. They just forward the incoming traffic to the right target group as if those requests were directly coming from client. So, the attached instances must allow TCP traffic on port 80 from anywhere.
    • Within a target group, NLB can send traffic to
      • EC2 instances
      • IP addresses (must be private IPs)
      • Application Load Balancer (ALB)
  • Gateway Load Balancer (GWLB)

    • Operates at layer 3 (Network layer) - IP packets
    • Used when you want to inspect and analyze traffic at the network level before it reaches your ELB, EC2 instances, etc.
    • Used to route requests to a fleet of 3rd party virtual appliances like Firewalls, Intrusion Detection and Prevention Systems (IDPS), etc.
    • After inspection, the 3rd-party appliance routes the traffic back to your instances or ELB
    • Targets:
      • EC2 instances
      • IP addresses (must be private)
  • Sticky Sessions (Session Affinity)

    • Requests coming from a client are always redirected to the same instance based on a cookie. After the cookie expires, requests coming from the same user might be redirected to another instance

    • Only supported by CLB & ALB because the cookie can be seen at layer 7

    • Used to ensure the user doesn’t lose their session data, like login or cart info, while navigating between web pages.

    • Stickiness may cause load imbalance

    • Cookies could be:

      • Application-based (TTL defined by the application)
      • Load Balancer generated (TTL defined by the load balancer)
    • ELB reserved cookie names (should not be used):

      • AWSALB
      • AWSALBAPP
      • AWSALBTG
  • Cross-zone Load Balancing

  • Allows ELB nodes in different AZs containing an unbalanced number of instances to distribute the traffic evenly across all instances in all the AZs registered under a load balancer.

  • Supported Load Balancers

    • Classic Load Balancer : Disabled by default
    • Application Load Balancer : Always on (can be disabled at the target group level)
    • Network Load Balancer : Disabled by default
  • Security

  • The load balancer uses an X.509 certificate (SSL/TLS server certificate)

  • You can manage certificates using ACM (AWS Certificate Manager)

  • Alternatively, you can create and upload your own certificates

  • Server Name Indication (SNI)

    • SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
    • It’s a “newer” protocol, and requires the client to indicate the hostname of the target server in the initial SSL handshake
    • The server will then find the correct certificate, or return the default one
    • Does not work with CLB; works with ALB and NLB
  • Connection Draining

  • Connection Draining – for CLB

  • Deregistration Delay – for ALB & NLB

  • Time to complete in-flight requests while the instance is de-registering or unhealthy

  • Stops sending new requests to the EC2 instance which is de-registering

  • Between 1 to 3600 seconds (default: 300 seconds)

Auto Scaling Group

The goal of an Auto Scaling Group (ASG) is to:

  • Scale out (add EC2 instances) to match an increased load
  • Scale in (remove EC2 instances) to match a decreased load
  • Ensure we have a minimum and a maximum number of EC2 instances running
  • Automatically register new instances to a load balancer
  • Re-create an EC2 instance in case a previous one is terminated (ex: if unhealthy)
  • ASG can terminate instances marked as unhealthy by an ELB

Scaling Policies

  • Scheduled Scaling

    • Scale based on a schedule
    • Used when the load pattern is predictable
    • Anticipate a scaling based on known usage patterns
  • Simple Scaling/Step Scaling

    • Scale to certain size on a CloudWatch alarm (ex average CPU utilization in all ASG instances)
    • When a CloudWatch alarm is triggered (example CPU > 70%), then add 2 units
    • When a CloudWatch alarm is triggered (example CPU < 30%), then remove 1
  • Target Tracking Scaling

    • ASG maintains a CloudWatch metric and scale accordingly to maintain the target defined
    • Ex. maintain CPU usage at 40% (see the sketch after this list)
  • Predictive Scaling

    • Historical data is used to predict the load pattern using ML and scale automatically
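
A sketch of creating the Target Tracking policy mentioned above with boto3 (the ASG and policy names are hypothetical):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep the ASG's average CPU utilization near 40%
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",          # hypothetical ASG name
    PolicyName="keep-cpu-at-40",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 40.0,
    },
)
```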

Launch Configuration & Launch Template

  • Defines the following info for ASG
    • AMI + Instance Type
    • EC2 User Data
    • EBS Volumes
    • Security Groups
    • SSH Key Pair
    • Min / Max / Desired Capacity
    • Subnets (where the instances will be created)
    • Load Balancer (specify which ELB to attach instances)
    • Scaling Policy
  • Launch Configuration (legacy)
    • Cannot be updated (must be re-created)
    • Does not support Spot Instances
  • Launch Template (newer)
    • Versioned
    • Can be updated
    • Supports both On-Demand and Spot Instances
    • Recommended by AWS

Cooldown

  • After a scaling activity happens, the ASG goes into cooldown period (default 300 seconds) during which it does not launch or terminate additional instances (ignores scaling requests) to allow the metrics to stabilize.
  • Use a ready-to-use AMI to launch instances faster to be able to reduce the cooldown period

Warm-Up

  • The warm-up value for instances allows you to control the time until a newly launched instance can contribute to the CloudWatch metrics; once the warm-up time has expired, the instance is considered part of the Auto Scaling group and will receive traffic

Relational Database scaling (RDS)

There are 4 types of scaling we can use to adjust our relational database performance:

  • Aurora Serverless
    • We can offload scaling to AWS. Excels with unpredictable workloads.
  • Read Replicas
    • Creating read-only copies of our data can help spread out the workload
  • Scaling Storage
    • Storage can be resized (disk size), but it’s only able to go up, not down (except for Aurora).
  • Vertical Scaling
    • Resizing the database instance from one size (e.g. t2.micro) to another (e.g. t3.large) can deliver greater performance.

Non-Relational Database scaling

  • DynamoDB: scaling is handled by DynamoDB itself (provisioned-capacity auto scaling or On-demand mode)

Storage


S3

  • Remember that S3 is Object-based: i.e. it allows you to upload files.
  • Files can be 0 Bytes to 5 TB.
  • There is unlimited storage
  • Files are stored in Buckets. A bucket is tied to a region, while the S3 namespace is global
  • Not suitable to install operating systems on S3 due to it being object based
  • You can turn on MFA delete to avoid accidental delete.
  • Tiered Storage S3 offers a range of storage classes designed for different use cases.
  • Lifecycle Management Define rules to automatically transition objects to a cheaper storage tier or delete objects that are no longer required after a set period of time.

Versioning

  • You can version your files in Amazon S3
  • Protect against unintended deletes (ability to restore a version)
  • Easy roll back to previous version
  • With versioning, all versions of an object are stored and can be retrieved, including deleted objects.
  • When you DELETE an object, all versions remain in the bucket and Amazon S3 inserts a delete marker.
  • You can permanently delete an object by specifying the version you want to delete. Only the owner of an Amazon S3 bucket can permanently delete a version.
  • Versioning can only be suspended once it has been enabled.
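
Enabling versioning is a one-call bucket configuration. A minimal boto3 sketch (the bucket name is hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning; setting "Suspended" instead would suspend it later
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```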

Amazon S3 – Replication

  • Must enable Versioning in source and destination buckets
  • Cross-Region Replication (CRR)
  • Same-Region Replication (SRR)
  • Buckets can be in different AWS accounts
  • After you enable Replication, only new objects are replicated
  • You can replicate existing objects using S3 Batch Replication
  • For DELETE operations:
    • Replicate delete markers from source to target (optional)
    • Permanent deletes are not replicated

Security

  • User based security

    • IAM policies define which API calls should be allowed for a specific user
    • Preferred over bucket policy for fine-grained access control
  • Resource Based Policies

    • Bucket Policies
      • Grant public access to the bucket
      • Can either add or deny permissions across all (or a subset) of objects within a bucket.
      • Force objects to be encrypted at upload
      • You use a bucket policy to control access to objects in the bucket that are owned by the account used to create the bucket.
      • Cross-account access
    • Access Control Lists
      • A list of grants identifying grantee and permission granted
      • ACLs use an S3–specific XML schema.
      • You can grant permissions only to other AWS accounts, not to users in your account
      • You need to use an ACL to control access to objects in your bucket but owned by other account
      • You cannot grant conditional permissions, nor explicitly deny permissions.
      • Object ACLs are limited to 100 granted permissions per ACL
      • The only recommended use case for the bucket ACL is to grant write permissions to the S3 Log Delivery group
  • Note: An IAM principal can access an S3 object if the IAM permission allows it or the bucket policy allows it and there is no explicit deny.

  • Important

    • When you configure a bucket as a static website, if you want your website to be public, you can grant public read access. To make your bucket publicly readable, you must disable block public access settings for the bucket and write a bucket policy that grants public read access. If your bucket contains objects that are not owned by the bucket owner, you might also need to add an object access control list (ACL) that grants everyone read access.
    • If you don't want to disable block public access settings for your bucket but you still want your website to be public, you can create an Amazon CloudFront distribution to serve your static website
    • You can use a bucket policy to grant public read permission to your objects. However, the bucket policy applies only to objects that are owned by the bucket owner. If your bucket contains objects that aren't owned by the bucket owner, the bucket owner should use the object access control list (ACL) to grant public READ permission on those objects.

S3 Encryption

You can encrypt objects in S3 buckets using one of 4 methods

  • Encryption in Transit
    • SSL/TLS
    • HTTPS
    • HTTPS is mandatory for SSE-C
  • Server-Side Encryption (SSE)
    • Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) – Enabled by Default

      • Encrypts S3 objects using keys handled, managed, and owned by AWS
      • Object is encrypted server-side using AES-256
      • Must set header: "x-amz-server-side-encryption": "AES256"
    • Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)

      • Leverage AWS Key Management Service (AWS KMS) to manage encryption keys
      • If you use SSE-KMS, you may be impacted by the KMS limits quotas
      • Must set header: "x-amz-server-side-encryption": "aws:kms"
    • Server-Side Encryption with Customer-Provided Keys (SSE-C)

      • When you want to manage your own encryption keys
      • Amazon S3 does NOT store the encryption key you provide
      • HTTPS must be used
      • Encryption key must be provided in HTTP headers with every HTTP request made
  • Client-Side Encryption

    • Use client libraries such as the Amazon S3 Client-Side Encryption Library
    • Clients must encrypt data themselves before sending to Amazon S3
    • Clients must decrypt data themselves when retrieving from Amazon S3

Enforcing Encryption with a Bucket Policy

  • A bucket policy can deny all PUT requests that don’t include the x-amz-server-side-encryption parameter in the request header
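
A sketch of such a policy applied with boto3, using the documented `Null` condition on the encryption header (the bucket name is hypothetical):

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny any PutObject request that omits the SSE header
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }],
}
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```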

CORS

  • If a client makes a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
  • You can allow for a specific origin or for *
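
A minimal CORS configuration sketch with boto3 (the bucket name and origin are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Allow cross-origin GETs from one specific origin (or use ["*"])
s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)
```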

MFA Delete

  • MFA will be required to:
    • Permanently delete an object version
    • Suspend Versioning on the bucket
  • Bucket Versioning must be enabled
  • Can only be enabled or disabled by the root user

S3 Access Logs

  • For audit purpose, you may want to log all access to S3 buckets
  • Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
  • That data can be analyzed using data analysis tools
  • The target logging bucket must be in the same AWS region

Pre-signed URL

  • Pre-signed URLs for S3 have temporary access token as query string parameters which allow anyone with the URL to temporarily access the resource before the URL expires (default 1h)
  • Pre-signed URLs inherit the permission of the user who generated it
  • Uses:
    • Allow only logged-in users to download a premium video
    • Allow users to upload files to a precise location in the bucket
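
Generating a pre-signed URL is a client-side signing operation. A boto3 sketch (the bucket and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# URL valid for 1 hour; signed with the caller's credentials,
# so it inherits that principal's permissions
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,
)
print(url)  # anyone holding this URL can GET the object until it expires
```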

S3 Object Lock and Glacier Vault Lock

  • Use S3 Object Lock to store objects using a write once, read many (WORM) model.
  • Object Lock comes in two modes: governance mode and compliance mode.
    • Governance mode: users can’t overwrite or delete an object version or alter
      its lock settings unless they have special permissions
    • Compliance mode: a protected object version can’t be overwritten or deleted
      by any user, including the root user in your AWS account

S3 Storage Classes

S3 Standard – General Purpose

  • Used for frequently accessed data
  • Low latency and high throughput
  • The default storage class
  • Minimum storage duration: N/A, but transitioning from S3 Standard or S3 Standard-IA to S3 Standard-IA or S3 One Zone-IA requires the object to be at least 30 days old
  • This limitation does not apply to the INTELLIGENT_TIERING, GLACIER, and DEEP_ARCHIVE storage classes.
  • Use cases include websites, content distribution, mobile and gaming applications, and big data analytics

S3 Standard-Infrequent Access (S3 Standard-IA)

  • Infrequently accessed data(once a month)
  • Rapid Access: Used for data that is accessed less frequently but requires rapid access when needed.
  • Minimum storage duration of 30 days
  • There is a low per-GB storage price and a per-GB retrieval fee
  • Use cases: Disaster Recovery, backups

S3 Intelligent-Tiering

  • Data with changing or unknown access patterns
  • Automatically moves your data to the most cost-effective tier based on how frequently you access each object.
  • Minimum storage duration of 30 days

S3 Glacier

  • Low-cost object storage meant for archiving / backup

  • Pricing: price for storage + object retrieval cost

  • Amazon S3 Glacier Instant Retrieval

    • Millisecond retrieval, great for data accessed once a quarter
    • Minimum storage duration of 90 days
  • Amazon S3 Glacier Flexible Retrieval

    • Data accessed once a year
    • 3 retrieval flexibility:
      • Expedited (1 to 5 minutes)
      • Standard (3 to 5 hours)
      • Bulk (5 to 12 hours)
    • Minimum storage duration of 90 days
  • Amazon S3 Glacier Deep Archive – for long term storage

    • Data accessed once a year
    • 2 flexible retrieval:
      • Standard (12 hours)
      • Bulk (48 hours)
    • Minimum storage duration of 180 days

S3 Lifecycle Management

  • Lifecycle management automates moving your objects between the different storage tiers, thereby maximizing cost effectiveness.
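
A sketch of a lifecycle rule with boto3 (the bucket name, prefix, and day counts are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under logs/ to Standard-IA after 30 days,
# to Glacier after 90 days, and delete them after 365 days
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)
```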

S3 Notification Events

  • Optional
  • Generates events for operations performed on the bucket or objects
  • Targets:
    • SNS topics
    • SQS Standard queues (not FIFO queues)
    • Lambda functions

S3 performance

General performance: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second, per prefix

  • You can get better performance by spreading your reads across different prefixes, you can achieve 11,000 requests per second with 2 prefixes.
  • If we used all 4 prefixes in the last example, you would achieve 22,000 requests per second
  • There are no limits to the number of prefixes in a bucket.

Multipart Uploads (upload performance)

  • Recommended for files over 100 MB
  • Required for files over 5 GB
  • Parallelize uploads (increases efficiency)

S3 Transfer Acceleration

  • Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
  • Compatible with multi-part upload
  • Data is ingested at the nearest edge location and is transferred over AWS private network (uses CloudFront internally)

S3 Byte-Range Fetches (download performance)

  • Parallelize downloads by specifying byte ranges.
  • Better resilience in case of failures, since we only need to refetch the failed byte range and not the whole file
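
A byte-range fetch is just a ranged GET. A boto3 sketch fetching the first 1 MB (the bucket and key are hypothetical); parallelizing means issuing several such requests for different ranges:

```python
import boto3

s3 = boto3.client("s3")

# Standard HTTP Range syntax: first 1 MB of the object
resp = s3.get_object(
    Bucket="my-bucket",
    Key="big-file.bin",
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()  # only this range is transferred
```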

S3 Select & Glacier Select

  • Retrieve less data using SQL by performing server-side filtering
  • Can filter by rows & columns (simple SQL statements)
  • Less network transfer cost
  • Less CPU cost client-side

EBS

  • An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run
  • Can only be mounted to 1 instance at a time (except EBS multi-attach)
  • It allows your instances to persist data, even after their termination
  • They are bound to a specific availability zone
  • An EBS Volume in us-east-1a cannot be attached to us-east-1b
  • To move a volume across, you first need to snapshot
  • EBS Multi-attach allows the same EBS volume to attach to multiple EC2 instances in the same AZ

EBS Snapshots

  • Make a backup (snapshot) of your EBS volume at a point in time
  • Not necessary to detach volume to do snapshot, but recommended
  • Can copy snapshots across AZ or Region

EBS Snapshot Archive

  • Move a Snapshot to an ”archive tier” that is 75% cheaper
  • Takes within 24 to 72 hours for restoring the archive

Recycle Bin for EBS Snapshots

  • Setup rules to retain deleted snapshots so you can recover them after an accidental deletion
  • Specify retention (from 1 day to 1 year)

Fast Snapshot Restore (FSR)

  • Force full initialization of snapshot to have no latency on the first use ($$$)

Volume Types

Only gp2/gp3 and io1/io2 can be used as boot volumes

EBS SSD

  • gp2/gp3 (SSD): general purpose SSD volume that balances price and performance for a wide variety of workloads
  • io1/io2 (SSD): Provisioned IOPS – highest-performance SSD volume for mission-critical low-latency or high-throughput workloads;
    supports Multi-Attach (attach the same EBS volume to multiple EC2 instances in the same AZ)

EBS HDD

  • st1 (HDD): low cost HDD volume designed for frequently accessed, throughput-intensive workloads
  • sc1 (HDD) Lowest cost HDD volume designed for less frequently accessed workloads

Tips

  • Amazon EBS provides three volume types to best meet the needs of your workloads: General Purpose (SSD), Provisioned IOPS (SSD), and Magnetic.
  • General Purpose (SSD) is the new, SSD-backed, general purpose EBS volume type that is recommended as the default choice for customers. General Purpose (SSD) volumes are suitable for a broad range of workloads, including small to medium-sized databases, development and test environments, and boot volumes.
  • Provisioned IOPS (SSD) volumes offer storage with consistent and low-latency performance and are designed for I/O intensive applications such as large relational or NoSQL databases. Magnetic volumes provide the lowest cost per gigabyte of all EBS volume types.
  • Magnetic volumes are ideal for workloads where data are accessed infrequently, and applications where the lowest storage cost is important. Take note that this is a Previous Generation Volume. The latest low-cost magnetic storage types are Cold HDD (sc1) and Throughput Optimized HDD (st1) volumes.

EBS Encryption

  • For Encrypted EBS volumes:
    • Data at rest is encrypted
    • EBS Encryption leverages keys from KMS (AES-256)
    • Data in-flight between the instance and the volume is encrypted
    • All snapshots are encrypted
    • All volumes created from the snapshot are encrypted
  • Encrypt an un-encrypted EBS volume
    • Create an EBS snapshot of the volume
    • Copy the EBS snapshot and encrypt the new copy
    • Create a new EBS volume from the encrypted snapshot (the volume will be automatically encrypted)
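
A sketch of those three steps with boto3 (the volume ID, region, and AZ are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot the un-encrypted volume
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot, encrypting the copy (default KMS key here)
copy = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Encrypted=True,
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

# 3. Volumes created from an encrypted snapshot are automatically encrypted
ec2.create_volume(
    SnapshotId=copy["SnapshotId"],
    AvailabilityZone="us-east-1a",
)
```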

EFS

  • Managed NFS (network file system) that can be mounted on many EC2
  • EFS works with EC2 instances in multi-AZ
  • Highly available, scalable, expensive
  • File system scales automatically, pay-per-use, no capacity planning
  • Compatible with Linux-based AMI (Windows not supported at this time)
  • Encryption at rest using KMS

Performance Mode

  • File system performance is typically measured by using the dimensions of latency, throughput, and Input/Output operations per second (IOPS)
  • When creating an EFS file system, you can set what performance characteristics you want
    • General Purpose (default):
      • Has the lowest per-operation latency
      • Use cases (web server, CMS, etc.)
    • Max I/O :
      • Max I/O mode is designed for highly parallelized workloads that can tolerate higher latencies than the General Purpose mode
      • higher latency & throughput (big data, media processing)

Throughput Mode

  • Bursting (default)
    • Throughput: 50MB/s per TB
    • Burst of up to 100MB/s.
    • Bursting Throughput mode is recommended for workloads that require throughput that scales with the amount of storage in your file system.
  • Provisioned
    • Fixed throughput (provisioned)
    • In Provisioned Throughput mode, you specify a level of throughput that the file system can drive independent of the file system's size

Storage Tiers

  • EFS comes with storage tiers and lifecycle management, allowing you to move your data from one tier to another after X number of days.
    • Standard For frequently accessed files
    • Infrequently Accessed For files not frequently accessed

Encryption

  • EFS supports two forms of encryption for file systems
  • Encryption of data in transit
    • You can enable encryption of data in transit when you mount the file system using the Amazon EFS mount helper
    • Data is encrypted in transit without needing to modify your applications.
  • Encryption at rest
    • You can enable encryption of data at rest when creating an Amazon EFS file
    • You can create encrypted file systems using:
      • AWS Management Console
      • AWS CLI
      • SDK

Instance Store

  • EBS volumes are network drives with good but “limited” performance
  • If you need a high-performance hardware disk, use EC2 Instance Store
  • Better I/O performance
  • You can specify the instance store volumes for your instance only when you launch it. You can't attach instance store volumes to an instance after you've launched it.
  • EC2 Instance Store volumes lose their storage if the instance is stopped (ephemeral)
  • Instance store persists during reboots, not during the stop and start of the instance.
  • Good for buffer / cache / scratch data / temporary content

FSx

  • Allows us to launch 3rd party high-performance file systems on AWS
  • Useful when we don’t want to use an AWS managed file system like S3
  • Can be accessed from your on-premises infrastructure

FSx for Windows

  • A managed Windows Server that runs Server Message Block (SMB)-based file services.
  • Designed for Windows and Windows applications.
  • Supports Multi-AZ (high availability)
  • Supports AD users, access control lists, groups, and security policies, along with Distributed File System (DFS) namespaces and replication.

Amazon FSx for Lustre

  • A fully managed file system that is optimized for compute-intensive workloads
  • High Performance Computing(HPC)
  • Scales up to 100s GB/s, millions of IOPS, sub-ms latencies
  • Only works with Linux
  • Machine Learning
  • Media Data Processing Workflows

FSx Deployment Options

  • Scratch File System

    • Temporary storage (cheaper)
    • Data is not replicated (data lost if the file server fails)
    • High burst (6x faster than persistent file system)
    • Usage: short-term processing
  • Persistent File System

    • Long-term storage (expensive)
    • Data is replicated within same AZ
    • Failed files are replaced within minutes
    • Usage: long-term processing, sensitive data

How EFS, FSx for Windows, and FSx for Lustre differ

  • EFS: When you need distributed, highly resilient storage for Linux instances and Linux-based applications.
  • Amazon FSx for Windows: When you need centralized storage for Windows-based applications, such as SharePoint, Microsoft SQL Server, Workspaces, IIS Web Server, or any other native Microsoft application.
  • Amazon FSx for Lustre: When you need high-speed, high-capacity distributed storage.
    This will be for applications that do high performance computing (HPC), financial modeling, etc.
    Remember that FSx for Lustre can store data directly on S3

Storage Gateway

  • Bridge between on-premises data and cloud data
  • Not suitable for one-time sync of large amounts of data (use DataSync instead)
  • Optimizes data transfer by sending only changed data
  • Use cases:
    • disaster recovery
    • backup & restore
    • on-premises cache & low-latency files access

Types of Storage Gateway

S3 File Gateway

  • Configured S3 buckets are accessible using the NFS and SMB protocol
  • Most recently used data is cached in the file gateway
  • Supports S3 Standard, S3 Standard-IA, S3 One Zone-IA, S3 Intelligent-Tiering
  • Transition to S3 Glacier using a Lifecycle Policy
  • Bucket access using IAM roles for each File Gateway
  • SMB Protocol has integration with Active Directory (AD) for user authentication


FSx File Gateway

  • Native access to Amazon FSx for Windows File Server
  • Local cache for frequently accessed data
  • Windows native compatibility (SMB, NTFS, Active Directory...)
  • Useful for group file shares and home directories


Volume Gateway

  • Block storage using the iSCSI (Internet Small Computer System Interface) protocol, backed by S3

  • Backed by EBS snapshots which can help restore on-premises volumes

  • Two kinds of volumes:

    • Cached volumes:
      • You store your data in S3 and retain a copy of frequently accessed data subsets locally
      • Low latency access to most recent data
    • Stored volumes:
      • You store the entire set of volume data on premises and store periodic point-in-time backups (snapshots) in S3
      • Low-latency access to your entire dataset


Tape Gateway

  • Used to backup on-premises data using tape-based process to S3 as Virtual Tapes
  • Uses iSCSI protocol


Storage Gateway - Hardware Appliance

  • Storage Gateway requires on-premises virtualization. If you don’t have virtualization available, you can use a Storage Gateway - Hardware Appliance. It is a mini server that you need to install on-premises.
  • Does not work with FSx File Gateway

AWS Backup

Backup allows you to consolidate your backups across multiple AWS services, such as :

  • EC2
  • EBS
  • EFS
  • S3
  • Amazon FSx for Lustre
  • Amazon FSx for Windows File Server
  • AWS Storage Gateway
  • RDS
  • DynamoDB

It gives you centralized control across all AWS services, in multiple AWS accounts across the entire AWS organization.

AWS Backup benefits

  • Central Management
  • Automation: create automated backup schedules and retention policies, and create lifecycle policies
    allowing you to expire unnecessary backups after a period of time
  • Improved Compliance: Backup policies can be enforced while backups can be encrypted both at rest and in transit
    allowing alignment to regulatory compliance

Backup Vault

  • WORM (Write Once Read Many) model for backups
  • Even the root user cannot delete backups
  • Additional layer of defense to protect your backups against:
    • Inadvertent or malicious delete operations
    • Updates that shorten or alter retention periods

Database

RDS

  • RDS stands for Relational Database Service
  • It’s a managed DB service for databases that use SQL as a query language
  • RDS is generally used for online transaction processing (OLTP) workloads
  • Databases supported :
    • Postgres
    • MySql
    • MariaDB
    • Oracle
    • Microsoft SQL Server
    • Aurora
  • Continuous backups and restore to specific timestamp (Point in Time Restore)!
  • You can’t SSH into your instances

RDS Auto Scaling

  • When RDS detects you are running out of free database storage, it scales automatically.
  • You have to set Maximum Storage Threshold (maximum limit for DB storage)
  • Condition for automatic storage scaling:
    • Free storage is less than 10% of allocated storage
    • Low-storage lasts at least 5 minutes
    • 6 hours have passed since last modification

RDS Read Replicas for read scalability AKA Performance

  • A read-only copy of your primary database in the same AZ, cross-AZ, or cross-region
  • Used to increase or scale read performance.
  • Up to 5 Read Replicas
  • Replication is ASYNC so reads are eventually consistent
  • Replicas can be promoted to their own DB
  • Applications must update the connection string to leverage read replicas

RDS Multi AZ for Disaster Recovery

  • With Multi-AZ, RDS creates an exact copy of your production database in another Availability Zone
  • Synchronous replication
  • One DNS name, so the connection string does not need to be updated (both databases are accessed through one DNS name,
    which allows for automatic DNS failover to the standby database)
  • When failing over, RDS flips the CNAME record for the DB instance (mapping the DB DNS name to the standby’s hostname) to point at the standby, which is in turn promoted to become the new primary.
  • Cannot be used for scaling as the standby database cannot take read/write operation
  • The Read Replicas can be setup as Multi AZ for Disaster Recovery(DR)


RDS From Single-AZ to Multi-AZ

  • Zero downtime operation (no need to stop the DB)
  • Just click on “modify” for the database
  • The following happens internally
    • A snapshot is taken
    • A new DB is restored from the snapshot in a new AZ
    • Synchronization is established between the two databases


RDS Backup

  • Automated Backups (enabled by default)
    • Daily full backup of the database (during the defined maintenance window)
    • Backup retention: 7 days (max 35 days)
    • Transaction logs are backed-up by RDS every 5 minutes (point in time recovery)
  • DB Snapshots
    • Manually triggered
    • Backup retention: unlimited
    • in a stopped RDS database, you will still pay for storage. If you plan on stopping it for a long time, you should snapshot & restore instead

RDS Proxy

  • Fully managed database proxy for RDS
  • Allows apps to pool and share DB connections established with the database
  • improving database efficiency by reducing the stress on database resources (e.g., CPU, RAM) and minimize open connections (and timeouts)
  • Serverless, autoscaling, highly available (multi-AZ)
  • Reduces RDS & Aurora failover time by up to 66%
  • Enforce IAM Authentication for DB, and securely store credentials in AWS Secrets Manager
  • RDS Proxy is never publicly accessible (must be accessed from VPC)

RDS Custom

  • Managed Oracle and Microsoft SQL Server Database with OS and database customization
  • RDS: Automates setup, operation, and scaling of database in AWS
  • Custom: access to the underlying database and OS so you can
    • Configure settings
    • Install patches
    • Enable native features
    • Access the underlying EC2 Instance using SSH or SSM Session Manager
  • De-activate Automation Mode to perform your customization
  • RDS vs. RDS Custom
    • RDS: entire database and the OS to be managed by AWS
    • RDS Custom: full admin access to the underlying OS and the database

Amazon Aurora

  • Aurora is a proprietary technology from AWS (not open sourced)
  • Postgres and MySQL are both supported as Aurora DB
  • Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10GB, up to 128 TB.
  • Up to 15 read replicas
  • Supports only MySQL & PostgreSQL
  • Failover in Aurora is instantaneous.
  • Backtrack: restore data at any point of time without using backups

Aurora High Availability

  • 2 copies of your data are contained in each Availability Zone, with a minimum of 3 Availability Zones → 6 copies of your data
  • Aurora is designed to transparently handle the loss of up to 2 copies of data without affecting write availability,
    and up to 3 copies without affecting read availability
  • Aurora storage is also self-healing.
  • Data blocks and disks are continuously scanned for errors and repaired automatically
  • Support for Cross Region Replication
  • Automated failover A read replica is promoted as the new master in less than 30 seconds
  • In case no replica is available, Aurora will attempt to create a new DB Instance in the same AZ as the original instance

Aurora Read Scaling

  • One Aurora Instance takes writes (master)
  • Master + up to 15 Aurora Read Replicas serve reads
  • Aurora DB Cluster :
    • Writer Endpoint :
      • Always points to the master (can be used for read/write)
      • Each Aurora DB cluster has one writer cluster endpoint
    • Reader Endpoint
      • Provides load-balancing for read replicas only (used to read only)
      • If the cluster has no read replica, it points to master (can be used to read/write)
      • Each Aurora DB cluster has one reader endpoint
      • When a client reads through the Reader Endpoint, the connection is load-balanced to one of the Read Replicas
    • Custom Endpoint
      • Used to point to a subset of replicas
      • Provides load-balanced based on criteria other than the read-only or read-write capability of the DB instances like instance class (ex, direct internal users to low-capacity instances and direct production traffic to high-capacity instances)
      • Once custom endpoints are set up, you generally no longer use the Reader Endpoint

Aurora Serverless

  • Optional
  • Automated database instantiation and auto scaling based on actual usage
  • Good for infrequent, intermittent, or unpredictable workloads
  • No capacity planning needed
  • Pay per second, can be more cost effective

Aurora Multi-Master

  • Optional
  • In case you want immediate failover for write node (High availability)
  • Every node does R/W - vs promoting a Read Replica as the new master

Aurora Global Database

  • Aurora Cross Region Read Replicas:
    • Useful for disaster recovery
  • Designed for globally distributed applications with low latency local reads in each region
  • 1 Primary Region (read / write)
  • Up to 5 secondary (read-only) regions (replication lag < 1 second)
  • Up to 16 Read Replicas per secondary region
  • Helps for decreasing latency for clients in other geographical locations
  • RTO of less than 1 minute (to promote another region as primary)

Aurora Backup

  • Automated backups
    • 1 to 35 days (cannot be disabled)
    • point-in-time recovery in that timeframe
  • Manual DB Snapshots
    • Manually triggered by the user
    • Retention of backup for as long as you want
    • Aurora cloning is faster than restoring from a snapshot when creating a new DB

ElastiCache

  • The same way RDS is to get managed Relational Databases…
  • ElastiCache is to get managed Redis or Memcached
  • Caches are in-memory databases with really high performance, low latency
  • Helps make your application stateless, because it doesn’t have to cache locally

DynamoDB

  • Fully managed, highly available with replication across multiple AZs
  • NoSQL database - not a relational database - with transaction support
  • Scales to massive workloads, distributed database
  • Single digit millisecond response time at any scale
  • Maximum size of an item is 400KB
  • Supports TTL (automatically delete an item after an expiry timestamp)
  • Supports Transactions (either write to multiple tables or write to none)- DynamoDB transactions.
  • DynamoDB transactions provide developers atomicity, consistency, isolation, and durability
    (ACID) across 1 or more tables within a single AWS account and region.
  • All-or-nothing transactions.

Capacity

  • Provisioned Mode (default)
    • You specify the number of reads/writes per second
    • You need to plan capacity beforehand
    • Pay for provisioned Read Capacity Units (RCU) & Write Capacity Units (WCU)
    • Auto-scaling option (eg. set RCU and WCU to 80% and the capacities will be scaled automatically based on the workload)
  • On-demand Mode
    • Read/writes automatically scale up/down based on workloads
    • No capacity planning needed
    • Pay for what you use, more expensive ($$$)
    • Great for unpredictable workloads, steep sudden spikes
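
A sketch of declaring capacity at table creation with boto3 (the table name, key, and RCU/WCU values are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned mode: RCU/WCU declared up front.
# BillingMode="PAY_PER_REQUEST" (with no ProvisionedThroughput)
# would select On-demand mode instead.
dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```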

DynamoDB Accelerator (DAX)

  • Fully managed, highly available, in-memory cache
  • 10x performance improvement
  • Reduces request time from milliseconds to microseconds even under load
  • Help solve read congestion by caching
  • 5 minutes TTL for cache (default)
  • Doesn’t require application code changes

DynamoDB Streams

  • Ordered stream of notifications of item-level modifications (create/update/delete) in a table
  • Destination can be:
    • Kinesis Data Streams
    • AWS Lambda
    • Kinesis Client Library applications
  • Data Retention for up to 24 hours
  • Allows implementing cross-region replication
  • React to changes in real-time (welcome email to users)

DynamoDB Global Tables

  • Globally distributed applications
  • Based on DynamoDB streams
  • Multi-region redundancy for disaster recovery or high availability
  • Replication latency under 1 second
  • Must enable DynamoDB Streams as a pre-requisite

DocumentDB

  • Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
  • DocumentDB is the same for MongoDB (which is a NoSQL database)
  • Fully Managed, highly available with replication across 3 AZ
  • DocumentDB storage automatically grows in increments of 10GB, up to 64 TB.
  • Automatically scales to workloads with millions of requests per second

Amazon Neptune

  • Fully managed graph database
  • A popular graph dataset would be a social network
  • Highly available with replication across 3 AZs, with up to 15 read replicas

Amazon QLDB

  • QLDB stands for ”Quantum Ledger Database”
  • A ledger is a book recording financial transactions
  • Fully Managed, Serverless, High available, Replication across 3 AZ
  • Used to review history of all the changes made to your application data over time
  • Immutable system: no entry can be removed or modified, cryptographically verifiable
  • You cannot update a record (i.e., replace old content) in a ledger database. Instead, an update adds a new record to the database
  • Use case : financial transactions, supply chain, cryptocurrencies, such as Bitcoin, blockchain

Amazon Timestream

  • Fully managed, fast, scalable, serverless time series database
  • Automatically scales up/down to adjust capacity
  • Encryption in transit and at rest
  • Use cases: IoT apps, operational applications, real time analytics, …

Decoupling applications

  • Synchronous communication between applications can be problematic if there are sudden spikes of traffic
  • What if you need to suddenly encode 1000 videos but usually it’s 10?
  • In that case, it’s better to decouple your applications:
    • using SQS: queue model
    • using SNS: pub/sub model
    • using Kinesis: real-time streaming model
  • These services can scale independently from our application

SQS

SQS stands for Simple Queue Service

  • Used to asynchronously decouple applications
  • Supports multiple producers & consumers
  • The message is persisted in SQS until a consumer deletes it
  • The consumer polls the queue for messages. Once a consumer processes a message, it deletes it from the queue using DeleteMessage API.
  • Max message size: 256KB
  • Default message retention: 4 days (max: 14 days)
  • Consumers can be EC2 instances, Lambda functions, Kinesis, etc. (a minimal polling loop is sketched below)
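
A minimal consumer sketch using boto3 (the queue URL is hypothetical): receive messages, process them, then delete them so they are not redelivered:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # hypothetical

while True:
    # Long polling (WaitTimeSeconds) reduces empty responses and API cost
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])  # replace with real processing logic
        # Delete after successful processing, otherwise the message reappears
        # once the visibility timeout expires
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```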

Queue Types

Standard Queue

  • Unlimited throughput (publish any number of messages per second into the queue)
  • Low latency (<10 ms on publish and receive)
  • Can have duplicate messages (at least once delivery)
  • Can have out of order messages (best effort ordering)

FIFO Queue

  • Limited throughput: 300 msg/s without batching, 3000 msg/s with
  • Messages are processed in order by the consumer
  • Message De-duplication:
    • De-duplication interval: 5 min (duplicate messages will be discarded only if they are sent less than 5 mins apart)
    • De-duplication methods:
      • Content-based de-duplication: computes the hash of the message body and compares
      • Using a message de-duplication ID: messages with the same de-duplication ID are considered duplicates
  • Message Grouping
    • Group messages based on MessageGroupID to send them to different consumers
    • Same value for MessageGroupID
      • All the messages are in order
      • Single consumer
    • Different values for MessageGroupID
      • Messages will be ordered for each group ID
      • Ordering across different message groups is not guaranteed
      • Each group ID can have a different consumer (parallel processing)
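
A sketch of publishing to a FIFO queue with boto3; the queue URL, group ID, and deduplication ID are illustrative:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # hypothetical

sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": 42, "status": "PAID"}',
    MessageGroupId="customer-1001",          # ordering is preserved within this group
    MessageDeduplicationId="order-42-paid",  # duplicates within 5 min are discarded
)
```

If content-based de-duplication is enabled on the queue, MessageDeduplicationId can be omitted and the hash of the body is used instead.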

Consumer Auto Scaling

We can attach an ASG to the consumer instances which scales based on the CloudWatch metric Queue Length (ApproximateNumberOfMessages); CloudWatch alarms can be triggered to step-scale the consumer application.

Security

Encryption

  • In-flight encryption using HTTPS API
  • At-rest encryption using KMS keys
  • Client-side encryption if the client wants to perform encryption/decryption itself

Access Controls: IAM policies to regulate access to the SQS API

SQS Access Policies (resource-based policies)

  • Useful for cross-account access to SQS queues
  • Useful for allowing other services (SNS, S3…) to write to an SQS queue

Configurations

Message Visibility Timeout

  • After a message is polled by a consumer, it becomes invisible to other consumers
  • By default, the “message visibility timeout” is 30 seconds
  • That means the message has 30 seconds to be processed
  • After the message visibility timeout is over, the message is “visible” in SQS
  • A consumer could call the ChangeMessageVisibility API to get more time
  • If visibility timeout is high (hours), and consumer crashes, re-processing will take time
  • If visibility timeout is too low (seconds), we may get duplicates
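
If processing takes longer than expected, the consumer can extend the timeout for that one message. A sketch with a hypothetical queue URL:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # hypothetical

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    # Keep the message invisible to other consumers for 5 more minutes from now
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=300,
    )
```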


Dead Letter Queue (DLQ)

  • An SQS queue used to store messages that failed to be processed
  • After the MaximumReceives threshold (the number of times that a message can be received before being sent to a dead-letter queue) is exceeded, the message goes into the DLQ
  • Redrive to Source - once the bug in the consumer has been resolved, messages in the DLQ can be sent back to the queue (original queue or a custom queue) for processing
  • Prevents resource wastage
  • Recommended to set a high retention period for DLQ (14 days)
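
A sketch of attaching a DLQ to a source queue via its RedrivePolicy attribute; the queue URL and DLQ ARN are hypothetical:

```python
import json

import boto3

sqs = boto3.client("sqs")

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # hypothetical
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",
            "maxReceiveCount": "5",  # the MaximumReceives threshold
        })
    },
)
```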

Queue Delay/Delivery Delay

  • Delay message delivery
  • Consumers see the message after some delay
  • Default: 0 (Max: 15 min)
  • Can be set at the queue level

Long Polling

  • When a consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
  • This is called Long Polling
  • Decreases the number of API calls made to SQS (cheaper)
  • Reduces latency (incoming messages during the polling will be read instantaneously)
  • Polling time: 1 sec to 20 sec
  • Long Polling is preferred over Short Polling
  • Can be enabled at the queue level or at the consumer level using the WaitTimeSeconds parameter in the ReceiveMessage API.

SQS + Lambda + DLQ

Failed messages (after the set number of retries) are sent to the DLQ by the SQS queue

SNS

SNS stands for Simple Notification Service

  • Pub-Sub model (publisher publishes messages to a topic, subscribers listen to the topic)
  • Instant message delivery (does not queue messages)

Security

Encryption

  • In-flight encryption using HTTPS API
  • At-rest encryption using KMS keys
  • Client-side encryption if the client wants to perform encryption/decryption itself

Access Controls: IAM policies to regulate access to the SNS API

SNS Access Policies (resource-based policies)

  • Useful for cross-account access to SNS topics
  • Useful for allowing other services (S3…) to publish to an SNS topic

Standard Topics

  • Highest throughput
  • At least once message delivery
  • Best effort ordering
  • Subscribers can be:
    • SQS queue
    • HTTP / HTTPS endpoints
    • Lambda functions
    • Emails
    • SMS & Mobile Notifications
    • Kinesis Data Firehose (KDF) to send the data into S3 or Redshift

FIFO Topics

  • Guaranteed ordering of messages in that topic
  • Publishing messages to a FIFO topic requires:
    • Ordering by Message Group ID (all messages in the same group are ordered)
    • Deduplication using a Deduplication ID or Content Based Deduplication
  • Can only have SQS FIFO queues as subscribers
  • Limited throughput (same as SQS FIFO) because only SQS FIFO queues can read from FIFO topics

SNS + SQS Fanout Pattern

  • Fully decoupled, no data loss
  • SQS allows for: data persistence, delayed processing and retries of work
  • Make sure your SQS queue access policy allows for SNS to write
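
A sketch of the fanout subscription with boto3; the topic and queue ARNs are placeholders, and the queue's access policy must already allow sns.amazonaws.com to SendMessage:

```python
import boto3

sns = boto3.client("sns")

sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:orders-topic",  # hypothetical topic
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:orders-queue",  # hypothetical queue
    # Deliver the raw message body instead of the SNS JSON envelope
    Attributes={"RawMessageDelivery": "true"},
)
```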


Kinesis

  • Makes it easy to collect, process, and analyze streaming data in real-time
  • Ingest real-time data such as: Application logs, Metrics, Website clickstreams, IoT telemetry data…

Kinesis Data Streams

  • Real-time data streaming service

  • Used to ingest data in real time directly from source

  • Retention between 1 day to 365 days

  • Ability to reprocess (replay) data

  • Once data is inserted in Kinesis, it can’t be deleted (immutability)

  • Data that shares the same partition goes to the same shard (ordering)

  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent

  • Consumers:

    • Write your own: Kinesis Client Library (KCL), AWS SDK
    • Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics,
  • Capacity Modes

    • Provisioned
      • You choose the number of shards provisioned, scale manually or using API
      • Each shard gets 1MB/s in (or 1000 records per second)
      • Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
      • You pay per shard provisioned per hour
    • On-demand mode
      • No need to provision or manage the capacity
      • Default capacity provisioned (4 MB/s in or 4000 records per second)
      • Scales automatically based on observed throughput peak during the last 30 days
      • Pay per stream per hour & data in/out per GB

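A producer sketch using boto3 (the stream name and record contents are illustrative); records with the same partition key land on the same shard, which is what gives per-key ordering:

```python
import json

import boto3

kinesis = boto3.client("kinesis")

kinesis.put_record(
    StreamName="clickstream",  # hypothetical stream
    Data=json.dumps({"user": "u-17", "page": "/home"}).encode(),
    PartitionKey="u-17",  # same key => same shard => ordered per user
)
```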

Kinesis Data Firehose

  • Fully Managed Service, no administration, automatic scaling, serverless
  • Used to load streaming data into a target location with optional transformation
  • Can ingest data in real time directly from source
  • Destinations:
    • AWS: Redshift, S3, OpenSearch
    • 3rd party: Splunk, MongoDB, DataDog, NewRelic, etc.
    • Custom HTTP endpoint
  • Supports custom data transformation using Lambda functions
  • No replay capability (does not store data like KDS)


Amazon MQ

  • If you have some traditional applications running on-premises, they may use open protocols such as MQTT, AMQP, STOMP, OpenWire, WSS, etc. When migrating to the cloud, instead of re-engineering the application to use SQS and SNS (AWS proprietary), we can use Amazon MQ (managed Apache ActiveMQ) for communication.
  • Doesn’t “scale” as much as SQS or SNS because it is provisioned
  • Runs on a dedicated machine (can run in HA with failover)
  • Has both queue feature (SQS) and topic features (SNS)

EventBridge

  • Schedule or Cron to create events on a schedule
  • Event Pattern: Event rules to react to a service doing something
  • Target: Trigger Lambda functions, send SQS/SNS messages etc

Data & Analytics

Athena

  • Serverless query service to analyze data stored in Amazon S3
  • Uses SQL language to query the files
  • Built on Presto engine
  • Output stored in S3
  • Supports CSV, JSON, ORC, Avro, and Parquet file formats
  • Commonly used with Amazon Quicksight for reporting/dashboards
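
A sketch of running a query with boto3; the database, table, and S3 output location are assumptions:

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # hypothetical table
    QueryExecutionContext={"Database": "web_logs"},                     # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # results stored in S3
)
# Poll get_query_execution(QueryExecutionId=...) until the query succeeds
print(resp["QueryExecutionId"])
```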

Performance

  • Use columnar data for cost-savings (less scan)
  • Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd…)
  • Partition datasets in S3 for easy querying on virtual columns

Amazon Athena – Federated Query

  • Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
  • Store the results back in Amazon S3

Redshift

  • AWS managed data warehouse (10x better performance than other data warehouses)

  • Based on PostgreSQL

  • Used for Online Analytical Processing (OLAP) and high performance querying

  • Columnar storage of data with massively parallel query execution in SQL

  • Faster querying than Athena due to indexes

  • Need to provision instances as a part of the Redshift cluster (pay for the instances provisioned)

  • Integrated with Business Intelligence (BI) tools such as QuickSight or Tableau

  • Redshift Cluster can have 1 to 128 nodes (128TB per node)

    • Leader Node: query planning & result aggregation
    • Compute Nodes: execute queries & send the result to leader node
  • No multi-AZ support (all the nodes will be in the same AZ)

Loading data into Redshift

  • S3
    • Use COPY command to load data from an S3 bucket into Redshift
    • Without Enhanced VPC Routing
      • data goes through the public internet
    • Enhanced VPC Routing
      • data goes through the VPC without traversing the public internet
  • Kinesis Data Firehose
    • Sends data to S3 and issues a COPY command to load it into Redshift
  • EC2 Instance
    • Using JDBC driver
    • Used when an application needs to write data to Redshift
    • Optimal to write data in batches
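
A sketch of issuing a COPY through the Redshift Data API; the cluster, database, table, bucket, and IAM role are all hypothetical:

```python
import boto3

redshift_data = boto3.client("redshift-data")

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="awsuser",
    Sql=(
        "COPY sales "
        "FROM 's3://my-bucket/sales/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
        "FORMAT AS PARQUET;"
    ),
)
```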

Snapshots & DR

  • Snapshots are point-in-time backups of a cluster, stored internally in S3

  • Snapshots are incremental (only what has changed is saved)

  • You can restore a snapshot into a new cluster

  • Automated

    • every 8 hours, every 5 GB, or on a schedule
    • Set retention between 1 to 35 days
  • Manual

    • snapshot is retained until you delete it
  • Feature to automatically copy snapshots into another region

Redshift Spectrum

  • Query data present in S3 without loading it into Redshift
  • Need to have a Redshift cluster to use this feature
  • Query is executed by 1000s of Redshift Spectrum nodes
  • Consumes much less of your cluster's processing capacity than other queries

OpenSearch

  • Amazon OpenSearch is the successor to Amazon ElasticSearch
  • Used in combination with a database to perform search operations on the database
  • Can search on any field, even supports partial matches
  • Need to provision a cluster of instances (pay for provisioned instances)
  • Supports Multi-AZ
  • Used in Big Data
  • Security through Cognito & IAM, KMS encryption, TLS
  • Comes with Kibana (visualization) & Logstash (log ingestion)

EMR

  • EMR stands for “Elastic MapReduce”
  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • EMR comes bundled with Apache Spark, HBase, Presto, Flink…
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling
  • Integrated with Spot Instances

QuickSight

  • Serverless machine learning-powered business intelligence service to create interactive dashboards
  • Fast, automatically scalable, embeddable
  • Use cases:
    • Business analytics
    • Building visualizations
    • Get business insights using data
  • Integrated with :
    • RDS
    • Aurora
    • Athena
    • Redshift
    • S3

Glue

  • Managed extract, transform, and load (ETL) service
  • Useful to prepare and transform data for analytics
  • Fully serverless service


  • Used to get data from a store, process and put it in another store (could be the same store)
  • Glue Job Bookmarks: prevent re-processing old data
  • Glue Data Crawlers crawl databases and collect metadata which is populated in Glue Data Catalog
  • The data lake is stored in S3

Lake Formation

  • Data lake = central place to have all your data for analytics purpose
  • Fully managed service that makes it easy to setup a data lake in days
  • Out-of-the-box source blueprints: S3, RDS, Relational & NoSQL DB…
  • Fine-grained Access Control for your applications (row and column-level)

Kinesis Data Analytics

Kinesis Data Analytics (SQL application)

  • Real-time analytics on Kinesis Data Streams & Firehose using SQL
  • Add reference data from Amazon S3 to enrich streaming data
  • Fully managed, no servers to provision
  • Automatic scaling
  • Output
    • Kinesis Data Streams: create streams out of the real-time analytics queries
    • Kinesis Data Firehose: send analytics query results to destinations
  • Use cases:
    • Time-series analytics
    • Real-time dashboards
    • Real-time metrics


Kinesis Data Analytics for Apache Flink

  • Use Flink (Java, Scala or SQL) to process and analyze streaming data
  • Use any Apache Flink programming features
  • Flink does not read from Firehose (use Kinesis Analytics for SQL instead)
  • Sources:
    • Kinesis Data Streams
    • Amazon MSK

MSK Managed Streaming for Apache Kafka

  • Alternative to Amazon Kinesis; both allow you to stream data
  • Fully managed Apache Kafka on AWS
    • Allow you to create, update, delete clusters
    • MSK creates & manages Kafka broker nodes & Zookeeper nodes for you
    • Deploy the MSK cluster in your VPC, multi-AZ (up to 3 for HA)
    • Data is stored on EBS volumes for as long as you want

MSK Serverless

  • Run Apache Kafka on MSK without managing the capacity
  • MSK automatically provisions resources and scales compute & storage

Big Data Ingestion Pipeline

  • We want the ingestion pipeline to be fully serverless
  • We want to collect data in real time
  • We want to transform the data
  • We want to query the transformed data using SQL
  • The reports created using the queries should be in S3
  • We want to load that data into a warehouse and create dashboards

Migration & Transfer

Snow Family

  • Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS
  • Offline devices to perform data migrations
  • If it takes more than a week to transfer over the network, use Snowball devices!

Device

  • Snowcone

    • 2 CPUs, 4GB RAM, wired or wireless access
    • 8 TB storage
    • Good for space-constrained environment
    • DataSync Agent is preinstalled
    • When to use: Up to 24 TB, online and offline
  • Snowball Edge

    • Compute Optimized
      • 52 vCPUs, 208 GB of RAM
      • 42 TB storage
      • Supports Storage Clustering
    • Storage Optimized
      • Up to 40 CPUs, 80 GB of RAM
      • 80 TB storage
      • Supports Storage Clustering (up to 15 nodes)
      • Transfer up to petabytes
    • When to use: Up to petabytes(PB)
  • Snowmobile

    • 100 PB storage
    • Used when transferring > 10PB
    • Transfer up to exabytes
    • Does not support Storage Clustering
    • When to use: Up to exabytes(EB)

Edge Computing

  • Process data while it’s being created on an edge location (could be anything that doesn’t have internet or access to cloud)
  • Devices for edge computing:
    • Snowcone
    • Snowball Edge

Data migration

  • Physical data transport solution: move TBs or PBs of data in or out of AWS
  • Pay per data transfer job
  • Provide block storage and Amazon S3 compatible object storage
  • Snowball cannot import to Glacier directly (transfer to S3, configure a lifecycle policy to transition the data into Glacier)
  • Need to install OpsHub software on your computer to manage Snow Family devices

DataSync

  • DataSync is used primarily for one-time migrations
  • Agent Based : An agent needs to be installed on the on premise data center
  • Move large amounts of data to and from:
    • On-premises / other cloud to AWS (NFS, SMB, HDFS, S3 API…) – needs agent
    • AWS to AWS (different storage services) – no agent needed
  • Can synchronize to:
    • Amazon S3 (any storage class – including Glacier)
    • Amazon EFS
    • Amazon FSx (Windows, Lustre, NetApp, OpenZFS...)
  • Replication tasks can be scheduled hourly, daily, weekly
  • File permissions and metadata are preserved (NFS POSIX, SMB…)


Transfer Family

  • AWS managed service to transfer files in and out of Simple Storage Service (S3) or EFS using FTP-based protocols (instead of proprietary methods)

  • Supported Protocols

    • FTP (File Transfer Protocol) - unencrypted in flight
    • FTPS (File Transfer Protocol over SSL) - encrypted in flight
    • SFTP (Secure File Transfer Protocol) - encrypted in flight
  • Supports Multi AZ

  • Pay per provisioned endpoint per hour + fee per GB data transfers

  • Clients can either connect directly to the FTP endpoint or optionally through Route 53

  • Transfer Family needs permission (IAM role) to read or put data into S3 or EFS

Database Migration Service

  • Migrate entire databases from on-premises to AWS cloud
  • The source database remains available during migration
  • Supports continuous data replication using CDC (change data capture)
  • Replication types:
    • Full load: all existing data is moved from source to target in parallel
    • Full load + CDC: full load, plus capture of changes to source tables during migration; CDC guarantees transactional integrity
    • CDC only: only replicate the data changes from the source database
  • Requires EC2 instance running the DMS software to perform the replication tasks. If the amount of data is large, use a large instance. If multi-AZ is enabled, need an instance in each AZ.

Types of Migration

Homogeneous Migration

  • When the source and target DB engines are the same (eg. Oracle to Oracle)
  • One step process:
    • Use the Database Migration Service (DMS) to migrate data from the source database to the target database

Heterogeneous Migration

  • When the source and target DB engines are different (eg. Microsoft SQL Server to Aurora)
  • Two-step process:
    • Use the Schema Conversion Tool (SCT) to convert the source schema and code to match that of the target database
    • Use the Database Migration Service (DMS) to migrate data from the source database to the target database

Migrating using Snow Family

  • Use the Schema Conversion Tool (SCT) to extract the data locally and move it to the Edge device
  • Ship the Edge device or devices back to AWS
  • After AWS receives your shipment, the Edge device automatically loads its data into an Amazon S3 bucket.
  • AWS DMS takes the files and migrates the data to the target data store (eg. DynamoDB)

Application migration service

AWS Application Discovery Service

  • Plan migration projects by gathering information about on-premises data centers
  • Server utilization data and dependency mapping are important for migrations
  • Two types of migration:
    • Agentless Discovery (via AWS Agentless Discovery Connector)
      • Agentless Discovery Connector within VMware vCenter
      • VM inventory, configuration, and performance history such as CPU, memory, and disk usage
    • Agent-based Discovery (via AWS Application Discovery Agent)
      • Install Application Discovery Agent on each VM and each physical server
      • System configuration, system performance, running processes, and details of the network connections between systems
  • Resulting data can be viewed within AWS Migration Hub

AWS Application Migration Service

  • Lift-and-shift (rehost) solution which simplify migrating applications to AWS
  • Converts your physical, virtual, and cloud-based servers to run natively on AWS
  • Supports wide range of platforms, Operating Systems, and databases
  • Minimal downtime, reduced costs


RDS and Aurora MySQL Migrations

RDS MySQL to Aurora MySQL

  • Option 1: DB Snapshots from RDS MySQL restored as MySQL Aurora DB
  • Option 2: Create an Aurora Read Replica from your RDS MySQL, and when the replication lag is 0, promote it as its own DB cluster (can take time and cost $)

External MySQL to Aurora MySQL

  • Option 1:
    • Use Percona XtraBackup to create a file backup in Amazon S3
    • Create an Aurora MySQL DB from Amazon S3
  • Option 2:
    • Create an Aurora MySQL DB
    • Use the mysqldump utility to migrate MySQL into Aurora (slower than S3 method)

Use DMS if both databases are up and running

Same process with PostgreSQL

Disaster Recovery

RPO and RTO

  • Any event that has a negative impact on a company’s business continuity or finances is a disaster
  • Recovery Point Objective (RPO): how often you backup your data (determines how much data are you willing to lose in case of a disaster)
  • Recovery Time Objective (RTO): how long it takes to recover from the disaster (down time)


Strategies

  • Backup & Restore
    • High RPO (hours)
    • Need to spin up instances and restore volumes from snapshots in case of disaster => High RTO
    • Cheapest & easiest to manage
  • Pilot Light
    • Critical parts of the app are always running in the cloud (eg. continuous replication of data to another region)
    • Low RPO (minutes)
    • Critical systems are already up => Low RTO
    • Ideal when RPO should be in minutes and the solution should be inexpensive
    • The DB is critical so it is replicated continuously, but EC2 instances are spun up only when a disaster strikes
  • Warm Standby
    • A complete backup system is up and running at the minimum capacity. This system is quickly scaled to production capacity in case of a disaster.
    • Very low RPO & RTO (minutes)
    • Expensive
  • Multi-Site or Hot Site Approach
    • A backup system is running at full production capacity and the request can be routed to either the main or the backup system.
    • Multi-data center approach
    • Lowest RPO & RTO (minutes or seconds)
    • Very Expensive

Machine Learning

Rekognition

  • Find objects, people, text, scenes in images and videos using ML
  • Facial analysis and facial search to do user verification, people counting
  • Use cases
    • Content Moderation
    • Text Detection
    • Face Detection and Analysis (gender, age range, emotions…)
    • Face Search and Verification
    • Celebrity Recognition
    • Detect content that is inappropriate, unwanted, or offensive (image and videos)
    • Used in social media, broadcast media, advertising, and e-commerce situations to create a safer user experience

Transcribe

  • Automatically convert speech to text
  • Automatically remove Personally Identifiable Information (PII) using Redaction
  • Use cases:
    • transcribe customer service calls
    • automate closed captioning and subtitling

Polly

  • Turn text into lifelike speech using deep learning
  • Allowing you to create applications that talk

Translate

  • Natural and accurate language translation

Lex

  • Amazon Lex
    • same technology that powers Alexa
    • Automatic Speech Recognition (ASR) to convert speech to text
    • Natural Language Understanding to recognize the intent of text, callers
    • Helps build chatbots, call center bots
  • Amazon Connect
    • Receive calls, create contact flows, cloud-based virtual contact center
    • Can integrate with other CRM systems or AWS

Comprehend

  • For Natural Language Processing – NLP
  • Fully managed and serverless service
  • Uses machine learning to find insights and relationships in text

Comprehend Medical

  • Amazon Comprehend Medical detects and returns useful information in unstructured clinical text
  • Uses NLP to detect Protected Health Information (PHI)

SageMaker

  • Fully managed service for developers / data scientists to build ML models
  • Typically, difficult to do all the processes in one place + provision servers
  • Machine learning process (simplified): predicting your exam score

Forecast

  • Fully managed service that uses ML to deliver highly accurate forecasts
  • Example: predict the future sales of a raincoat
  • Use cases: Product Demand Planning, Financial Planning, Resource Planning

Kendra

  • Fully managed document search service powered by Machine Learning
  • Extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs…)

Personalize

  • Fully managed ML-service to build apps with real-time personalized recommendations
  • Same technology used by Amazon.com
  • Integrates into existing websites, applications, SMS, email marketing systems, …

Textract

  • Automatically extracts text, handwriting, and data from any scanned documents using AI and ML
  • Read and process any type of document (PDFs, images, …)

Networking

Route 53

  • A highly available, scalable, fully managed and authoritative DNS (customer can update DNS records)
  • Route 53 is also a Domain Registrar
  • Ability to check the health of your resources
  • The only AWS service which provides 100% availability SLA

Hosted Zone

  • A container for records that define how to route traffic to a domain and its subdomains

  • Hosted zone is queried to get the IP address from the hostname

    Two types

    • Public Hosted Zone
      • resolves public domain names
      • can be queried by anyone on the internet
    • Private Hosted Zone
      • resolves private domain names
      • can only be queried from within the VPC

Record Types

Each record contains:

  • Domain/subdomain Name – e.g., example.com

  • Record Type – e.g., A or AAAA

  • Value – e.g., 12.34.56.78

  • Routing Policy – how Route 53 responds to queries

  • TTL – amount of time the record is cached at DNS Resolvers

  • A – maps a hostname to IPv4

  • AAAA – maps a hostname to IPv6

  • CNAME – maps a hostname to another hostname

    • The target is a domain name which must have an A or AAAA record
    • Cannot point to root domains (Zone Apex). Ex: you can’t create a CNAME record for example.com, but you can for something.example.com
  • NS (Name Servers) - controls how traffic is routed for a domain

  • Alias - maps a hostname to an AWS resource(app.mydomain.com => blabla.amazonaws.com)

    • Native health check
    • AWS proprietary
    • Can point to root (zone apex) and non-root domains
    • Alias Record is of type A or AAAA (IPv4 / IPv6)
    • Automatically recognizes changes in the resource’s IP addresses
    • You can’t set the TTL
    • Targets can be:
      • Elastic Load Balancers
      • CloudFront Distributions
      • API Gateway
      • Elastic Beanstalk environments
      • S3 Websites
      • VPC Interface Endpoints
      • Global Accelerator accelerator
    • Target cannot be an EC2 DNS name

Routing Policies

Define how Route 53 responds to DNS queries

Simple

  • Route to one or more resources
  • If multiple values are returned, the client chooses one at random
  • No health checks (if returning multiple resources, some of them might be unhealthy)
  • When Alias is enabled, you can only specify one AWS resource as a target

Weighted

  • Control the % of the requests that go to each specific resource
  • Can be associated with Health Checks
  • Use cases: load balancing between regions, testing, new application versions…
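
A sketch of two weighted A records for the same name using boto3; the hosted zone ID, record name, and IPs are placeholders. Roughly 90% of queries get the "blue" answer and 10% the "green" one (weights are relative, not percentages):

```python
import boto3

r53 = boto3.client("route53")

for set_id, weight, ip in [("blue", 90, "203.0.113.10"), ("green", 10, "203.0.113.20")]:
    r53.change_resource_record_sets(
        HostedZoneId="Z0123456789EXAMPLE",  # hypothetical hosted zone
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": set_id,  # required for weighted records
                    "Weight": weight,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                },
            }]
        },
    )
```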

Failover (Active-Passive)

  • Primary & Secondary Records (if the primary application is down, route to secondary application)
  • A health check must be associated with the primary record; you can also associate a health check with the secondary
  • Used for Active-Passive failover strategy

Latency-based

  • Redirect to the resource that has the lowest network latency
  • Latency is based on traffic between users and AWS Regions
  • Can be associated with Health Checks (has a failover capability)

Geolocation

  • Routing based on the client's location
  • Specify location by Continent, Country
  • Should create a “Default” record (in case there’s no match on location)
  • Use cases: restrict content distribution & language preference
  • Can be associated with Health Checks

Geoproximity

  • Route traffic to your resources based on the geographic location of users and resources
  • Ability to shift more or less traffic to resources based on the defined bias
  • To change the size of the geographic region, specify bias values:
    • To expand (1 to 99) – more traffic to the resource
    • To shrink (-1 to -99) – less traffic to the resource
  • To use geoproximity routing you must use Route 53 Traffic Flow

Multi-value

  • Route traffic to multiple resources (max 8)
  • Health Checks (only healthy resources will be returned)
  • Multi-value is not a substitute for having an ELB; it is client-side load balancing
  • Unlike Simple routing, all the records returned are for healthy resources

Health Checks

  • HTTP Health Checks are only for public resources
  • Used for automated DNS failover
  • Three types:
    • Monitor an endpoint (application or other AWS resource)
      • Multiple global health checkers check the endpoint health
      • Must configure the application firewall to allow incoming requests from the IPs of Route 53 Health Checkers
      • Supported protocols: HTTP, HTTPS and TCP
    • Monitor other health checks (Calculated Health Checks)
      • Combine the results of multiple Health Checks into one (AND, OR, NOT)
      • Specify how many of the health checks need to pass to make the parent pass
      • Usage: perform maintenance to your website without causing all health checks to fail
    • Monitor CloudWatch Alarms (to perform health checks on private resources (Private Hosted Zones))
      • Route 53 health checkers are outside the VPC. They can’t access private endpoints (private VPC or on-premises resources).
      • Create a CloudWatch Metric and associate a CloudWatch Alarm to it, then create a Health Check that monitors the CloudWatch alarm.

API Gateway

  • Serverless REST APIs
  • Invoke Lambda functions using REST APIs (API gateway will proxy the request to lambda)
  • Supports WebSocket (stateful)
  • Cache API responses
  • Can be integrated with any HTTP endpoint in the backend or any AWS API

API Gateway – Integrations

Lambda Function

  • Invoke Lambda function
  • Easy way to expose a REST API backed by AWS Lambda

HTTP

  • Expose HTTP endpoints in the backend
  • Example: internal HTTP API on premise, Application Load Balancer
  • Why? Add rate limiting, caching, user authentication, API keys, etc.

AWS Service

  • Expose any AWS API through the API Gateway
  • Example: start an AWS Step Function workflow, post a message to SQS
  • Why? Add authentication, deploy publicly, rate control

Endpoint Types

  • Edge-Optimized (default):
    • For global clients
    • Requests are routed through the CloudFront edge locations (improves latency)
    • The API Gateway lives in only one region but it is accessible efficiently through edge locations
  • Regional
    • For clients within the same region
    • Could manually combine with your own CloudFront distribution for global deployment (this way you will have more control over the caching strategies and the distribution)
  • Private
    • Can only be accessed within your VPC using an Interface VPC endpoint (ENI)
    • Use resource policy to define access

Security

  • User Authentication through
    • IAM Roles (useful for internal applications)
    • Cognito (identity for external users – example mobile users)
    • Custom Authorizer (use a Lambda function to validate the token passed in the header and return an IAM policy that determines whether the user should be allowed to access the resource)
  • Custom Domain Name HTTPS security through integration with AWS Certificate Manager (ACM)

VPC

  • VPC = Virtual Private Cloud
  • You can have multiple VPCs in an AWS region (max. 5 per region – soft limit)
  • Because VPC is private, only the Private IPv4 ranges are allowed:
    • 10.0.0.0 – 10.255.255.255 (10.0.0.0/8)
    • 172.16.0.0 – 172.31.255.255 (172.16.0.0/12)
    • 192.168.0.0 – 192.168.255.255 (192.168.0.0/16)
  • Max. CIDR per VPC is 5, for each CIDR
    • Min. size is /28 (16 IP addresses)
    • Max. size is /16 (65536 IP addresses)

VPC – Subnet

  • AWS reserves 5 IP addresses (first 4 & last 1) in each subnet

Internet Gateway (IGW)

  • Allows resources (e.g., EC2 instances) in a VPC connect to the Internet
  • It scales horizontally and is highly available and redundant
  • One VPC can only be attached to one IGW and vice versa
  • Internet Gateways on their own do not allow Internet access
  • Route tables must also be edited!

Bastion Hosts

  • An EC2 instance running in a public subnet (accessible from the public internet) that allows users to SSH into instances in private subnets.
  • The Bastion Host security group must allow inbound access from the internet on port 22 from a restricted CIDR, for example the public CIDR of your corporation
  • Security Group of the EC2 Instance must allow the Security Group of the Bastion Host, or the private IP of the Bastion host

NAT Instance

  • NAT for Network Address Translation
  • Allows EC2 instances in private subnets to connect to the Internet
  • Must be launched in a public subnet
  • Must disable the EC2 setting Source/Destination Check, because the NAT instance forwards traffic that does not belong to it
  • Must have Elastic IP attached to it
  • Route Tables must be configured to route traffic from private subnets to the NAT Instance
  • Can be used as a Bastion Host
  • Disadvantages:
    • Not highly available or resilient out of the box. Need to create an ASG in multi-AZ + resilient user-data script
    • Internet traffic bandwidth depends on EC2 instance type


NAT Gateway

  • AWS-managed NAT, higher bandwidth, high availability, no administration
  • Pay per hour for usage and bandwidth
  • Preferred over NAT instances
  • NATGW is created in a specific Availability Zone, uses an Elastic IP
  • Can’t be used by EC2 instance in the same subnet (only from other subnets)
  • Can't be shared across VPCs (available only in one VPC)
  • Requires an IGW (Private Subnet => NATGW => IGW)
  • Created in a public subnet
  • 5 Gbps of bandwidth with automatic scaling up to 45 Gbps
  • No Security Groups to manage / required
  • Route Tables for private subnets must be configured to route internet-destined traffic to the NAT gateway


NAT Gateway with High Availability

  • NAT Gateway is resilient within a single Availability Zone
  • Must create multiple NAT Gateways in multiple AZs for fault-tolerance
  • No cross-AZ failover needed because if an AZ goes down, all of the instances in that AZ also go down.

Network Access Control List (NACL)

  • NACL are like a firewall which control traffic from and to subnets

  • One NACL per subnet, but a NACL can be attached to multiple subnets

  • New subnets are assigned the Default NACL

  • Default NACL allows all inbound & outbound requests

  • Newly created NACLs deny all inbound and outbound traffic by default until you add rules

  • The “* All Traffic Deny” rule ensures that if a packet doesn't match any of the other numbered rules, it's denied. You can't modify or remove this rule.

  • NACL Rules:

    • Based only on IP addresses
    • Rules number: 1-32766 (lower number has higher precedence)
    • First rule match will drive the decision
    • The last rule denies the request (only when no previous rule matches)
    • Each subnet in your VPC must be associated with a network ACL. If you don't explicitly associate a subnet with a network ACL, the subnet is automatically associated with the default network ACL.

NACL vs Security Group

  • NACL
    • Firewall for subnets
    • Supports both Allow and Deny rules
    • Stateless (both request and response will be evaluated against the NACL rules)
  • Security Group
    • Firewall for EC2 instances
    • Supports only Allow rules
    • Stateful (return traffic is automatically allowed, regardless of any rules)


NACL with Ephemeral Ports

  • For any two endpoints to establish a connection, they must use ports
  • Clients connect to a defined port, and expect a response on an ephemeral port
  • For example, suppose a client EC2 instance needs to connect to a DB instance. Since the ephemeral port can be randomly assigned from a range of ports, the Web Subnet's NACL must allow inbound traffic on that range of ports, and similarly the DB Subnet's NACL must allow outbound traffic on the same range of ports.


VPC Peering

  • Privately connect two VPCs using AWS network
  • Must not have overlapping CIDRs
  • VPC Peering connection is NOT transitive
  • Must update route tables in each VPC’s subnets to ensure requests destined to the peered VPC can be routed through the peering connection
  • You can create VPC Peering connection between VPCs in different AWS accounts/regions(cross account or cross region)
  • You can reference a security group in a peered VPC (works cross accounts – same region)


VPC Endpoints

  • Every AWS service is publicly exposed (public URL)
  • VPC Endpoints (powered by AWS PrivateLink) allows you to connect to AWS services using a private network instead of using the public Internet
  • They’re redundant and scale horizontally
  • They remove the need of IGW, NATGW, … to access AWS Services
  • Types of Endpoints :
    • Interface Endpoints (powered by PrivateLink)
      • Provisions an ENI (private IP address) as an entry point
      • Need to attach a security group to the interface endpoint to control access
      • Supports most AWS services
      • No need to update the route table
      • $ per hour + $ per GB of data processed
    • Gateway Endpoint
      • Provisions a gateway
      • Must be used as a target in a route table
      • Supports only S3 and DynamoDB
      • Free
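
A sketch of creating a Gateway Endpoint for S3 with boto3; the VPC ID, route table ID, and region are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0abc1234",                      # hypothetical VPC
    ServiceName="com.amazonaws.us-east-1.s3",  # S3 in the VPC's region
    RouteTableIds=["rtb-0def5678"],  # a route to S3 is added to these route tables
)
```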


VPC Flow Logs

  • Captures information about IP traffic going into your interfaces
  • Three levels:
    • VPC Flow Logs
    • Subnet Flow Logs
    • ENI Flow Logs
  • Can be configured to show accepted, rejected or all traffic
  • Flow logs data can be sent to S3 (bulk analytics) or CloudWatch Logs (near real-time via metric filter)
  • Query VPC flow logs using Athena in S3 or CloudWatch Logs Insights

IPv6 Support

  • IPv4 cannot be disabled for your VPC
  • Enable IPv6 to operate in dual-stack mode in which your EC2 instances will get at least a private IPv4 and a public IPv6. They can communicate using either IPv4 or IPv6 to the internet through an Internet Gateway.
  • If you cannot launch an EC2 instance in your subnet, it's not because it cannot acquire an IPv6 address (the space is very large); it's because there are no available IPv4 addresses in your subnet
  • Solution: add a new IPv4 CIDR to your subnet

Egress-only Internet Gateway

  • Allows instances in your VPC to initiate outbound connections over IPv6 while preventing inbound IPv6 connections to your private instances.
  • Similar to NAT Gateway but for IPv6
  • Must update Route Tables


PrivateLink

To open our applications up to other VPCs, we can either:

  • Open the VPC up to the Internet

    • Security considerations; everything in the public subnet is public
    • A lot more to manage
  • Use VPC Peering

    • You will have to create and manage many different peering relationships
  • PrivateLink: the best way to expose a service VPC to tens, hundreds, or thousands of customer VPCs

  • Doesn’t require VPC peering; no route tables, NAT gateways, internet gateways, etc

  • Requires a Network Load Balancer on the service VPC and an ENI on the customer VPC


Site-to-Site VPN

  • Easiest and most cost-effective way to connect a VPC to an on-premise data center
  • IPSec Encrypted connection through the public internet
  • Virtual Private Gateway (VGW): VPN concentrator on the VPC side of the VPN connection
  • Customer Gateway (CGW): Software application or physical device on customer side of the VPN connection
  • Enable Route Propagation for the Virtual Private Gateway in the route table that is associated with your subnets
  • If you need to ping EC2 instances from on-premises, make sure you add the ICMP protocol on the inbound rules of your security groups


AWS VPN CloudHub

  • Low-cost hub-and-spoke model for network connectivity between a VPC and multiple on-premise data centers (VPN only)
  • Every participating network can communicate with one another through the VPN connection
  • It operates over the public internet, but all traffic between the customer gateway and the AWS VPN CloudHub is encrypted.
  • To set it up, connect multiple VPN connections on the same VGW, setup dynamic routing and configure route tables


Direct Connect

  • Dedicated private connection from an on-premise data center to a VPC
  • Dedicated connection must be setup between your Data Center and AWS Direct Connect locations
  • You need to setup a Virtual Private Gateway on your VPC
  • Data in transit is not encrypted, but the connection is private (secure)
  • More stable and secure than Site-to-Site VPN
  • Access public & private resources on the same connection using Public & Private Virtual Interface (VIF) respectively
  • DIRECT CONNECT IS:
    • Fast
    • Secure
    • Reliable
    • Able to take massive throughput
    • Lower cost


Direct Connect Gateway

  • Used to setup a Direct Connect to multiple VPCs from your data center, possibly in different regions but same account
  • Using DX, we will create a Private VIF to the Direct Connect Gateway which will extend the VIF to Virtual Private Gateways in multiple VPCs (possibly across regions).


Connection Types

  • Dedicated Connection
    • A physical Ethernet connection associated with a single customer.
    • 1Gbps,10 Gbps and 100 Gbps capacity
  • Hosted Connection
    • A physical Ethernet connection that an AWS Direct Connect Partner provisions on behalf of a customer
    • 50 Mbps, 500 Mbps, up to 10 Gbps

Encryption

  • For encryption in flight, use AWS Direct Connect + VPN which provides an IPsec-encrypted private connection
  • Good for an extra level of security


Resiliency

  • Best way (redundant direct connect connections)


  • VPN connection as a backup
    • In case Direct Connect fails, you can set up a backup Direct Connect connection (expensive), or a Site-to-Site VPN connection

Transit Gateway

  • Transitive peering between thousands of VPCs and on-premise data centers using hub-and-spoke (star) topology
  • Works with Direct Connect Gateway, VPN Connection
  • Regional resource, can work cross-region
  • You can peer Transit Gateways across regions
  • Route Tables to control communication within the transitive network
  • Supports IP Multicast (not supported by any other AWS service)

Increasing BW of Site-to-Site VPN connection

  • ECMP = Equal-cost multi-path routing
  • To increase the bandwidth of the connection between Transit Gateway and corporate data center, create multiple site-to-site VPN connections, each with 2 tunnels (2 x 1.25 = 2.5 Gbps per VPN connection).


Share Direct Connect between multiple accounts

  • Share the Transit Gateway across accounts using Resource Access Manager (RAM) to connect VPCs in the same region but different accounts


Networking Costs

  • Use Private IP instead of Public IP for good savings and better network performance
  • Use same AZ for maximum savings (at the cost of high availability)
  • Traffic entering AWS is free
  • Traffic leaving an AWS region is paid


Minimizing egress traffic network cost

  • Egress traffic: outbound traffic (from AWS to outside)
  • Ingress traffic: inbound traffic - from outside to AWS (typically free)
  • Try to keep as much internet traffic within AWS to minimize costs
  • Direct Connect locations that are co-located in the same AWS Region result in lower costs for egress network traffic


Content delivery

CloudFront

  • Content Delivery Network (CDN)
  • Improves read performance, content is cached at the edge
  • Improves users experience
  • 216 Points of Presence globally (edge locations)

Origins

  • S3 bucket

    • For distributing files and caching them at the edge
    • Enhanced security with CloudFront Origin Access Control (OAC)
    • Origin Access Identity (OAI, old version) or Origin Access Control (OAC, new version) allows the S3 bucket to be accessed only by CloudFront
    • CloudFront can be used as an ingress (to upload files to S3)
  • Custom Origin (HTTP)

    • Application Load Balancer
    • EC2 instance
    • S3 website (must first enable the bucket as a static S3 website)
    • Any HTTP backend you want

CloudFront Geo Restriction

  • You can restrict who can access your distribution
    • Allowlist :
      • Allow your users to access your content only if they're in one of the countries on a list of approved countries.
    • Blocklist
      • Prevent your users from accessing your content if they're in one of the countries on a list of banned countries.
  • The “country” is determined using a 3rd party Geo-IP database
  • Use case: Copyright Laws to control access to content

Signed URL / Cookies

  • Used to make a CloudFront distribution private (distribute to a subset of users)
  • Signed URL ⇒ access to individual files
  • Signed Cookies ⇒ access to multiple files
  • Whenever we create a signed URL / cookie, we attach a policy specifying:
    • URL / Cookie Expiration (TTL)
    • IP ranges allowed to access the data
    • Trusted signers (which AWS accounts can create signed URLs)

Pricing

  • Price Class All: all regions (best performance)
  • Price Class 200: most regions (excludes the most expensive regions)
  • Price Class 100: only the least expensive regions

Global Accelerator

  • Leverage the AWS internal network to route to your application

  • 2 Anycast IPs are created for your application

  • The Anycast IPs send traffic directly to Edge Locations

  • The Edge locations send the traffic to your application

  • Endpoints can be public or private (can span multiple regions):

    • Elastic IP
    • EC2 instances
    • ALB
    • NLB
  • Disaster Recovery

    • Global Accelerator performs health checks for the application
    • Failover in less than 1 minute for unhealthy endpoints
  • Good for:

    • HTTP use cases that require static IP addresses or fast regional failover

Monitoring & Audit

CloudWatch

Serverless performance monitoring service

Metrics

  • CloudWatch provides metrics for every service in AWS
  • Metric is a variable to monitor (CPUUtilization, NetworkIn…)
  • Metrics belong to namespaces
  • Dimension is an attribute of a metric (instance id, environment, etc…)
  • Up to 10 dimensions per metric
  • Two types:
    • Default metrics
      • Provided out of the box; no additional configuration required
    • Custom metrics
      • These metrics need to be provided by your application or by the CloudWatch agent installed on the host (a push sketch follows below)
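
A sketch of pushing a custom metric with boto3; the namespace, metric name, and dimension values are illustrative:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",  # hypothetical custom namespace
    MetricData=[{
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0abc1234"}],  # hypothetical
        "Value": 73.5,
        "Unit": "Percent",
    }],
)
```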

EC2 Monitoring

  • Must run a CloudWatch agent on instance to push system metrics and logs to CloudWatch.
  • Instance role (IAM) must allow the instance to push logs to CloudWatch
  • EC2 instances have metrics every 5 minutes
  • With detailed monitoring (for a cost), you get metrics every 1 minute
  • Use detailed monitoring if you want to react faster to changes (eg. scale faster for your ASG)
  • Available metrics in CloudWatch:
    • CPU Utilization
    • Network Utilization
    • Disk Performance
    • Disk Reads/Writes
  • Custom metrics
    • Memory utilization (memory usage)
    • Disk swap utilization
    • Disk space utilization

CloudWatch Logs Agent

  • Old version of the agent
  • Can only send to CloudWatch Logs

CloudWatch Unified Agent

  • Collect additional system-level metrics such as RAM, processes, etc…
  • Collect logs to send to CloudWatch Logs
  • Centralized configuration using SSM Parameter Store
  • Collected directly on your Linux server / EC2 instance
    • CPU (active, guest, idle, system, user, steal)
    • Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
    • RAM (free, inactive, used, total, cached)
    • Netstat (number of TCP and UDP connections, net packets, bytes)
    • Processes (total, dead, blocked, idle, running, sleep)
    • Swap Space (free, used, used %)

Logs

  • Used to store application logs

  • Log Event: the record of what happened. It contains a timestamp and the data.

  • Log Stream: a collection of log events from the same source. Think of one continuous set of logs from a single instance.

  • Log Group: a collection of log streams. For example, you’d group all your Apache web server logs across hosts together.

  • Logs can be sent to:

    • S3 buckets (exports)
    • Kinesis Data Streams
    • Kinesis Data Firehose
    • Lambda functions
    • ElasticSearch

Metric Filters can be used to filter log data and use the count to trigger CloudWatch alarms. They apply only to log events ingested after the metric filter was created. Example filters:

  • find a specific IP in the logs
  • count occurrences of “ERROR” in the logs
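
A sketch of the “count ERROR occurrences” filter with boto3; the log group, filter name, and metric names are illustrative:

```python
import boto3

logs = boto3.client("logs")

logs.put_metric_filter(
    logGroupName="/my-app/production",  # hypothetical log group
    filterName="error-count",
    filterPattern="ERROR",              # match log events containing "ERROR"
    metricTransformations=[{
        "metricName": "ErrorCount",
        "metricNamespace": "MyApp",
        "metricValue": "1",             # emit 1 for each matching log event
    }],
)
```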

CloudWatch Logs Insights can be used to query logs (using a purpose-built query language) and add queries to CloudWatch Dashboards

Subscription Filter

  • To stream logs in real-time, apply a Subscription Filter on logs
  • Logs can take up to 12 hours to become available for exporting to S3 (not real-time)
  • To store logs in real time in S3, use a subscription filter to publish logs to KDF in real time which will then write the logs to S3.
  • Logs from multiple accounts and regions can be aggregated using subscription filters


Alarms

  • Alarms are used to trigger notifications for any metric
  • Various options (sampling, %, max, min, etc…)
  • Alarm States: OK, INSUFFICIENT_DATA, ALARM
  • Alarm Targets:
    • Stop, Terminate, Reboot, or Recover an EC2 Instance
    • Trigger Auto Scaling Action
    • Send notification to SNS
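
A sketch of a CPU alarm that notifies an SNS topic; the instance ID and topic ARN are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0abc1234"}],  # hypothetical instance
    Statistic="Average",
    Period=300,           # evaluate 5-minute averages...
    EvaluationPeriods=2,  # ...over 2 consecutive periods
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # notify SNS
)
```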

CloudWatch Insights and Operational Visibility

  • CloudWatch Container Insights
    • Collect, aggregate, summarize metrics and logs from containers
    • In Amazon EKS and Kubernetes, CloudWatch Insights is using a containerized version of the CloudWatch Agent to discover containers
  • CloudWatch Lambda Insights
    • Monitoring and troubleshooting solution for serverless applications running on AWS Lambda
    • Collects, aggregates, and summarizes system level metrics including CPU time, memory, disk, and network
  • CloudWatch Contributors Insights
    • Find “Top-N” Contributors through CloudWatch Logs
  • CloudWatch Application Insights
    • Automatic dashboard to troubleshoot your application and related AWS services

CloudTrail

  • Provides governance, compliance and audit for your AWS Account
  • CloudTrail is enabled by default!
  • Get a history of events / API calls made within your AWS Account
  • Can put logs from CloudTrail into CloudWatch Logs or S3
  • A trail can be applied to All Regions (default) or a single Region
  • If a resource is deleted in AWS, investigate CloudTrail first
  • Event retention: 90 days
  • To keep events beyond this period, log them to S3 and use Athena

CloudTrail Events

  • Management Events

    • Events of operations that modify AWS resources:
      • Creating a new IAM user
      • Deleting a subnet
    • Enabled by default
    • Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
  • Data Events

    • By default, data events are not logged (because high volume operations)
    • Events of operations that modify data:
      • Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject)
      • AWS Lambda function execution activity (the Invoke API)
  • CloudTrail Insights Events

    • Enable CloudTrail Insights to detect unusual activity in your account

      • inaccurate resource provisioning
      • hitting service limits
      • Bursts of AWS IAM actions
    • CloudTrail Insights analyzes normal management events to create a baseline and then continuously analyzes write events to detect unusual patterns. If that happens, CloudTrail generates Insights events that:

      • show anomalies in the CloudTrail console
      • can be logged to S3
      • can trigger an EventBridge event for automation

Config

  • Helps with auditing and recording compliance of your AWS resources

  • Record configurations changes over time

  • Evaluate compliance of resources using config rules

  • Does not prevent non-compliant actions from happening (no deny)

  • Questions that can be solved by AWS Config:

    • Is there unrestricted SSH access to my security groups?
    • Do my buckets have any public access?
    • How has my ALB configuration changed over time?
  • You can receive alerts (SNS notifications) for any changes

  • Remediation

    • automate remediation of non-compliant resources using SSM Automation Documents
      • AWS-Managed Automation Documents
      • Custom Automation Documents to invoke a Lambda function for automation
    • You can set Remediation Retries if the resource is still non-compliant after auto remediation

CloudWatch vs CloudTrail vs Config

  • CloudWatch

    • Performance monitoring (metrics, CPU, network, etc…) & dashboards
    • Events & Alerting
    • Log Aggregation & Analysis
  • CloudTrail

    • Record API calls made within your Account by everyone
    • Can define trails for specific resources
    • Global Service
  • Config

    • Record configuration changes
    • Evaluate resources against compliance rules
    • Get timeline of changes and compliance

Trusted Advisor

  • Analyze your AWS accounts and provides recommendation:
    • Cost Optimization
      • low utilization EC2 instances, EBS volumes, idle load balancers, etc.
      • Reserved instances & savings plans optimizations
    • Performance
      • High utilization EC2 instances, CloudFront CDN optimizations
      • EC2 to EBS throughput optimizations, Alias records recommendations
    • Security
      • MFA enabled on Root Account, IAM key rotation, exposed Access Keys
      • S3 Bucket Permissions for public access, security groups with unrestricted ports
    • Fault Tolerance
      • EBS snapshots age, Availability Zone Balance
      • ASG Multi-AZ, RDS Multi-AZ, ELB configuration, etc
    • Service Limits
      • Checks whether you are approaching a service limit and suggests raising it beforehand

Cost Explorer

  • Visualize, understand, and manage your AWS costs and usage over time
  • Create custom reports that analyze cost and usage data.
  • Analyze your data at a high level: total costs and usage across all accounts
  • Forecast usage up to 12 months based on previous usage

Access Management

Identity Access Management

  • Groups are collections of users and have policies attached to them
  • User can belong to multiple groups
  • Log in as an IAM user with admin permissions instead of the root account, even if you have root access.

Policies

  • Policies are JSON documents that outline permissions for users, groups or roles

  • Two types:

    • User based policies
      • IAM policies define which API calls should be allowed for a specific user
    • Resource based policies
      • Control access to an AWS resource
      • Grant the specified principal permission to perform actions on the resource and define under what conditions this applies
  • An IAM principal can access a resource if the user policy ALLOWS it OR the resource policy ALLOWS it AND there’s no explicit DENY.
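
For illustration, here is a minimal identity-based policy created with boto3; the bucket name and policy name are placeholders:

```python
import json

import boto3

iam = boto3.client("iam")

# Hypothetical policy: read-only access to a single S3 bucket.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

iam.create_policy(
    PolicyName="ExampleBucketReadOnly",
    PolicyDocument=json.dumps(policy_document),
)
```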

Roles

  • Some AWS services need to perform actions on your behalf
  • To do so, we assign permissions to AWS services with IAM Roles

Reporting Tools

  • Credentials Report: lists all the users and the status of their credentials (MFA, password rotation, etc.)
  • Access Advisor: shows the service permissions granted to a user and when those services were last accessed

Assume Role vs Resource-based Policy

  • When you assume an IAM Role, you give up your original permissions and take the permissions assigned to the role
  • When using a resource based policy, the principal doesn’t have to give up their permissions
  • Kinesis Data Streams uses IAM roles
  • SNS, SQS, Lambda, CloudWatch Logs, API Gateway ... use resource-based policies
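
A minimal sketch of assuming a role with STS (the role ARN and session name are placeholders); note how the returned temporary credentials, not your original ones, back the new client:

```python
import boto3

sts = boto3.client("sts")

# Assuming a role swaps your current permissions for the role's permissions.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/ExampleCrossAccountRole",
    RoleSessionName="example-session",
)["Credentials"]

# This client acts with the role's temporary credentials only.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```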

Permission Boundaries

  • Set the maximum permissions an IAM entity can get
  • Can be applied to users and roles (not groups)
  • Used to ensure some users can’t escalate their privileges (make themselves admin)

AWS Organizations

  • Global service
  • Manage multiple AWS accounts under an organization
    • The main account is the management account
    • Other accounts are member accounts
  • An AWS account can only be part of one organization
  • Consolidated Billing across all accounts (lower cost)
  • Pricing benefits from aggregated usage of AWS resources
  • API to automate AWS account creation (on demand account creation)
  • Establish Cross Account Roles for Admin purposes where the management account can assume an admin role in any of the member accounts

Organizational Units (OU)

  • Folders for grouping AWS accounts of an organization
  • Can be nested


Service Control Policies (SCP)

  • IAM-style policies applied to OUs or Accounts to restrict Users and Roles
  • They do not apply to the management account (full admin power)
  • Must have an explicit allow (does not allow anything by default – like IAM)
  • Explicit Deny has the highest precedence
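
As a sketch, here is an SCP that explicitly denies one action, attached via the Organizations API; the policy name, description, and denied action are illustrative:

```python
import json

import boto3

org = boto3.client("organizations")

# Explicit Deny wins over any Allow, so member accounts covered by this SCP
# can never call organizations:LeaveOrganization.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "organizations:LeaveOrganization",
            "Resource": "*",
        }
    ],
}

org.create_policy(
    Name="DenyLeaveOrganization",
    Description="Prevent member accounts from leaving the organization",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)
```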

Sharing Resources with AWS RAM

  • AWS Resource Access Manager (RAM) is a free service that allows you to share resources with other AWS accounts and within your organization
  • AWS RAM lets you easily share resources rather than creating duplicate copies in your different accounts
  • Shareable resources include:
    • VPC subnets
    • Transit Gateway
    • Route 53 Resolver
    • License Manager
    • Dedicated Host
    • etc ...

RAM vs. VPC Peering

  • When should you use VPC peering or RAM?
  • Are you sharing resources within the same region? Use RAM.
  • Are you sharing across regions? Use VPC peering.
  • If RAM isn’t available and VPC peering is, that’s still a great option!

Cross Account Role Access

  • As the number of AWS accounts you manage increases, you will need to set up cross-account access.
  • Duplicating IAM users across accounts creates a security vulnerability.
  • Cross-account role access gives you the ability to set up temporary access (via AssumeRole) that you can easily control.


SSO

  • Single Sign-On, now called IAM Identity Center

  • One login (single sign-on) for all your

    • AWS accounts in AWS Organizations
    • Business cloud applications (e.g., Salesforce, Box, Microsoft 365, …)
    • SAML2.0-enabled applications
  • Identity providers

    • Built-in identity store in IAM Identity Center
    • 3rd party: Active Directory (AD), OneLogin, Okta

Cognito

Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily. Amazon Cognito scales to millions of users and supports sign-in with social identity providers, such as Apple, Facebook, Google, and Amazon, and enterprise identity providers via SAML 2.0 and OpenID Connect.

Cognito User Pools (CUP)

  • User pools are directories of users that provide sign-up and sign-in options for your application users
  • Create a serverless database of users for your web & mobile apps
  • Integrate with API Gateway & Application Load Balancer
  • Multi-factor authentication (MFA)
  • Federated Identities: users from Facebook, Google, SAML…

Cognito Identity Pools (Federated Identity)

  • Provide AWS credentials to users so they can access AWS resources directly
  • Provides temporary credentials (using STS) to users so they can access AWS resources
  • Integrate with Cognito User Pools as an identity provider
  • Example use case: provide temporary access to write to an S3 bucket after authenticating the user via Facebook (using CUP identity federation), as sketched below
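
A minimal sketch of that flow; the identity pool ID and Facebook token are placeholders:

```python
import boto3

identity = boto3.client("cognito-identity")

POOL_ID = "us-east-1:00000000-0000-0000-0000-000000000000"  # placeholder
LOGINS = {"graph.facebook.com": "FACEBOOK_ACCESS_TOKEN"}     # placeholder token

# Exchange the external identity for a Cognito identity ID...
identity_id = identity.get_id(IdentityPoolId=POOL_ID, Logins=LOGINS)["IdentityId"]

# ...then for temporary AWS credentials issued via STS.
creds = identity.get_credentials_for_identity(
    IdentityId=identity_id,
    Logins=LOGINS,
)["Credentials"]

# creds["AccessKeyId"], creds["SecretKey"], creds["SessionToken"] can now
# back a boto3 client that writes to the S3 bucket directly.
```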

Cognito vs IAM: think Cognito when you see “hundreds of users”, ”mobile users”, or “authenticate with SAML”


AWS Directory Services

Managed Microsoft AD

  • This is the entire AD suite
  • You can easily build out AD in AWS
  • Login credentials are shared between on-premises AD and AWS Managed AD
  • Manage users on both directories (on-premises and AWS Managed AD)
  • Establish “trust” connections with your on-premises AD
  • Supports MFA

AD Connector

  • Creates a tunnel between AWS and your on-premises AD
  • Directory Gateway (proxy) to redirect requests to the on-premises AD, supports MFA
  • Users are managed on the on-premises AD

Simple AD

  • Provides a subset of the features offered by AWS Managed Microsoft AD, including the ability to manage user accounts and group memberships, create and apply group policies, securely connect to Amazon EC2 instances, and provide Kerberos-based single sign-on (SSO)
  • Standalone directory powered by Linux Samba Active Directory-compatible server
  • AD-compatible managed directory on AWS
  • Cannot be joined with on-premises AD

AWS Control Tower

  • Easy way to set up and govern a secure and compliant multi-account AWS environment based on best practices
  • AWS Control Tower uses AWS Organizations to create accounts
  • Benefits:
    • Automate the set up of your environment in a few clicks
    • Automate ongoing policy management using guardrails
    • Detect policy violations and remediate them
    • Monitor compliance through an interactive dashboard
  • Guardrails
    • Provides ongoing governance for your Control Tower environment (AWS Accounts)
    • Preventive Guardrail
      • Ensures accounts maintain governance by disallowing violating actions
      • Leverages SCPs (e.g., restrict Regions across all your accounts)
    • Detective Guardrail
      • Detects and alerts on noncompliant resources within all accounts
      • Leverages AWS Config rules (e.g., identify untagged resources)

Features and Terms to Know

  • Landing zone: Well-architected, multi-account environment based on compliance and security best practices
  • Guardrails: High-level rules providing continuous governance for the AWS environment
  • Account Factory: Configurable account template for standardizing pre-approved configs of new accounts
  • CloudFormation StackSet: Automated deployments of templates deploying repeated resources for governance
  • Shared accounts: Three accounts used by Control Tower created during landing zone creation

Parameters & Encryption

Key Management Service

  • Anytime you hear “encryption” for an AWS service, it’s most likely KMS
  • Regional service (keys are bound to a region)
  • AWS manages encryption keys for us
  • Provides encryption and decryption of data and manages keys required for it
  • Encrypted secrets can be stored in the code or environment variables
  • Encrypt up to 4KB of data per call (if data > 4 KB, use envelope encryption)
  • Integrated with IAM for authorization
  • Audit key usage with CloudTrail (to know who made calls to the KMS API)
  • Need to set both an IAM Policy & a Key Policy to allow a user or role to access a KMS key
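
A minimal sketch of the 4 KB limit and envelope encryption; the key alias is a placeholder:

```python
import boto3

kms = boto3.client("kms")
KEY_ID = "alias/my-app-key"  # placeholder alias

# Direct Encrypt/Decrypt only works for payloads up to 4 KB.
ciphertext = kms.encrypt(KeyId=KEY_ID, Plaintext=b"small secret")["CiphertextBlob"]
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]

# For larger data, use envelope encryption: KMS hands back a data key to
# encrypt the data locally, plus an encrypted copy of that key to store
# alongside the ciphertext.
data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec="AES_256")
local_key = data_key["Plaintext"]        # use locally, then discard
stored_key = data_key["CiphertextBlob"]  # persist next to the encrypted data
```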

KMS Keys

  • KMS Key is the new name for KMS Customer Master Key (CMK)
  • Symmetric (AES-256 keys)
    • Single encryption key that is used to Encrypt and Decrypt
    • AWS services that are integrated with KMS use Symmetric CMKs
    • You never get access to the KMS Key unencrypted (must call KMS API to use)
  • Asymmetric (RSA & ECC key pairs)
    • Public (Encrypt) and Private Key (Decrypt) pair
    • Used for Encrypt/Decrypt, or Sign/Verify operations
    • The public key is downloadable, but you can’t access the Private Key unencrypted
    • Use case: encryption outside of AWS by users who can’t call the KMS API

Three types of KMS Keys

  • AWS Owned Keys (free): SSE-S3, SSE-SQS, SSE-DDB (default key)
  • AWS Managed Key: free (aws/service-name, example: aws/rds or aws/ebs)
  • Customer managed keys created in KMS: $1 / month
  • Customer managed keys imported (must be symmetric key): $1 / month + pay for API call to KMS ($0.03 / 10000 calls)

Key Rotation

  • Automatic

    • AWS-managed KMS Key
      • automatic every 1 year
    • Customer-managed KMS Key
      • must be enabled
      • automatic every 1 year
  • Manual

    • Imported KMS Key
      • only manual rotation possible using alias

Key Policies

  • Control access to KMS keys, “similar” to S3 bucket policies
  • Cannot access KMS keys without a key policy
  • Default Key Policy
    • Created if you don’t provide a specific Key Policy
    • The default policy allows everyone in your account to access the key
  • Custom KMS Key Policy
    • Define users, roles that can access the KMS key
    • Define who can administer the key
    • Useful for cross-account access of your KMS key
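
As an illustration of a custom key policy for cross-account access (account IDs and key ID are placeholders; "default" is the only supported policy name):

```python
import json

import boto3

kms = boto3.client("kms")

key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # keep the key's own account in control (administration via IAM)
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {   # let a second account use (but not administer) the key
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::444455556666:root"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:DescribeKey"],
            "Resource": "*",
        },
    ],
}

kms.put_key_policy(
    KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",  # placeholder key ID
    PolicyName="default",
    Policy=json.dumps(key_policy),
)
```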

Cross-region Encrypted Snapshot Migration

  • Copy the snapshot to another region with re-encryption option using a new key in the new region (keys are bound to a region)

Cross-account Encrypted Snapshot Migration

  • Create a Snapshot, encrypted with your own KMS Key (Customer Managed Key)
  • Attach a KMS Key Policy to authorize cross-account access
  • Share the encrypted snapshot
  • (in target) Create a copy of the Snapshot, encrypt it with a new CMK in your account
  • Create a volume from the snapshot
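
A sketch of the re-encrypting copy step (snapshot ID, Region names, and key ARN are placeholders); the same `KmsKeyId` parameter covers the cross-Region case above:

```python
import boto3

# Run the copy from the destination Region; KMS keys are Region-bound,
# so the new key must live there.
ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    Encrypted=True,
    KmsKeyId="arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    Description="Re-encrypted copy",
)
```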

KMS Multi-Region Keys

  • Identical KMS keys in different AWS Regions that can be used interchangeably
  • Multi-Region keys have the same key ID, key material, automatic rotation
  • Encrypt in one Region and decrypt in other Regions
  • No need to re-encrypt or make cross-Region API calls
  • KMS Multi-Region keys are NOT global (Primary + Replicas)
  • Each Multi-Region key is managed independently
  • Use cases: global client-side encryption, encryption on Global DynamoDB, Global Aurora

AMI Sharing Process Encrypted via KMS

  • AMI in Source Account is encrypted with KMS Key from Source Account
  • Must modify the image attribute to add a Launch Permission which corresponds to the specified target AWS account
  • Must share the KMS Key used to encrypt the snapshot the AMI references with the target account / IAM Role
  • The IAM Role/User in the target account must have permissions to DescribeKey, ReEncrypt*, CreateGrant, and Decrypt
  • When launching an EC2 instance from the AMI, optionally the target account can specify a new KMS key in its own account to re-encrypt the volumes

CloudHSM

  • A hardware security module (HSM) is a physical computing device that safeguards and manages digital keys and performs encryption and decryption functions.
  • An HSM contains one or more secure cryptoprocessor chips
  • AWS provisions dedicated encryption hardware (Hardware Security Module)
  • Use when you want to manage the encryption keys entirely yourself
  • HSM device is stored in AWS (tamper resistant, FIPS 140-2 Level 3 compliance)
  • Supports both symmetric and asymmetric encryption
  • Good option to use with SSE-C encryption
  • CloudHSM clusters are spread across multiple AZs (high availability)
  • IAM permissions are required to perform CRUD operations on HSM cluster
  • CloudHSM Software is used to manage the keys and users (in KMS, everything is managed using IAM)

SSM Parameter Store

  • Secure storage for configuration and secrets
  • Optional Seamless Encryption using KMS
  • Serverless, scalable, durable, easy SDK
  • Security through IAM
  • Notifications with Amazon EventBridge
  • Integration with CloudFormation
  • Differences with Secrets Manager:
    • SSM Parameter Store is free (standard tier), Secrets Manager is not
    • Limit to the number of parameters you can store (10,000 in the standard tier)
    • No built-in secret rotation
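
A minimal sketch of writing and reading an encrypted parameter; the parameter name and value are placeholders:

```python
import boto3

ssm = boto3.client("ssm")

# SecureString values are encrypted with KMS under the hood.
ssm.put_parameter(
    Name="/my-app/dev/db-password",  # placeholder hierarchy
    Value="s3cr3t",
    Type="SecureString",
    Overwrite=True,
)

password = ssm.get_parameter(
    Name="/my-app/dev/db-password",
    WithDecryption=True,  # requires kms:Decrypt on the underlying key
)["Parameter"]["Value"]
```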

Parameter Tiers

  • Standard tier: up to 10,000 parameters, 4 KB max value size, no parameter policies, free
  • Advanced tier: up to 100,000 parameters, 8 KB max value size, parameter policies supported, charged per parameter

Parameter Policies

  • Only supported in the advanced tier
  • Assign policies to a parameter for additional features
    • Expire the parameter after some time (TTL)
    • Parameter expiration notification
    • Parameter change notification

Secrets Manager

  • Newer service, meant for storing secrets
  • Capability to force rotation of secrets every X days (not available in Parameter Store)
  • Automate generation of secrets on rotation (uses Lambda)
  • Secrets are encrypted using KMS
  • Mostly used for RDS (MySQL, PostgreSQL, Aurora) authentication
    • need to specify the username and password to access the database
    • link the secret to the database to allow for automatic rotation of database login info
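
A sketch of reading RDS credentials from a secret (the secret name is a placeholder; RDS secrets are stored as a JSON blob):

```python
import json

import boto3

secrets = boto3.client("secretsmanager")

raw = secrets.get_secret_value(SecretId="prod/my-app/rds")["SecretString"]
creds = json.loads(raw)

# creds["username"] / creds["password"] can now be used to open the
# database connection; rotation updates the secret without code changes.
```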

Secrets Manager – Multi-Region Secrets

  • Replicate Secrets across multiple AWS Regions
  • Secrets Manager keeps read replicas in sync with the primary Secret
  • Ability to promote a read replica Secret to a standalone Secret
  • Use cases: multi-region apps, disaster recovery strategies, multi-region DB

Certificate Manager

  • Easily provision, manage, and deploy TLS Certificates
  • Used to provide in-flight encryption for websites (HTTPS)
  • Supports both public and private TLS certificates
  • Free of charge for public TLS certificates
  • Automatic TLS certificate renewal
  • Load TLS certificates on
    • Elastic Load Balancers (CLB, ALB, NLB)
    • CloudFront Distributions
    • APIs on API Gateway
  • Cannot use ACM with EC2

Cloud Security

Web Application Firewall

  • Protects your application from common layer 7 web exploits such as SQL Injection and Cross-Site Scripting (XSS)
  • Layer 7 is HTTP (vs Layer 4 is TCP/UDP)
  • Can only be deployed on
    • Application Load Balancer
    • API Gateway
    • CloudFront
    • AppSync GraphQL API
    • Cognito User Pool
  • WAF contains Web ACLs (Access Control Lists) with rules to filter requests based on:
    • IP addresses
    • HTTP headers
    • HTTP body
    • URI strings
    • Size constraints (ex. max 5kb)
    • Geo-match (block countries)
    • Rate-based rules (to count occurrences of events per IP) for DDoS protection
  • Web ACLs are Regional, except for CloudFront
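
As a sketch, here is a regional Web ACL with a single rate-based rule (names, metric names, and the 2,000-requests-per-5-minutes limit are illustrative):

```python
import boto3

wafv2 = boto3.client("wafv2")

wafv2.create_web_acl(
    Name="example-rate-limit-acl",
    Scope="REGIONAL",  # use Scope="CLOUDFRONT" (in us-east-1) for CloudFront
    DefaultAction={"Allow": {}},
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "exampleRateLimitAcl",
    },
    Rules=[
        {
            "Name": "rate-limit-per-ip",
            "Priority": 1,
            # Block any single IP exceeding the request limit per 5 minutes.
            "Statement": {
                "RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}
            },
            "Action": {"Block": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "rateLimitPerIp",
            },
        }
    ],
)
```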

AWS Shield

  • DDoS: Distributed Denial of Service – many requests at the same time
  • AWS Shield Standard
    • Free DDoS protection service that is activated for every AWS customer
    • Provides protection from attacks such as
      • SYN/UDP Floods
      • Reflection attacks
      • and other layer 3/layer 4 attacks
  • AWS Shield Advanced
    • DDoS mitigation service ($3,000 per month per organization)
    • Protect against more sophisticated attack on
      • EC2 instances
      • Elastic Load Balancing (ELB)
      • CloudFront
      • Global Accelerator
      • Route 53
    • 24/7 access to the AWS DDoS Response Team (DRT)

Firewall Manager

  • Manage all the firewall rules in all accounts of an AWS Organization
  • Security policy: common set of security rules
    • WAF rules (Application Load Balancer, API Gateways, CloudFront)
    • AWS Shield Advanced (ALB, CLB, NLB, Elastic IP, CloudFront)
    • Security Groups for EC2, Application Load Balancer and ENI resources in VPC
    • AWS Network Firewall (VPC Level)
    • Amazon Route 53 Resolver DNS Firewall
  • Policies are created at the Region level

Security Hub

  • Security Hub is a service provided by Amazon Web Services (AWS) that gives users a comprehensive view of their security posture across their AWS accounts.
  • It provides a centralized dashboard that aggregates and prioritizes security findings from various AWS services such as
    • Amazon GuardDuty
    • AWS Config
    • Amazon Inspector, and others

GuardDuty

  • Intelligent Threat discovery to protect your AWS Account
  • Uses Machine Learning algorithms, anomaly detection, 3rd party data
  • No management required (just enable)
  • Input data includes:
    • CloudTrail Logs (unusual API calls, unauthorized deployments)
    • VPC Flow Logs (unusual internal traffic, unusual IP address)
    • DNS Logs (compromised EC2 instances sending encoded data within DNS queries)
    • EKS Audit Logs (suspicious activities and potential EKS cluster compromises)
  • Can setup EventBridge rules to be notified in case of findings
  • EventBridge rules can target AWS Lambda or SNS
  • Can protect against CryptoCurrency attacks (has a dedicated “finding” for it)
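
A sketch of the EventBridge wiring (rule name and SNS topic ARN are placeholders; the event pattern values are the ones GuardDuty emits):

```python
import json

import boto3

events = boto3.client("events")

# Match every GuardDuty finding...
events.put_rule(
    Name="guardduty-findings",
    EventPattern=json.dumps({
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
    }),
)

# ...and forward it to a pre-existing SNS topic for notification.
events.put_targets(
    Rule="guardduty-findings",
    Targets=[{
        "Id": "notify-sns",
        "Arn": "arn:aws:sns:us-east-1:111122223333:security-alerts",  # placeholder
    }],
)
```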


Inspector

  • Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS

  • Automated Security Assessments

  • For :

    • EC2 instances – using the Systems Manager (SSM) Agent running on the instances
    • Amazon ECR - Assessment of containers as they are pushed to ECR
    • Lambda Functions – identifies software vulnerabilities in function code and package dependencies
    • 2 Types of assessment
      • Network Assessments
        • Network configuration analysis to check for ports reachable from outside the VPC
        • Inspector agent is not required
      • Host Assessments
        • Vulnerable software (CVE), host hardening (CIS benchmarks), and security best practices
        • Inspector agent is required
    • Amazon Inspector is a vulnerability management service that continuously scans your AWS workloads for vulnerabilities. It is not an intrusion detection service.
  • Integration with AWS Security Hub

  • Send findings to Amazon EventBridge

  • Gives a risk score associated with all vulnerabilities for prioritization

Macie

  • Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS (e.g., in an S3 bucket).
  • Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII)
  • Notifies through an EventBridge event

Network Firewall

  • Protect your entire Amazon VPC
  • From Layer 3 to Layer 7 protection
  • Any direction, you can inspect
    • VPC to VPC traffic
    • Outbound to internet
    • Inbound from internet
    • To / from Direct Connect & Site-to-Site VPN
  • Internally, the AWS Network Firewall uses the AWS Gateway Load Balancer
  • Rules can be centrally managed cross-account by AWS Firewall Manager to apply to many VPCs

HPC

High Performance Computing

  • Cloud is perfect for HPC

  • Cluster placement group for low latency inter-nodal communication

  • EC2 Enhanced Networking (SR-IOV)

    • Elastic Network Adapter (ENA)
      • Supported in both Linux & Windows
    • Elastic Fabric Adapter (EFA)
      • Enhanced for HPC
      • Supported in Linux only
      • Leverages Message Passing Interface (MPI) standard
      • Bypasses the underlying Linux OS to provide low-latency networking
  • Automation and Orchestration

    • AWS Batch
      • Used to run single jobs that span multiple EC2 instances (multi-node)
    • AWS Parallel Cluster
      • Open-source cluster management tool to deploy HPC on AWS
      • Configure with text files
      • Automate creation of VPC, Subnet, cluster type and instance types
      • Ability to enable EFA on the cluster
