
aws-solutions-architect-associate-notes

Contents

Compute services

High availability and scalability

Storage

Database

Decoupling applications

Data & Analytics

Migration & Transfer

Disaster Recovery

Machine Learning

Networking

Content delivery

Monitoring & Audit

Access Management

Parameters & Encryption

Cloud Security

HPC

Docs

AWS Well Architected

AWS reference architectures

AWS architecture solution

AWS Disaster Recovery

Other resources & tips

Compute services

EC2

  • EC2 (Elastic Compute Cloud) is an Infrastructure as a Service (IaaS)
  • Storage space:
    • Network attached (EBS & EFS)
    • Directly attached hardware (EC2 Instance Store)
  • Firewall rules: security group
  • Static IPv4 addresses known as Elastic IP addresses
  • Bootstrap script (executed only at first launch): EC2 User Data
  • Metadata is data about your EC2 instance
  • This can include information such as private IP address, public IP address, hostname, security groups, etc.
  • URL to fetch metadata about the instance: http://169.254.169.254/latest/meta-data
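
For example, you can query the metadata endpoint directly from inside an instance. A minimal sketch in Python using the `requests` library and the IMDSv2 token flow; it only works when run on an EC2 instance:

```python
import requests

BASE = "http://169.254.169.254/latest"

# IMDSv2: obtain a short-lived session token first
token = requests.put(
    f"{BASE}/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
).text

# Use the token to fetch individual metadata attributes
headers = {"X-aws-ec2-metadata-token": token}
instance_id = requests.get(f"{BASE}/meta-data/instance-id", headers=headers).text
private_ip = requests.get(f"{BASE}/meta-data/local-ipv4", headers=headers).text
print(instance_id, private_ip)
```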

IAM Roles for EC2 instances

  • Never store AWS credentials on an EC2 instance; instead, attach IAM Roles to the instance

EC2 Instance Types

You can use different types of EC2 instances that are optimised for different use cases

General Purpose

  • Great for a diversity of workloads such as web servers or code repositories
  • Balance between Compute, Memory, Networking

Compute Optimized

  • Great for compute-intensive tasks that require high performance processors:
    • Batch processing workloads
    • Media transcoding
    • High performance web servers
    • High performance computing (HPC)
    • Scientific modeling & machine learning
    • Dedicated gaming servers

Memory Optimized

  • Fast performance for workloads that process large data sets in memory
  • Distributed web scale cache stores

Storage Optimized

  • Great for storage-intensive tasks that require high, sequential read and write access to large data sets on local storage
  • High frequency online transaction processing (OLTP) systems
  • Cache for in-memory databases (for example, Redis)
  • Data warehousing applications

Security group

  • Security groups act as a “firewall” for EC2 instances
  • They regulate:
    • Access to Ports
    • Authorised IP ranges – IPv4 and IPv6
    • Control of inbound network (from other to the instance)
    • Control of outbound network (from the instance to other)
  • All inbound traffic is blocked by default
  • All outbound traffic is allowed.
  • It’s good to maintain one separate security group for SSH access
  • If your application is not accessible (time out), then it’s a security group issue
  • If your application gives a “connection refused“ error, then it’s an application error or it’s not launched
  • You can't delete the default security group, however, you can change the default SG rules
  • You can assign up to five security groups to the instance.

EC2 Instances Purchasing Options

On Demand

  • Pay per use (no upfront payment)
  • Has the highest cost but no upfront payment
  • No long-term commitment
  • Recommended for short-term and un-interrupted workloads, where you can't predict how the application will behave

Reserved Instances

  • Predictable Usage – applications with steady state or predictable usage
  • Specific Capacity Requirements – applications that require reserved capacity
  • Pay Up Front – you can make upfront payments to reduce the total computing costs even further
  • Standard RIs – up to 72% off the On-Demand price
  • Convertible RIs – up to 54% off the On-Demand price; can change the instance type
  • Scheduled RIs – launch within the time window you define. Match your capacity reservation to a predictable recurring schedule that
    only requires a fraction of a day, week, or month.

EC2 Savings Plans

  • Get a discount based on long-term usage (up to 72% - same as RIs)
  • Commit to a certain type of usage ($10/hour for 1 or 3 years)
  • Usage beyond EC2 Savings Plans is billed at the On-Demand price
  • Flexible across: instance size, OS, tenancy (Host, Dedicated, Default)

EC2 Spot Instances

  • Can get a discount of up to 90% compared to On-demand
  • Instances that you can “lose” at any point of time if your max price is less than the current spot price
  • The MOST cost-efficient instances in AWS
  • Useful for workloads that are resilient to failure
  • Not suitable for critical jobs or databases
  • Spot Block: “block” a spot instance during a specified time frame (1 to 6 hours) without interruptions
  • You can only cancel Spot Instance requests that are open, active, or disabled.
  • To terminate a spot instance, you must first cancel the Spot Request, and then terminate the associated Spot Instances.

EC2 Dedicated Hosts

  • A physical server with EC2 instance capacity fully dedicated to your use
  • Allows you to address compliance requirements and use your existing server-bound software licenses
  • The most expensive option
  • Useful for software that has a complicated licensing model
  • Or for companies that have strong regulatory or compliance needs

EC2 Capacity Reservations

  • Reserve On-Demand instances capacity in a specific AZ for any duration
  • You always have access to EC2 capacity when you need it
  • You’re charged at On-Demand rate whether you run instances or not

Elastic IP

  • If you need to have a fixed public IP for your instance, you need an Elastic IP
  • An Elastic IP is a public IPv4 address you own as long as you don’t delete it
  • You can attach it to one instance at a time
  • With an Elastic IP address, you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account.
  • You can only have 5 Elastic IPs in your account (soft limit)

Placement Groups

3 types of placement groups: Cluster, Spread, Partition

Cluster

  • Grouping of instances within a single Availability Zone, same hardware
  • Recommended for applications that need low network latency, high network throughput, or both.

Spread

  • A spread placement group is a group of instances that are each placed on distinct underlying hardware.
  • Recommended for applications that have a small number of critical instances that should be kept separate from each other
  • Multi AZ, same region
  • Max 7 instances per group per AZ
  • Reduces the risk of simultaneous failure

Partition

  • Each partition placement group has its own set of racks. Each rack has its own network and power source.
  • Multiple AZs in the same region
  • Up to 7 partitions per AZ
  • Up to 100s of EC2 instances
  • Isolate from failure

Networking with EC2

You can attach 3 different types of virtual networking cards to your EC2 instances.

ENI(Elastic Network Interface)

  • An ENI is simply a virtual network card that allows:
    • Private IPv4 addresses
    • Public IPv4 addresses
    • Many IPv6 addresses
    • One Elastic IP (IPv4) per private IPv4
    • MAC address
    • One or more security groups (SG)
  • You can create ENIs independently and attach them on the fly (move them) to EC2 instances for failover
  • Bound to a specific Availability Zone (AZ)

EN (Enhanced Networking) – for high-performance networking between 10 Gbps and 100 Gbps

  • Single Root I/O Virtualization (SR-IOV) provides higher I/O performance and lower CPU utilization
  • Depending on your instance type, enhanced networking (EN) can be enabled using:

       1. ENA (Elastic Network Adapter): supports network speeds of up to 100 Gbps for supported instance types
       2. VF (Virtual Function) interface: supports network speeds of up to 10 Gbps for supported instance types; typically used on older instances

EFA (Elastic Fabric Adapter)

  • For when you need to accelerate High Performance Computing (HPC) and machine learning applications
  • Or if you need to do an OS-bypass
  • OS-bypass enables HPC and machine learning applications to bypass the operating system kernel and communicate directly with the EFA device
  • Not currently supported with Windows — only Linux.

Hibernation

  • The in-memory (RAM) state is preserved
  • The instance boot is much faster! (the OS is not stopped / restarted)
  • Under the hood: the RAM state is written to a file in the root EBS volume
  • The root EBS volume must be encrypted
  • Instance RAM Size – must be less than 150 GB.

Lambda

  • Virtual functions – no servers to manage!
  • Limited by time - short executions
  • Run on-demand
  • Scaling is automated
  • Not good for running containerized applications

Lambda Limits

  • Execution
    • Memory allocation: 128 MB – 10GB
    • Maximum execution time: 900 seconds (15 minutes)
    • Environment variables: 4KB
    • Disk capacity in function container (/tmp): 512 MB to 10GB
    • Concurrency executions: 1000 (can be increased)
  • Deployment
    • Lambda function deployment size (compressed .zip): 50 MB
    • Size of uncompressed deployment (code + dependencies): 250 MB
    • Size of environment variables: 4 KB
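
To make these limits concrete, here is a minimal sketch of a Python Lambda handler (the function and file names are illustrative; the handler is configured as `<file>.handler` in Lambda):

```python
import json

def handler(event, context):
    # `event` carries the invocation payload; `context` exposes runtime
    # info such as the function name and remaining execution time.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }
```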

Lambda@Edge

  • Deploy Lambda functions alongside your CloudFront CDN for computing at edge locations
  • Customize the CDN content using Lambda at the edge location (responsive)
  • No server management (Lambda is deployed globally)
  • Can be used to modify CloudFront requests & responses


Networking

  • By default, your Lambda function is launched outside your own VPC (in an AWS owned VPC)
  • Therefore, it cannot access resources in your VPC (RDS, ElastiCache, internal ELB…)
  • To enable your Lambda function to access resources inside your private VPC, you must define the VPC ID, the Subnets and the Security Groups
  • Lambda will create an ENI (Elastic Network Interface) in your subnets


Elastic Beanstalk

  • Used to deploy applications on AWS infrastructure

  • Platform as a Service (PaaS)

  • Automatically handles capacity provisioning, load balancing, scaling, application health monitoring, instance configuration, etc. but we have full control over the configuration

  • Free (pay for the underlying resources)

  • Supports versioning of application code

  • Can create multiple environments (dev, test, prod)

  • Web & Worker Environments

    • Web Environment (Web Server Tier): client requests are directly handled by EC2 instances through a load balancer.
    • Worker Environment (Worker Tier): clients’ requests are put in an SQS queue and the EC2 instances will pull the messages to process them. Scaling depends on the number of SQS messages in the queue.


Elastic Container Service

  • AWS managed container orchestration platform
  • Launch Docker containers on AWS = Launch ECS Tasks on ECS Clusters
  • EFS is used as persistent multi-AZ shared storage for ECS tasks

ECS Components

  • Clusters
    • An Amazon ECS cluster is a logical grouping of Tasks or services.
    • You can use clusters to isolate your applications so that they don't use the same underlying infrastructure
    • When your tasks are run on Fargate, your cluster resources are also managed by Fargate
  • Task definitions
    • A task definition is a text file that describes one or more containers that form your application
    • It's in JSON format
    • You can use it to describe up to a maximum of ten containers
    • The task definition functions as a blueprint for your application
    • AWS recommends spanning your application across multiple task definitions
    • parameters
      • Docker image
      • CPU and memory
      • The command that the container runs when it's started
      • Data volumes that are used with the containers in the task
      • The IAM role that your tasks use
  • Services
    • You can use an Amazon ECS service to run and maintain your desired number of tasks simultaneously in an Amazon ECS cluster
    • If any of your tasks fail or stop for any reason, the Amazon ECS service scheduler launches another instance based on your task definition.
    • Parameters
      • Cluster
      • Task definition
      • Capacity provider
      • Client token

Launch Types

EC2 Launch Type

  • Not Serverless
  • you must provision & maintain the infrastructure (the EC2 instances)
  • EC2 instances have ECS agent to register in the ECS Cluster
  • AWS takes care of starting / stopping containers
  • Use case: long-running processes, cost optimisation (possible to reserve EC2 capacity or use Spot instances)

Fargate Launch Type

  • Serverless
  • You do not provision the infrastructure (no EC2 instances to manage)
  • You just create task definitions
  • AWS just runs ECS Tasks for you based on the CPU / RAM you need
  • To scale, just increase the number of tasks
  • Use case: when you want to run containers for short-lived workloads

IAM Roles for ECS

  • EC2 Instance Profile (EC2 Launch Type only):
    • Used by the ECS agent
    • Makes API calls to ECS service
    • Pull Docker image from ECR
    • Reference sensitive data in Secrets Manager or SSM Parameter Store
  • ECS Task Role(Both EC2 launch type and Fargate):
    • Allows ECS tasks to access AWS resources
    • Each task can have a separate role
    • Use different roles for the different ECS Services
    • Task Role is defined in the task definition

Data Volumes (EFS)

  • Mount EFS file systems onto ECS tasks
  • Works for both EC2 and Fargate launch types
  • Tasks running in any AZ will share the same data in the EFS file system
  • Fargate + EFS = Serverless

ECS Service Auto Scaling

  • Automatically increase/decrease the desired number of ECS tasks

  • Amazon ECS Auto Scaling uses AWS Application Auto Scaling

    • Metric :
      • ECS Service Average CPU Utilization
      • ECS Service Average Memory Utilization - Scale on RAM
      • ALB Request Count Per Target – metric coming from the ALB
  • Scaling type:

    • Target Tracking – scale based on target value for a specific CloudWatch metric
    • Step Scaling – scale based on a specified CloudWatch Alarm
    • Scheduled Scaling – scale based on a specified date/time (predictable changes)
  • ECS Service Auto Scaling (task level) ≠ EC2 Auto Scaling (EC2 instance level)

  • Fargate Auto Scaling is much easier to setup (because Serverless)

EC2 Launch Type – Auto Scaling EC2 Instances

  • Accommodate ECS Service Scaling by adding underlying EC2 Instances
  • 2 types:
    • Auto Scaling Group Scaling
      • Scale your ASG based on CPU Utilization
      • Add EC2 instances over time
    • ECS Cluster Capacity Provider (newer and more advanced)
      • Used to automatically provision and scale the infrastructure for your ECS Tasks
      • Capacity Provider paired with an Auto Scaling Group
      • Add EC2 Instances when you’re missing capacity (CPU, RAM…)


Elastic Container Registry

  • Store and manage Docker images on AWS
  • Private and Public repository (Amazon ECR Public Gallery)
  • Fully integrated with ECS, backed by Amazon S3
  • Access is controlled through IAM policy
  • Lifecycle Rules to expire and remove unused or older images
  • Caching public repos privately (ECR periodically reaches out to check the current caching status)
  • Tag Mutability – prevent image tags from being overwritten

Elastic Kubernetes Service

  • Used to launch Kubernetes (open-source) clusters on AWS
  • Supports both EC2 and Fargate launch types
  • Inside the EKS cluster, we have EKS nodes (EC2 instances) and EKS pods (tasks) within them. We can use a private or public load balancer to access these EKS pods.
  • EKS is an alternative to ECS
  • Node Types
    • Managed Node Groups
      • Creates and manages Nodes (EC2 instances) for you
      • Nodes are part of an ASG managed by EKS
    • Self-Managed Nodes
      • Nodes created by you and registered to the EKS cluster and managed by an ASG
      • You can use prebuilt AMI - Amazon EKS Optimized AMI
    • AWS Fargate
      • No maintenance required; no nodes managed


EKS Anywhere

  • An on-premises way to manage Kubernetes (K8s) clusters with the same practices used for Amazon EKS
  • The key difference is that you run these clusters on premises
  • Based on EKS Distro
  • Offers full lifecycle management of multiple K8s clusters
  • Operates independently of AWS
  • Control plane: K8s control plane management is operated completely by the customer
  • Location: the K8s control plane is located entirely within a customer data center or operations center

ECS Anywhere

  • Feature of Amazon ECS allowing the management of container-based apps on-premises
  • No need to install and operate local container orchestration software, meaning more operational efficiency

High availability and scalability

  • Vertical Scaling: Increase instance size (= scale up / down)

    • From: t2.nano - 0.5G of RAM, 1 vCPU
    • To: u-12tb1.metal – 12.3 TB of RAM, 448 vCPUs
    • Hardware limit
    • Use case: non-distributed systems, like a database
  • Horizontal Scaling: Increase number of instances (= scale out / in)

    • Auto Scaling Group
    • Load Balancer
  • High Availability

    • Run instances for the same application across multi AZ
    • Auto Scaling Group multi AZ
    • Load Balancer multi AZ

Elastic Load Balancer

  • Spread load across multiple EC2 instances
  • Supports Multi AZ
  • Expose a single point of access (DNS) to your application
  • Do regular health checks to your instances
  • Enforce stickiness with cookies
  • High availability across zones
  • Separate public traffic from private traffic

Types

  • Classic Load Balancer (CLB) - deprecated

    • Load Balancing to a single application
    • Supports HTTP, HTTPS (layer 7) & TCP (layer 4), SSL
    • Health checks are HTTP or TCP based
    • Provides a fixed hostname (xxx.region.elb.amazonaws.com)
  • Application Load Balancer (ALB)

    • Load balancing to multiple applications (target groups) based on the request parameters
    • Operates at Layer 7 (HTTP, HTTPS and WebSocket)
    • Provides a fixed hostname (xxx.region.elb.amazonaws.com)
    • Security Groups can be attached to ALBs to filter requests
    • Great for micro services & container-based applications (Docker & ECS)
    • Client info is passed in the request headers
      • Client IP => X-Forwarded-For
      • Client Port => X-Forwarded-Port
      • Protocol => X-Forwarded-Proto
    • Target Groups
      • Health checks are done at the target group level
      • Target Groups could be
        • EC2 instances - HTTP
        • ECS tasks - HTTP
        • Lambda functions - HTTP request is translated into a JSON event
        • Private IP Addresses
    • Listener Rules can be configured to route traffic to different target groups based on
      • Path (example.com/users & example.com/posts)
      • Hostname (one.example.com & other.example.com)
      • Query String (example.com/users?id=123&order=false)
      • Request Headers
      • Source IP address
  • Network Load Balancer (NLB)

    • Operates at Layer 4 (TCP, UDP, TLS over TCP)
    • Can handle millions of requests per second (extreme performance)
    • Lower latency ~ 100 ms (vs ~400 ms for ALB)
    • 1 static public IP per AZ
    • Health Checks support the TCP, HTTP and HTTPS Protocols
    • No security groups can be attached to NLBs. Since they operate on layer 4, they cannot see the data available at layer 7. They just forward the incoming traffic to the right target group as if those requests were directly coming from client. So, the attached instances must allow TCP traffic on port 80 from anywhere.
    • Within a target group, NLB can send traffic to
      • EC2 instances
      • IP addresses (must be private IPs)
      • Application Load Balancer (ALB)
  • Gateway Load Balancer (GWLB)

    • Operates at layer 3 (Network layer) - IP packets
    • Used when you want to inspect and analyze traffic at the network level before it reaches your ELB, EC2 instances, etc.
    • Used to route requests to a fleet of 3rd party virtual appliances like Firewalls, Intrusion Detection and Prevention Systems (IDPS), etc.
    • After inspection, the 3rd-party appliance routes the traffic back to your instances or ELB
    • Targets:
      • EC2 instances
      • IP addresses (must be private)
  • Sticky Sessions (Session Affinity)

    • Requests coming from a client are always redirected to the same instance based on a cookie. After the cookie expires, requests coming from the same user might be redirected to another instance

    • Only supported by CLB & ALB because the cookie can be seen at layer 7

    • Used to ensure the user doesn’t lose their session data, like login or cart info, while navigating between web pages.

    • Stickiness may cause load imbalance

    • Cookies could be:

      • Application-based (TTL defined by the application)
      • Load Balancer generated (TTL defined by the load balancer)
    • ELB reserved cookie names (should not be used):

      • AWSALB
      • AWSALBAPP
      • AWSALBTG
  • Cross-zone Load Balancing

  • Allows ELB nodes in different AZs containing an unbalanced number of instances to distribute the traffic evenly across all instances in all the AZs registered under a load balancer.

  • Supported Load Balancers

    • Classic Load Balancer : Disabled by default
    • Application Load Balancer : Always on (can be disabled at the target group level)
    • Network Load Balancer : Disabled by default
  • Security

  • The load balancer uses an X.509 certificate (SSL/TLS server certificate)

  • You can manage certificates using ACM (AWS Certificate Manager)

  • Alternatively, you can create and upload your own certificates

  • Server Name Indication (SNI)

    • SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
    • It’s a “newer” protocol, and requires the client to indicate the hostname of the target server in the initial SSL handshake
    • The server will then find the correct certificate, or return the default one
    • Does not work with CLB; works with ALB and NLB
  • Connection Draining

  • Connection Draining – for CLB

  • Deregistration Delay – for ALB & NLB

  • Time to complete in-flight requests while the instance is de-registering or unhealthy

  • Stops sending new requests to the EC2 instance which is de-registering

  • Between 1 to 3600 seconds (default: 300 seconds)

Auto Scaling Group

The goal of an Auto Scaling Group (ASG) is to:

  • Scale out (add EC2 instances) to match an increased load
  • Scale in (remove EC2 instances) to match a decreased load
  • Ensure we have a minimum and a maximum number of EC2 instances running
  • Automatically register new instances to a load balancer
  • Re-create an EC2 instance in case a previous one is terminated (ex: if unhealthy)
  • ASG can terminate instances marked as unhealthy by an ELB

Scaling Policies

  • Scheduled Scaling

    • Scale based on a schedule
    • Used when the load pattern is predictable
    • Anticipate a scaling based on known usage patterns
  • Simple Scaling/Step Scaling

    • Scale to certain size on a CloudWatch alarm (ex average CPU utilization in all ASG instances)
    • When a CloudWatch alarm is triggered (example CPU > 70%), then add 2 units
    • When a CloudWatch alarm is triggered (example CPU < 30%), then remove 1
  • Target Tracking Scaling

    • ASG maintains a CloudWatch metric and scale accordingly to maintain the target defined
    • Ex. maintain CPU usage at 40% (see the sketch after this list)
  • Predictive Scaling

    • Historical data is used to predict the load pattern using ML and scale automatically
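
A sketch of creating the Target Tracking policy mentioned above with boto3 (the ASG and policy names are hypothetical):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep the ASG's average CPU utilization near 40%
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",          # hypothetical ASG name
    PolicyName="keep-cpu-at-40",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 40.0,
    },
)
```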

Launch Configuration & Launch Template

  • Defines the following info for ASG
    • AMI + Instance Type
    • EC2 User Data
    • EBS Volumes
    • Security Groups
    • SSH Key Pair
    • Min / Max / Desired Capacity
    • Subnets (where the instances will be created)
    • Load Balancer (specify which ELB to attach instances)
    • Scaling Policy
  • Launch Configuration (legacy)
    • Cannot be updated (must be re-created)
    • Does not support Spot Instances
  • Launch Template (newer)
    • Versioned
    • Can be updated
    • Supports both On-Demand and Spot Instances
    • Recommended by AWS

Cooldown

  • After a scaling activity happens, the ASG goes into cooldown period (default 300 seconds) during which it does not launch or terminate additional instances (ignores scaling requests) to allow the metrics to stabilize.
  • Use a ready-to-use AMI to launch instances faster to be able to reduce the cooldown period

Warm-Up

  • The warm-up value for instances allows you to control the time until a newly launched instance can contribute to the CloudWatch metrics; once the warm-up time has expired, the instance is considered part of the Auto Scaling group and will receive traffic

Relational Database scaling (RDS)

There are 4 types of scaling we can use to adjust our relational database performance:

  • Aurora Serverless
    • We can offload scaling to AWS. Excels with unpredictable workloads.
  • Read Replicas
    • Creating read-only copies of our data can help spread out the workload
  • Scaling Storage
    • Storage can be resized (disk size), but it’s only able to go up, not down (except for Aurora).
  • Vertical Scaling
    • Resizing the database instance from one size (e.g. t2.micro) to another (e.g. t3.large) can deliver greater performance.

Non-Relational Database scaling

  • DynamoDB: scaling is handled by DynamoDB itself (provisioned-capacity auto scaling or On-demand mode)

Storage


S3

  • Remember that S3 is Object-based: i.e. it allows you to upload files.
  • Files can be 0 Bytes to 5 TB.
  • There is unlimited storage
  • Files are stored in Buckets. A bucket is tied to a region, while the S3 namespace is global
  • Not suitable to install operating systems on S3 due to it being object based
  • You can turn on MFA delete to avoid accidental delete.
  • Tiered Storage S3 offers a range of storage classes designed for different use cases.
  • Lifecycle Management Define rules to automatically transition objects to a cheaper storage tier or delete objects that are no longer required after a set period of time.

Versioning

  • You can version your files in Amazon S3
  • Protect against unintended deletes (ability to restore a version)
  • Easy roll back to previous version
  • With versioning, all versions of an object are stored and can be retrieved, including deleted objects.
  • When you DELETE an object, all versions remain in the bucket and Amazon S3 inserts a delete marker.
  • You can permanently delete an object by specifying the version you want to delete. Only the owner of an Amazon S3 bucket can permanently delete a version.
  • Versioning can only be suspended once it has been enabled.
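
Enabling versioning is a one-call bucket configuration. A minimal boto3 sketch (the bucket name is hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning; setting "Suspended" instead would suspend it later
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```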

Amazon S3 – Replication

  • Must enable Versioning in source and destination buckets
  • Cross-Region Replication (CRR)
  • Same-Region Replication (SRR)
  • Buckets can be in different AWS accounts
  • After you enable Replication, only new objects are replicated
  • You can replicate existing objects using S3 Batch Replication
  • For DELETE operations:
    • Replicate delete markers from source to target (optional)
    • Permanent deletes are not replicated

Security

  • User based security

    • IAM policies define which API calls should be allowed for a specific user
    • Preferred over bucket policy for fine-grained access control
  • Resource Based Policies

    • Bucket Policies
      • Grant public access to the bucket
      • Can either add or deny permissions across all (or a subset) of objects within a bucket.
      • Force objects to be encrypted at upload
      • You use a bucket policy to control access to objects in the bucket that are owned by the account used to create the bucket.
      • Cross-account access
    • Access Control Lists
      • A list of grants identifying grantee and permission granted
      • ACLs use an S3–specific XML schema.
      • You can grant permissions only to other AWS accounts, not to users in your account
      • You need to use an ACL to control access to objects in your bucket but owned by other account
      • You cannot grant conditional permissions, nor explicitly deny permissions.
      • Object ACLs are limited to 100 granted permissions per ACL
      • The only recommended use case for the bucket ACL is to grant write permissions to the S3 Log Delivery group
  • Note: An IAM principal can access an S3 object if the IAM permission allows it or the bucket policy allows it and there is no explicit deny.

  • Important

    • When you configure a bucket as a static website, if you want your website to be public, you can grant public read access. To make your bucket publicly readable, you must disable block public access settings for the bucket and write a bucket policy that grants public read access. If your bucket contains objects that are not owned by the bucket owner, you might also need to add an object access control list (ACL) that grants everyone read access.
    • If you don't want to disable block public access settings for your bucket but you still want your website to be public, you can create an Amazon CloudFront distribution to serve your static website
    • You can use a bucket policy to grant public read permission to your objects. However, the bucket policy applies only to objects that are owned by the bucket owner. If your bucket contains objects that aren't owned by the bucket owner, the bucket owner should use the object access control list (ACL) to grant public READ permission on those objects.

S3 Encryption

You can encrypt objects in S3 buckets using one of 4 methods

  • Encryption in Transit
    • SSL/TLS
    • HTTPS
    • HTTPS is mandatory for SSE-C
  • Server-Side Encryption (SSE)
    • Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) – Enabled by Default

      • Encrypts S3 objects using keys handled, managed, and owned by AWS
      • Object is encrypted server-side using AES-256
      • Must set header: "x-amz-server-side-encryption": "AES256"
    • Server-Side Encryption with KMS Keys stored in AWS KMS (SSE-KMS)

      • Leverage AWS Key Management Service (AWS KMS) to manage encryption keys
      • If you use SSE-KMS, you may be impacted by the KMS limits quotas
      • Must set header: "x-amz-server-side-encryption": "aws:kms"
    • Server-Side Encryption with Customer-Provided Keys (SSE-C)

      • When you want to manage your own encryption keys
      • Amazon S3 does NOT store the encryption key you provide
      • HTTPS must be used
      • Encryption key must be provided in HTTP headers with every HTTP request made
  • Client-Side Encryption

    • Use client libraries such as the Amazon S3 Client-Side Encryption Library
    • Clients must encrypt data themselves before sending to Amazon S3
    • Clients must decrypt data themselves when retrieving from Amazon S3

Enforcing Encryption with a Bucket Policy

  • A bucket policy can deny all PUT requests that don’t include the x-amz-server-side-encryption parameter in the request header
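
A sketch of such a policy applied with boto3, using the documented `Null` condition on the encryption header (the bucket name is hypothetical):

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny any PutObject request that omits the SSE header
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
    }],
}
s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```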

CORS

  • If a client makes a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
  • You can allow for a specific origin or for *
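
A minimal CORS configuration sketch with boto3 (the bucket name and origin are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Allow cross-origin GETs from one specific origin (or use ["*"])
s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)
```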

MFA Delete

  • MFA will be required to:
    • Permanently delete an object version
    • Suspend Versioning on the bucket
  • Bucket Versioning must be enabled
  • Can only be enabled or disabled by the root user

S3 Access Logs

  • For audit purpose, you may want to log all access to S3 buckets
  • Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
  • That data can be analyzed using data analysis tools
  • The target logging bucket must be in the same AWS region

Pre-signed URL

  • Pre-signed URLs for S3 have temporary access token as query string parameters which allow anyone with the URL to temporarily access the resource before the URL expires (default 1h)
  • Pre-signed URLs inherit the permission of the user who generated it
  • Uses:
    • Allow only logged-in users to download a premium video
    • Allow users to upload files to a precise location in the bucket
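
Generating a pre-signed URL is a client-side signing operation. A boto3 sketch (the bucket and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# URL valid for 1 hour; signed with the caller's credentials,
# so it inherits that principal's permissions
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,
)
print(url)  # anyone holding this URL can GET the object until it expires
```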

S3 Object Lock and Glacier Vault Lock

  • Use S3 Object Lock to store objects using a write once, read many (WORM) model.
  • Object Lock comes in two modes: governance mode and compliance mode.
    • Governance mode: users can’t overwrite or delete an object version or alter
      its lock settings unless they have special permissions
    • Compliance mode: a protected object version can’t be overwritten or deleted
      by any user, including the root user in your AWS account

S3 Storage Classes

S3 Standard – General Purpose

  • Used for frequently accessed data
  • Low latency and high throughput
  • The default storage class
  • Minimum storage duration: N/A, but transitioning from S3 Standard or S3 Standard-IA to S3 Standard-IA or S3 One Zone-IA requires the object to be at least 30 days old
  • This limitation does not apply to the INTELLIGENT_TIERING, GLACIER, and DEEP_ARCHIVE storage classes.
  • Use cases include websites, content distribution, mobile and gaming applications, and big data analytics

S3 Standard-Infrequent Access (S3 Standard-IA)

  • Infrequently accessed data(once a month)
  • Rapid Access: Used for data that is accessed less frequently but requires rapid access when needed.
  • Minimum storage duration of 30 days
  • There is a low per-GB storage price and a per-GB retrieval fee
  • Use cases: Disaster Recovery, backups

S3 Intelligent-Tiering

  • Data with changing or unknown access patterns
  • Automatically moves your data to the most cost-effective tier based on how frequently you access each object.
  • Minimum storage duration of 30 days

S3 Glacier

  • Low-cost object storage meant for archiving / backup

  • Pricing: price for storage + object retrieval cost

  • Amazon S3 Glacier Instant Retrieval

    • Millisecond retrieval, great for data accessed once a quarter
    • Minimum storage duration of 90 days
  • Amazon S3 Glacier Flexible Retrieval

    • Data accessed once a year
    • 3 retrieval flexibility:
      • Expedited (1 to 5 minutes)
      • Standard (3 to 5 hours)
      • Bulk (5 to 12 hours)
    • Minimum storage duration of 90 days
  • Amazon S3 Glacier Deep Archive – for long term storage

    • Data accessed once a year
    • 2 flexible retrieval:
      • Standard (12 hours)
      • Bulk (48 hours)
    • Minimum storage duration of 180 days

S3 Lifecycle Management

  • Lifecycle management automates moving your objects between the different storage tiers, thereby maximizing cost effectiveness.
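
A sketch of a lifecycle rule with boto3 (the bucket name, prefix, and day counts are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under logs/ to Standard-IA after 30 days,
# to Glacier after 90 days, and delete them after 365 days
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)
```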

S3 Notification Events

  • Optional
  • Generates events for operations performed on the bucket or objects
  • Targets:
    • SNS topics
    • SQS Standard queues (not FIFO queues)
    • Lambda functions

S3 performance

General performance: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second, per prefix

  • You can get better performance by spreading your reads across different prefixes, you can achieve 11,000 requests per second with 2 prefixes.
  • If we used all 4 prefixes in the last example, you would achieve 22,000 requests per second
  • There are no limits to the number of prefixes in a bucket.

Multipart Uploads (upload performance)

  • Recommended for files over 100 MB
  • Required for files over 5 GB
  • Parallelize uploads (increases efficiency)

S3 Transfer Acceleration

  • Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
  • Compatible with multi-part upload
  • Data is ingested at the nearest edge location and is transferred over AWS private network (uses CloudFront internally)

S3 Byte-Range Fetches (download performance)

  • Parallelize downloads by specifying byte ranges.
  • Better resilience in case of failures, since we only need to refetch the failed byte range and not the whole file
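
A byte-range fetch is just a ranged GET. A boto3 sketch fetching the first 1 MB (the bucket and key are hypothetical); parallelizing means issuing several such requests for different ranges:

```python
import boto3

s3 = boto3.client("s3")

# Standard HTTP Range syntax: first 1 MB of the object
resp = s3.get_object(
    Bucket="my-bucket",
    Key="big-file.bin",
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()  # only this range is transferred
```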

S3 Select & Glacier Select

  • Retrieve less data using SQL by performing server-side filtering
  • Can filter by rows & columns (simple SQL statements)
  • Less network transfer cost
  • Less CPU cost client-side

EBS

  • An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run
  • Can only be mounted to 1 instance at a time (except EBS multi-attach)
  • It allows your instances to persist data, even after their termination
  • They are bound to a specific availability zone
  • An EBS Volume in us-east-1a cannot be attached to us-east-1b
  • To move a volume across, you first need to snapshot
  • EBS Multi-attach allows the same EBS volume to attach to multiple EC2 instances in the same AZ

EBS Snapshots

  • Make a backup (snapshot) of your EBS volume at a point in time
  • Not necessary to detach volume to do snapshot, but recommended
  • Can copy snapshots across AZ or Region

EBS Snapshot Archive

  • Move a Snapshot to an ”archive tier” that is 75% cheaper
  • Takes within 24 to 72 hours for restoring the archive

Recycle Bin for EBS Snapshots

  • Setup rules to retain deleted snapshots so you can recover them after an accidental deletion
  • Specify retention (from 1 day to 1 year)

Fast Snapshot Restore (FSR)

  • Force full initialization of snapshot to have no latency on the first use ($$$)

Volume Types

Only gp2/gp3 and io1/io2 can be used as boot volumes

EBS SSD

  • gp2/gp3 (SSD): general purpose SSD volume that balances price and performance for a wide variety of workloads
  • io1/io2 (SSD): Provisioned IOPS – highest-performance SSD volume for mission-critical low-latency or high-throughput workloads;
    supports Multi-Attach (attach the same EBS volume to multiple EC2 instances in the same AZ)

EBS HDD

  • st1 (HDD): low cost HDD volume designed for frequently accessed, throughput-intensive workloads
  • sc1 (HDD) Lowest cost HDD volume designed for less frequently accessed workloads

Tips

  • Amazon EBS provides three volume types to best meet the needs of your workloads: General Purpose (SSD), Provisioned IOPS (SSD), and Magnetic.
  • General Purpose (SSD) is the new, SSD-backed, general purpose EBS volume type that is recommended as the default choice for customers. General Purpose (SSD) volumes are suitable for a broad range of workloads, including small to medium-sized databases, development and test environments, and boot volumes.
  • Provisioned IOPS (SSD) volumes offer storage with consistent and low-latency performance and are designed for I/O intensive applications such as large relational or NoSQL databases. Magnetic volumes provide the lowest cost per gigabyte of all EBS volume types.
  • Magnetic volumes are ideal for workloads where data are accessed infrequently, and applications where the lowest storage cost is important. Take note that this is a Previous Generation Volume. The latest low-cost magnetic storage types are Cold HDD (sc1) and Throughput Optimized HDD (st1) volumes.

EBS Encryption

  • For Encrypted EBS volumes:
    • Data at rest is encrypted
    • EBS Encryption leverages keys from KMS (AES-256)
    • Data in-flight between the instance and the volume is encrypted
    • All snapshots are encrypted
    • All volumes created from the snapshot are encrypted
  • Encrypt an un-encrypted EBS volume
    • Create an EBS snapshot of the volume
    • Copy the EBS snapshot and encrypt the new copy
    • Create a new EBS volume from the encrypted snapshot (the volume will be automatically encrypted)
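
A sketch of those three steps with boto3 (the volume ID, region, and AZ are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot the un-encrypted volume
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot, encrypting the copy (default KMS key here)
copy = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Encrypted=True,
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

# 3. Volumes created from an encrypted snapshot are automatically encrypted
ec2.create_volume(
    SnapshotId=copy["SnapshotId"],
    AvailabilityZone="us-east-1a",
)
```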

EFS

  • Managed NFS (network file system) that can be mounted on many EC2
  • EFS works with EC2 instances in multi-AZ
  • Highly available, scalable, expensive
  • File system scales automatically, pay-per-use, no capacity planning
  • Compatible with Linux-based AMI (Windows not supported at this time)
  • Encryption at rest using KMS

Performance Mode

  • File system performance is typically measured by using the dimensions of latency, throughput, and Input/Output operations per second (IOPS)
  • When creating an EFS file system, you can set what performance characteristics you want
    • General Purpose (default):
      • Has the lowest per-operation latency
      • Use cases (web server, CMS, etc.)
    • Max I/O :
      • Max I/O mode is designed for highly parallelized workloads that can tolerate higher latencies than the General Purpose mode
      • higher latency & throughput (big data, media processing)

Throughput Mode

  • Bursting (default)
    • Throughput: 50MB/s per TB
    • Burst of up to 100MB/s.
    • Bursting Throughput mode is recommended for workloads that require throughput that scales with the amount of storage in your file system.
  • Provisioned
    • Fixed throughput (provisioned)
    • In Provisioned Throughput mode, you specify a level of throughput that the file system can drive independent of the file system's size

Storage Tiers

  • EFS comes with storage tiers and lifecycle management, allowing you to move your data from one tier to another after X number of days.
    • Standard For frequently accessed files
    • Infrequently Accessed For files not frequently accessed

Encryption

  • EFS supports two forms of encryption for file systems
  • Encryption of data in transit
    • You can enable encryption of data in transit when you mount the file system using the Amazon EFS mount helper
    • Data is encrypted in transit without needing to modify your applications.
  • Encryption at rest
    • You can enable encryption of data at rest when creating an Amazon EFS file
    • You can create encrypted file systems using:
      • AWS Management Console
      • AWS CLI
      • SDK

Instance Store

  • EBS volumes are network drives with good but “limited” performance
  • If you need a high-performance hardware disk, use EC2 Instance Store
  • Better I/O performance
  • You can specify the instance store volumes for your instance only when you launch it. You can't attach instance store volumes to an instance after you've launched it.
  • EC2 Instance Store volumes lose their storage if the instance is stopped (ephemeral)
  • Instance store persists during reboots, not during the stop and start of the instance.
  • Good for buffer / cache / scratch data / temporary content

FSx

  • Allows us to launch 3rd party high-performance file systems on AWS
  • Useful when we don’t want to use an AWS managed file system like S3
  • Can be accessed from your on-premises infrastructure

FSx for Windows

  • A managed Windows Server that runs Server Message Block (SMB)-based file services.
  • Designed for Windows and Windows applications.
  • Supports Multi-AZ (high availability)
  • Supports AD users, access control lists, groups, and security policies, along with Distributed File System (DFS) namespaces and replication.

Amazon FSx for Lustre

  • A fully managed file system that is optimized for compute-intensive workloads
  • High Performance Computing(HPC)
  • Scales up to 100s GB/s, millions of IOPS, sub-ms latencies
  • Only works with Linux
  • Machine Learning
  • Media Data Processing Workflows

FSx Deployment Options

  • Scratch File System

    • Temporary storage (cheaper)
    • Data is not replicated (data lost if the file server fails)
    • High burst (6x faster than persistent file system)
    • Usage: short-term processing
  • Persistent File System

    • Long-term storage (expensive)
    • Data is replicated within same AZ
    • Failed files are replaced within minutes
    • Usage: long-term processing, sensitive data

How EFS, FSx for Windows, and FSx for Lustre differ

  • EFS: When you need distributed, highly resilient storage for Linux instances and Linux-based applications.
  • Amazon FSx for Windows: When you need centralized storage for Windows-based applications, such as SharePoint, Microsoft SQL Server, Workspaces, IIS Web Server, or any other native Microsoft application.
  • Amazon FSx for Lustre: When you need high-speed, high-capacity distributed storage.
    This will be for applications that do high performance computing (HPC), financial modeling, etc.
    Remember that FSx for Lustre can store data directly on S3

Storage Gateway

  • Bridge between on-premises data and cloud data
  • Not suitable for one-time sync of large amounts of data (use DataSync instead)
  • Optimizes data transfer by sending only changed data
  • Use cases:
    • disaster recovery
    • backup & restore
    • on-premises cache & low-latency files access

Types of Storage Gateway

S3 File Gateway

  • Configured S3 buckets are accessible using the NFS and SMB protocol
  • Most recently used data is cached in the file gateway
  • Supports S3 Standard, S3 Standard-IA, S3 One Zone-IA, S3 Intelligent-Tiering
  • Transition to S3 Glacier using a Lifecycle Policy
  • Bucket access using IAM roles for each File Gateway
  • SMB Protocol has integration with Active Directory (AD) for user authentication


FSx File Gateway

  • Native access to Amazon FSx for Windows File Server
  • Local cache for frequently accessed data
  • Windows native compatibility (SMB, NTFS, Active Directory...)
  • Useful for group file shares and home directories


Volume Gateway

  • Block storage using the iSCSI (Internet Small Computer System Interface) protocol, backed by S3

  • Backed by EBS snapshots which can help restore on-premises volumes

  • Two kinds of volumes:

    • Cached volumes:
      • You store your data in S3 and retain a copy of frequently accessed data subsets locally
      • Low latency access to most recent data
    • Stored volumes:
      • You store the entire set of volume data on premises and store periodic point-in-time backups (snapshots) in S3
      • Low-latency access to your entire dataset


Tape Gateway

  • Used to backup on-premises data using tape-based process to S3 as Virtual Tapes
  • Uses iSCSI protocol


Storage Gateway - Hardware Appliance

  • Storage Gateway requires on-premises virtualization. If you don’t have virtualization available, you can use a Storage Gateway - Hardware Appliance. It is a mini server that you need to install on-premises.
  • Does not work with FSx File Gateway

AWS Backup

Backup allows you to consolidate your backups across multiple AWS services, such as :

  • EC2
  • EBS
  • EFS
  • S3
  • Amazon FSx for Lustre
  • Amazon FSx for Windows File Server
  • AWS Storage Gateway
  • RDS
  • DynamoDB

It gives you centralized control across all AWS services, in multiple AWS accounts across the entire AWS organization.

AWS Backup benefits

  • Central Management
  • Automation: create automated backup schedules and retention policies, and create lifecycle policies
    allowing you to expire unnecessary backups after a period of time
  • Improved Compliance: Backup policies can be enforced while backups can be encrypted both at rest and in transit
    allowing alignment to regulatory compliance

Backup Vault

  • WORM (Write Once Read Many) model for backups
  • Even the root user cannot delete backups
  • Additional layer of defense to protect your backups against:
    • Inadvertent or malicious delete operations
    • Updates that shorten or alter retention periods

Database

RDS

  • RDS stands for Relational Database Service
  • It’s a managed DB service for databases that use SQL as a query language
  • RDS is generally used for online transaction processing (OLTP) workloads
  • Databases supported :
    • Postgres
    • MySql
    • MariaDB
    • Oracle
    • Microsoft SQL Server
    • Aurora
  • Continuous backups and restore to specific timestamp (Point in Time Restore)!
  • You can’t SSH into your instances

RDS Auto Scaling

  • When RDS detects you are running out of free database storage, it scales automatically.
  • You have to set Maximum Storage Threshold (maximum limit for DB storage)
  • Condition for automatic storage scaling:
    • Free storage is less than 10% of allocated storage
    • Low-storage lasts at least 5 minutes
    • 6 hours have passed since last modification

RDS Read Replicas for read scalability AKA Performance

  • A read-only copy of your primary database in the same AZ, cross-AZ, or cross-region
  • Used to increase or scale read performance.
  • Up to 5 Read Replicas
  • Replication is ASYNC so reads are eventually consistent
  • Replicas can be promoted to their own DB
  • Applications must update the connection string to leverage read replicas

RDS Multi AZ for Disaster Recovery

  • With Multi-AZ, RDS creates an exact copy of your production database in another Availability Zone
  • Synchronous replication
  • One DNS name, so the connection string does not need to be updated (both databases are accessed through one DNS name,
    which allows for automatic DNS failover to the standby database)
  • When failing over, RDS flips the CNAME record for the DB instance (mapping the DB DNS name to the standby’s hostname) to point at the standby, which is in turn promoted to become the new primary.
  • Cannot be used for scaling as the standby database cannot take read/write operation
  • The Read Replicas can be setup as Multi AZ for Disaster Recovery(DR)


RDS From Single-AZ to Multi-AZ

  • Zero downtime operation (no need to stop the DB)
  • Just click on “modify” for the database
  • The following happens internally
    • A snapshot is taken
    • A new DB is restored from the snapshot in a new AZ
    • Synchronization is established between the two databases


RDS Backup

  • Automated Backups (enabled by default)
    • Daily full backup of the database (during the defined maintenance window)
    • Backup retention: 7 days (max 35 days)
    • Transaction logs are backed-up by RDS every 5 minutes (point in time recovery)
  • DB Snapshots
    • Manually triggered
    • Backup retention: unlimited
    • in a stopped RDS database, you will still pay for storage. If you plan on stopping it for a long time, you should snapshot & restore instead

RDS Proxy

  • Fully managed database proxy for RDS
  • Allows apps to pool and share DB connections established with the database
  • improving database efficiency by reducing the stress on database resources (e.g., CPU, RAM) and minimize open connections (and timeouts)
  • Serverless, autoscaling, highly available (multi-AZ)
  • Reduces RDS & Aurora failover time by up to 66%
  • Enforce IAM Authentication for DB, and securely store credentials in AWS Secrets Manager
  • RDS Proxy is never publicly accessible (must be accessed from VPC)

RDS Custom

  • Managed Oracle and Microsoft SQL Server Database with OS and database customization
  • RDS: Automates setup, operation, and scaling of database in AWS
  • Custom: access to the underlying database and OS so you can
    • Configure settings
    • Install patches
    • Enable native features
    • Access the underlying EC2 Instance using SSH or SSM Session Manager
  • De-activate Automation Mode to perform your customization
  • RDS vs. RDS Custom
    • RDS: entire database and the OS to be managed by AWS
    • RDS Custom: full admin access to the underlying OS and the database

Amazon Aurora

  • Aurora is a proprietary technology from AWS (not open sourced)
  • Postgres and MySQL are both supported as Aurora DB
  • Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10GB, up to 128 TB.
  • Up to 15 read replicas
  • Supports only MySQL & PostgreSQL
  • Failover in Aurora is instantaneous.
  • Backtrack: restore data at any point of time without using backups

Aurora High Availability

  • 2 copies of your data are contained in each Availability Zone, with a minimum of 3 Availability Zones → 6 copies of your data
  • Aurora is designed to transparently handle the loss of up to 2 copies of data without affecting write availability,
    and up to 3 copies without affecting read availability
  • Aurora storage is also self-healing.
  • Data blocks and disks are continuously scanned for errors and repaired automatically
  • Support for Cross Region Replication
  • Automated failover A read replica is promoted as the new master in less than 30 seconds
  • In case no replica is available, Aurora will attempt to create a new DB Instance in the same AZ as the original instance

Aurora Read Scaling

  • One Aurora Instance takes writes (master)
  • Master + up to 15 Aurora Read Replicas serve reads
  • Aurora DB Cluster :
    • Writer Endpoint :
      • Always points to the master (can be used for read/write)
      • Each Aurora DB cluster has one writer cluster endpoint
    • Reader Endpoint
      • Provides load-balancing for read replicas only (used to read only)
      • If the cluster has no read replica, it points to master (can be used to read/write)
      • Each Aurora DB cluster has one reader endpoint
      • When a client reads through the Reader Endpoint, the connection is load-balanced to one of the Read Replicas
    • Custom Endpoint
      • Used to point to a subset of replicas
      • Provides load-balanced based on criteria other than the read-only or read-write capability of the DB instances like instance class (ex, direct internal users to low-capacity instances and direct production traffic to high-capacity instances)
      • Once custom endpoints are set up, you generally no longer use the Reader Endpoint

Aurora Serverless

  • Optional
  • Automated database instantiation and auto scaling based on actual usage
  • Good for infrequent, intermittent, or unpredictable workloads
  • No capacity planning needed
  • Pay per second, can be more cost effective

Aurora Multi-Master

  • Optional
  • In case you want immediate failover for write node (High availability)
  • Every node does R/W - vs promoting a Read Replica as the new master

Aurora Global Database

  • Aurora Cross Region Read Replicas:
    • Useful for disaster recovery
  • Designed for globally distributed applications with low latency local reads in each region
  • 1 Primary Region (read / write)
  • Up to 5 secondary (read-only) regions (replication lag < 1 second)
  • Up to 16 Read Replicas per secondary region
  • Helps for decreasing latency for clients in other geographical locations
  • RTO of less than 1 minute (to promote another region as primary)

Aurora Backup

  • Automated backups
    • 1 to 35 days (cannot be disabled)
    • point-in-time recovery in that timeframe
  • Manual DB Snapshots
    • Manually triggered by the user
    • Retention of backup for as long as you want
    • Aurora cloning is faster than restoring from a snapshot when creating a new DB

ElastiCache

  • The same way RDS is to get managed Relational Databases…
  • ElastiCache is to get managed Redis or Memcached
  • Caches are in-memory databases with really high performance, low latency
  • Helps make your application stateless, because it doesn’t have to cache locally

DynamoDB

  • Fully managed, highly available with replication across multiple AZs
  • NoSQL database - not a relational database - with transaction support
  • Scales to massive workloads, distributed database
  • Single digit millisecond response time at any scale
  • Maximum size of an item is 400KB
  • Supports TTL (automatically delete an item after an expiry timestamp)
  • Supports Transactions (either write to multiple tables or write to none)- DynamoDB transactions.
  • DynamoDB transactions provide developers atomicity, consistency, isolation, and durability
    (ACID) across 1 or more tables within a single AWS account and region.
  • All-or-nothing transactions.

Capacity

  • Provisioned Mode (default)
    • You specify the number of reads/writes per second
    • You need to plan capacity beforehand
    • Pay for provisioned Read Capacity Units (RCU) & Write Capacity Units (WCU)
    • Auto-scaling option (eg. set RCU and WCU to 80% and the capacities will be scaled automatically based on the workload)
  • On-demand Mode
    • Read/writes automatically scale up/down based on workloads
    • No capacity planning needed
    • Pay for what you use, more expensive ($$$)
    • Great for unpredictable workloads, steep sudden spikes
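
A sketch of declaring capacity at table creation with boto3 (the table name, key, and RCU/WCU values are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned mode: RCU/WCU declared up front.
# BillingMode="PAY_PER_REQUEST" (with no ProvisionedThroughput)
# would select On-demand mode instead.
dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```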

DynamoDB Accelerator (DAX)

  • Fully managed, highly available, in-memory cache
  • 10x performance improvement
  • Reduces request time from milliseconds to microseconds even under load
  • Help solve read congestion by caching
  • 5 minutes TTL for cache (default)
  • Doesn’t require application code changes

DynamoDB Streams

  • Ordered stream of notifications of item-level modifications (create/update/delete) in a table
  • Destination can be:
    • Kinesis Data Streams
    • AWS Lambda
    • Kinesis Client Library applications
  • Data Retention for up to 24 hours
  • Allows implementing cross-region replication
  • React to changes in real-time (welcome email to users)

DynamoDB Global Tables

  • Globally distributed applications
  • Based on DynamoDB streams
  • Multi-region redundancy for disaster recovery or high availability
  • Replication latency under 1 second
  • Must enable DynamoDB Streams as a pre-requisite

DocumentDB

  • Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
  • DocumentDB is the same for MongoDB (which is a NoSQL database)
  • Fully Managed, highly available with replication across 3 AZ
  • DocumentDB storage automatically grows in increments of 10GB, up to 64 TB.
  • Automatically scales to workloads with millions of requests per second

Amazon Neptune

  • Fully managed graph database
  • A popular graph dataset would be a social network
  • Highly available with replication across 3 AZs, with up to 15 read replicas

Amazon QLDB

  • QLDB stands for ”Quantum Ledger Database”
  • A ledger is a book recording financial transactions
  • Fully Managed, Serverless, High available, Replication across 3 AZ
  • Used to review history of all the changes made to your application data over time
  • Immutable system: no entry can be removed or modified, cryptographically verifiable
  • You cannot update a record (i.e., replace old content) in a ledger database. Instead, an update adds a new record to the database
  • Use case : financial transactions, supply chain, cryptocurrencies, such as Bitcoin, blockchain

Amazon Timestream

  • Fully managed, fast, scalable, serverless time series database
  • Automatically scales up/down to adjust capacity
  • Encryption in transit and at rest
  • Use cases: IoT apps, operational applications, real time analytics, …

Decoupling applications

  • Synchronous communication between applications can be problematic if there are sudden spikes of traffic
  • What if you need to suddenly encode 1000 videos but usually it’s 10?
  • In that case, it’s better to decouple your applications:
    • using SQS: queue model
    • using SNS: pub/sub model
    • using Kinesis: real-time streaming model
  • These services can scale independently from our application

SQS

SQS stands for Simple Queue Service

  • Used to asynchronously decouple applications
  • Supports multiple producers & consumers
  • The message is persisted in SQS until a consumer deletes it
  • The consumer polls the queue for messages. Once a consumer processes a message, it deletes it from the queue using DeleteMessage API.
  • Max message size: 256KB
  • Default message retention: 4 days (max: 14 days)
  • Consumers can be EC2 instances, Lambda functions, Kinesis, etc. (a minimal polling loop is sketched below)
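
A minimal consumer sketch using boto3 (the queue URL is hypothetical): receive messages, process them, then delete them so they are not redelivered:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # hypothetical

while True:
    # Long polling (WaitTimeSeconds) reduces empty responses and API cost
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])  # replace with real processing logic
        # Delete after successful processing, otherwise the message reappears
        # once the visibility timeout expires
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```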

Queue Types

Standard Queue

  • Unlimited throughput (publish any number of messages per second into the queue)
  • Low latency (<10 ms on publish and receive)
  • Can have duplicate messages (at least once delivery)
  • Can have out of order messages (best effort ordering)

FIFO Queue

  • Limited throughput: 300 msg/s without batching, 3000 msg/s with
  • Messages are processed in order by the consumer
  • Message De-duplication:
    • De-duplication interval: 5 min (duplicate messages will be discarded only if they are sent less than 5 mins apart)
    • De-duplication methods:
      • Content-based de-duplication: computes the hash of the message body and compares
      • Using a message de-duplication ID: messages with the same de-duplication ID are considered duplicates
  • Message Grouping
    • Group messages based on MessageGroupID to send them to different consumers
    • Same value for MessageGroupID
      • All the messages are in order
      • Single consumer
    • Different values for MessageGroupID
      • Messages will be ordered for each group ID
      • Ordering across different message groups is not guaranteed
      • Each group ID can have a different consumer (parallel processing)
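
A sketch of publishing to a FIFO queue with boto3; the queue URL, group ID, and deduplication ID are illustrative:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # hypothetical

sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": 42, "status": "PAID"}',
    MessageGroupId="customer-1001",          # ordering is preserved within this group
    MessageDeduplicationId="order-42-paid",  # duplicates within 5 min are discarded
)
```

If content-based de-duplication is enabled on the queue, MessageDeduplicationId can be omitted and the hash of the body is used instead.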

Consumer Auto Scaling

We can attach an ASG to the consumer instances which scales based on the CloudWatch metric Queue Length (ApproximateNumberOfMessages); CloudWatch alarms can be triggered to step-scale the consumer application.

Security

Encryption

  • In-flight encryption using HTTPS API
  • At-rest encryption using KMS keys
  • Client-side encryption if the client wants to perform encryption/decryption itself

Access Controls: IAM policies to regulate access to the SQS API

SQS Access Policies (resource-based policies)

  • Useful for cross-account access to SQS queues
  • Useful for allowing other services (SNS, S3…) to write to an SQS queue

Configurations

Message Visibility Timeout

  • After a message is polled by a consumer, it becomes invisible to other consumers
  • By default, the “message visibility timeout” is 30 seconds
  • That means the message has 30 seconds to be processed
  • After the message visibility timeout is over, the message is “visible” in SQS
  • A consumer could call the ChangeMessageVisibility API to get more time
  • If visibility timeout is high (hours), and consumer crashes, re-processing will take time
  • If visibility timeout is too low (seconds), we may get duplicates
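
If processing takes longer than expected, the consumer can extend the timeout for that one message. A sketch with a hypothetical queue URL:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # hypothetical

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    # Keep the message invisible to other consumers for 5 more minutes from now
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=300,
    )
```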


Dead Letter Queue (DLQ)

  • An SQS queue used to store messages that failed to be processed
  • After the MaximumReceives threshold (the number of times that a message can be received before being sent to a dead-letter queue) is exceeded, the message goes into the DLQ
  • Redrive to Source - once the bug in the consumer has been resolved, messages in the DLQ can be sent back to the queue (original queue or a custom queue) for processing
  • Prevents resource wastage
  • Recommended to set a high retention period for DLQ (14 days)
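
A sketch of attaching a DLQ to a source queue via its RedrivePolicy attribute; the queue URL and DLQ ARN are hypothetical:

```python
import json

import boto3

sqs = boto3.client("sqs")

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # hypothetical
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",
            "maxReceiveCount": "5",  # the MaximumReceives threshold
        })
    },
)
```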

Queue Delay/Delivery Delay

  • Delay message delivery
  • Consumers see the message after some delay
  • Default: 0 (Max: 15 min)
  • Can be set at the queue level

Long Polling

  • When a consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
  • This is called Long Polling
  • Decreases the number of API calls made to SQS (cheaper)
  • Reduces latency (incoming messages during the polling will be read instantaneously)
  • Polling time: 1 sec to 20 sec
  • Long Polling is preferred over Short Polling
  • Can be enabled at the queue level or at the consumer level using the WaitTimeSeconds parameter in the ReceiveMessage API.

SQS + Lambda + DLQ

Failed messages (after the set number of retries) are sent to the DLQ by the SQS queue

SNS

SNS stands for Simple Notification Service

  • Pub-Sub model (publisher publishes messages to a topic, subscribers listen to the topic)
  • Instant message delivery (does not queue messages)

Security

Encryption

  • In-flight encryption using HTTPS API
  • At-rest encryption using KMS keys
  • Client-side encryption if the client wants to perform encryption/decryption itself

Access Controls: IAM policies to regulate access to the SNS API

SNS Access Policies (resource-based policies)

  • Useful for cross-account access to SNS topics
  • Useful for allowing other services (S3…) to publish to an SNS topic

Standard Topics

  • Highest throughput
  • At least once message delivery
  • Best effort ordering
  • Subscribers can be:
    • SQS queue
    • HTTP / HTTPS endpoints
    • Lambda functions
    • Emails
    • SMS & Mobile Notifications
    • Kinesis Data Firehose (KDF) to send the data into S3 or Redshift

FIFO Topics

  • Guaranteed ordering of messages in that topic
  • Publishing messages to a FIFO topic requires:
    • Ordering by Message Group ID (all messages in the same group are ordered)
    • Deduplication using a Deduplication ID or Content Based Deduplication
  • Can only have SQS FIFO queues as subscribers
  • Limited throughput (same as SQS FIFO) because only SQS FIFO queues can read from FIFO topics

SNS + SQS Fanout Pattern

  • Fully decoupled, no data loss
  • SQS allows for: data persistence, delayed processing and retries of work
  • Make sure your SQS queue access policy allows for SNS to write
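
A sketch of the fanout subscription with boto3; the topic and queue ARNs are placeholders, and the queue's access policy must already allow sns.amazonaws.com to SendMessage:

```python
import boto3

sns = boto3.client("sns")

sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:orders-topic",  # hypothetical topic
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:orders-queue",  # hypothetical queue
    # Deliver the raw message body instead of the SNS JSON envelope
    Attributes={"RawMessageDelivery": "true"},
)
```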


Kinesis

  • Makes it easy to collect, process, and analyze streaming data in real-time
  • Ingest real-time data such as: Application logs, Metrics, Website clickstreams, IoT telemetry data…

Kinesis Data Streams

  • Real-time data streaming service

  • Used to ingest data in real time directly from source

  • Retention between 1 day to 365 days

  • Ability to reprocess (replay) data

  • Once data is inserted in Kinesis, it can’t be deleted (immutability)

  • Data that shares the same partition goes to the same shard (ordering)

  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent

  • Consumers:

    • Write your own: Kinesis Client Library (KCL), AWS SDK
    • Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics,
  • Capacity Modes

    • Provisioned
      • You choose the number of shards provisioned, scale manually or using API
      • Each shard gets 1MB/s in (or 1000 records per second)
      • Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
      • You pay per shard provisioned per hour
    • On-demand mode
      • No need to provision or manage the capacity
      • Default capacity provisioned (4 MB/s in or 4000 records per second)
      • Scales automatically based on observed throughput peak during the last 30 days
      • Pay per stream per hour & data in/out per GB

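A producer sketch using boto3 (the stream name and record contents are illustrative); records with the same partition key land on the same shard, which is what gives per-key ordering:

```python
import json

import boto3

kinesis = boto3.client("kinesis")

kinesis.put_record(
    StreamName="clickstream",  # hypothetical stream
    Data=json.dumps({"user": "u-17", "page": "/home"}).encode(),
    PartitionKey="u-17",  # same key => same shard => ordered per user
)
```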

Kinesis Data Firehose

  • Fully Managed Service, no administration, automatic scaling, serverless
  • Used to load streaming data into a target location with optional transformation
  • Can ingest data in real time directly from source
  • Destinations:
    • AWS: Redshift, S3, OpenSearch
    • 3rd party: Splunk, MongoDB, DataDog, NewRelic, etc.
    • Custom HTTP endpoint
  • Supports custom data transformation using Lambda functions
  • No replay capability (does not store data like KDS)


Amazon MQ

  • If you have some traditional applications running on-premises, they may use open protocols such as MQTT, AMQP, STOMP, OpenWire, WSS, etc. When migrating to the cloud, instead of re-engineering the application to use SQS and SNS (AWS proprietary), we can use Amazon MQ (managed Apache ActiveMQ) for communication.
  • Doesn’t “scale” as much as SQS or SNS because it is provisioned
  • Runs on a dedicated machine (can run in HA with failover)
  • Has both queue feature (SQS) and topic features (SNS)

EventBridge

  • Schedule or Cron to create events on a schedule
  • Event Pattern: Event rules to react to a service doing something
  • Target: Trigger Lambda functions, send SQS/SNS messages etc

Data & Analytics

Athena

  • Serverless query service to analyze data stored in Amazon S3
  • Uses SQL language to query the files
  • Built on Presto engine
  • Output stored in S3
  • Supports CSV, JSON, ORC, Avro, and Parquet file formats
  • Commonly used with Amazon Quicksight for reporting/dashboards
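
A sketch of running a query with boto3; the database, table, and S3 output location are assumptions:

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # hypothetical table
    QueryExecutionContext={"Database": "web_logs"},                     # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # results stored in S3
)
# Poll get_query_execution(QueryExecutionId=...) until the query succeeds
print(resp["QueryExecutionId"])
```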

Performance

  • Use columnar data for cost-savings (less scan)
  • Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd…)
  • Partition datasets in S3 for easy querying on virtual columns

Amazon Athena – Federated Query

  • Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
  • Store the results back in Amazon S3

Redshift

  • AWS managed data warehouse (10x better performance than other data warehouses)

  • Based on PostgreSQL

  • Used for Online Analytical Processing (OLAP) and high performance querying

  • Columnar storage of data with massively parallel query execution in SQL

  • Faster querying than Athena due to indexes

  • Need to provision instances as a part of the Redshift cluster (pay for the instances provisioned)

  • Integrated with Business Intelligence (BI) tools such as QuickSight or Tableau

  • Redshift Cluster can have 1 to 128 nodes (128TB per node)

    • Leader Node: query planning & result aggregation
    • Compute Nodes: execute queries & send the result to leader node
  • No multi-AZ support (all the nodes will be in the same AZ)

Loading data into Redshift

  • S3
    • Use COPY command to load data from an S3 bucket into Redshift
    • Without Enhanced VPC Routing
      • data goes through the public internet
    • Enhanced VPC Routing
      • data goes through the VPC without traversing the public internet
  • Kinesis Data Firehose
    • Sends data to S3 and issues a COPY command to load it into Redshift
  • EC2 Instance
    • Using JDBC driver
    • Used when an application needs to write data to Redshift
    • Optimal to write data in batches
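
A sketch of issuing a COPY through the Redshift Data API; the cluster, database, table, bucket, and IAM role are all hypothetical:

```python
import boto3

redshift_data = boto3.client("redshift-data")

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="awsuser",
    Sql=(
        "COPY sales "
        "FROM 's3://my-bucket/sales/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
        "FORMAT AS PARQUET;"
    ),
)
```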

Snapshots & DR

  • Snapshots are point-in-time backups of a cluster, stored internally in S3

  • Snapshots are incremental (only what has changed is saved)

  • You can restore a snapshot into a new cluster

  • Automated

    • every 8 hours, every 5 GB, or on a schedule
    • Set retention between 1 to 35 days
  • Manual

    • snapshot is retained until you delete it
  • Feature to automatically copy snapshots into another region

Redshift Spectrum

  • Query data present in S3 without loading it into Redshift
  • Need to have a Redshift cluster to use this feature
  • Query is executed by 1000s of Redshift Spectrum nodes
  • Consumes much less of your cluster's processing capacity than other queries

OpenSearch

  • Amazon OpenSearch is the successor to Amazon ElasticSearch
  • Used in combination with a database to perform search operations on the database
  • Can search on any field, even supports partial matches
  • Need to provision a cluster of instances (pay for provisioned instances)
  • Supports Multi-AZ
  • Used in Big Data
  • Security through Cognito & IAM, KMS encryption, TLS
  • Comes with Kibana (visualization) & Logstash (log ingestion)

EMR

  • EMR stands for “Elastic MapReduce”
  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • EMR comes bundled with Apache Spark, HBase, Presto, Flink…
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling
  • Integrated with Spot Instances

QuickSight

  • Serverless machine learning-powered business intelligence service to create interactive dashboards
  • Fast, automatically scalable, embeddable
  • Use cases:
    • Business analytics
    • Building visualizations
    • Get business insights using data
  • Integrated with :
    • RDS
    • Aurora
    • Athena
    • Redshift
    • S3

Glue

  • Managed extract, transform, and load (ETL) service
  • Useful to prepare and transform data for analytics
  • Fully serverless service


  • Used to get data from a store, process and put it in another store (could be the same store)
  • Glue Job Bookmarks: prevent re-processing old data
  • Glue Data Crawlers crawl databases and collect metadata which is populated in Glue Data Catalog
  • The data lake is stored in S3

Lake Formation

  • Data lake = central place to have all your data for analytics purpose
  • Fully managed service that makes it easy to setup a data lake in days
  • Out-of-the-box source blueprints: S3, RDS, Relational & NoSQL DB…
  • Fine-grained Access Control for your applications (row and column-level)

Kinesis Data Analytics

Kinesis Data Analytics (SQL application)

  • Real-time analytics on Kinesis Data Streams & Firehose using SQL
  • Add reference data from Amazon S3 to enrich streaming data
  • Fully managed, no servers to provision
  • Automatic scaling
  • Output
    • Kinesis Data Streams: create streams out of the real-time analytics queries
    • Kinesis Data Firehose: send analytics query results to destinations
  • Use cases:
    • Time-series analytics
    • Real-time dashboards
    • Real-time metrics


Kinesis Data Analytics for Apache Flink

  • Use Flink (Java, Scala or SQL) to process and analyze streaming data
  • Use any Apache Flink programming features
  • Flink does not read from Firehose (use Kinesis Analytics for SQL instead)
  • Sources:
    • Kinesis Data Streams
    • Amazon MSK

MSK Managed Streaming for Apache Kafka

  • Alternative to Amazon Kinesis; both allow you to stream data
  • Fully managed Apache Kafka on AWS
    • Allow you to create, update, delete clusters
    • MSK creates & manages Kafka broker nodes & Zookeeper nodes for you
    • Deploy the MSK cluster in your VPC, multi-AZ (up to 3 for HA)
    • Data is stored on EBS volumes for as long as you want

MSK Serverless

  • Run Apache Kafka on MSK without managing the capacity
  • MSK automatically provisions resources and scales compute & storage

Big Data Ingestion Pipeline

  • We want the ingestion pipeline to be fully serverless
  • We want to collect data in real time
  • We want to transform the data
  • We want to query the transformed data using SQL
  • The reports created using the queries should be in S3
  • We want to load that data into a warehouse and create dashboards

Migration & Transfer

Snow Family

  • Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS
  • Offline devices to perform data migrations
  • If it takes more than a week to transfer over the network, use Snowball devices!

Device

  • Snowcone

    • 2 CPUs, 4GB RAM, wired or wireless access
    • 8 TB storage
    • Good for space-constrained environment
    • DataSync Agent is preinstalled
    • When to use: Up to 24 TB, online and offline
  • Snowball Edge

    • Compute Optimized
      • 52 vCPUs, 208 GB of RAM
      • 42 TB storage
      • Supports Storage Clustering
    • Storage Optimized
      • Up to 40 CPUs, 80 GB of RAM
      • 80 TB storage
      • Supports Storage Clustering (up to 15 nodes)
      • Transfer up to petabytes
    • When to use: Up to petabytes(PB)
  • Snowmobile

    • 100 PB storage
    • Used when transferring > 10PB
    • Transfer up to exabytes
    • Does not support Storage Clustering
    • When to use: Up to exabytes(EB)

Edge Computing

  • Process data while it’s being created on an edge location (could be anything that doesn’t have internet or access to cloud)
  • Devices for edge computing:
    • Snowcone
    • Snowball Edge

Data migration

  • Physical data transport solution: move TBs or PBs of data in or out of AWS
  • Pay per data transfer job
  • Provide block storage and Amazon S3 compatible object storage
  • Snowball cannot import to Glacier directly (transfer to S3, configure a lifecycle policy to transition the data into Glacier)
  • Need to install OpsHub software on your computer to manage Snow Family devices

DataSync

  • DataSync is used primarily for one-time migrations
  • Agent Based : An agent needs to be installed on the on premise data center
  • Move large amounts of data to and from:
    • On-premises / other cloud to AWS (NFS, SMB, HDFS, S3 API…) – needs agent
    • AWS to AWS (different storage services) – no agent needed
  • Can synchronize to:
    • Amazon S3 (any storage class – including Glacier)
    • Amazon EFS
    • Amazon FSx (Windows, Lustre, NetApp, OpenZFS...)
  • Replication tasks can be scheduled hourly, daily, weekly
  • File permissions and metadata are preserved (NFS POSIX, SMB…)


Transfer Family

  • AWS managed service to transfer files in and out of Simple Storage Service (S3) or EFS using FTP-based protocols (instead of proprietary methods)

  • Supported Protocols

    • FTP (File Transfer Protocol) - unencrypted in flight
    • FTPS (File Transfer Protocol over SSL) - encrypted in flight
    • SFTP (Secure File Transfer Protocol) - encrypted in flight
  • Supports Multi AZ

  • Pay per provisioned endpoint per hour + fee per GB data transfers

  • Clients can either connect directly to the FTP endpoint or optionally through Route 53

  • Transfer Family needs permission (IAM role) to read or put data into S3 or EFS

Database Migration Service

  • Migrate entire databases from on-premises to AWS cloud
  • The source database remains available during migration
  • Supports continuous data replication using CDC (change data capture)
  • Replication types:
    • Full load: all existing data is moved from source to target in parallel
    • Full load + CDC: full load, plus capture of changes to source tables during migration; CDC guarantees transactional integrity
    • CDC only: only replicate the data changes from the source database
  • Requires EC2 instance running the DMS software to perform the replication tasks. If the amount of data is large, use a large instance. If multi-AZ is enabled, need an instance in each AZ.

Types of Migration

Homogeneous Migration

  • When the source and target DB engines are the same (eg. Oracle to Oracle)
  • One step process:
    • Use the Database Migration Service (DMS) to migrate data from the source database to the target database

Heterogeneous Migration

  • When the source and target DB engines are different (eg. Microsoft SQL Server to Aurora)
  • Two-step process:
    • Use the Schema Conversion Tool (SCT) to convert the source schema and code to match that of the target database
    • Use the Database Migration Service (DMS) to migrate data from the source database to the target database

Migrating using Snow Family

  • Use the Schema Conversion Tool (SCT) to extract the data locally and move it to the Edge device
  • Ship the Edge device or devices back to AWS
  • After AWS receives your shipment, the Edge device automatically loads its data into an Amazon S3 bucket.
  • AWS DMS takes the files and migrates the data to the target data store (eg. DynamoDB)

Application migration service

AWS Application Discovery Service

  • Plan migration projects by gathering information about on-premises data centers
  • Server utilization data and dependency mapping are important for migrations
  • Two types of migration:
    • Agentless Discovery (via AWS Agentless Discovery Connector)
      • Agentless Discovery Connector within VMware vCenter
      • VM inventory, configuration, and performance history such as CPU, memory, and disk usage
    • Agent-based Discovery (via AWS Application Discovery Agent)
      • Install Application Discovery Agent on each VM and each physical server
      • System configuration, system performance, running processes, and details of the network connections between systems
  • Resulting data can be viewed within AWS Migration Hub

AWS Application Migration Service

  • Lift-and-shift (rehost) solution which simplify migrating applications to AWS
  • Converts your physical, virtual, and cloud-based servers to run natively on AWS
  • Supports wide range of platforms, Operating Systems, and databases
  • Minimal downtime, reduced costs


RDS and Aurora MySQL Migrations

RDS MySQL to Aurora MySQL

  • Option 1: DB Snapshots from RDS MySQL restored as MySQL Aurora DB
  • Option 2: Create an Aurora Read Replica from your RDS MySQL, and when the replication lag is 0, promote it as its own DB cluster (can take time and cost $)

External MySQL to Aurora MySQL

  • Option 1:
    • Use Percona XtraBackup to create a file backup in Amazon S3
    • Create an Aurora MySQL DB from Amazon S3
  • Option 2:
    • Create an Aurora MySQL DB
    • Use the mysqldump utility to migrate MySQL into Aurora (slower than S3 method)

Use DMS if both databases are up and running

Same process with PostgreSQL

Disaster Recovery

RPO and RTO

  • Any event that has a negative impact on a company’s business continuity or finances is a disaster
  • Recovery Point Objective (RPO): how often you backup your data (determines how much data are you willing to lose in case of a disaster)
  • Recovery Time Objective (RTO): how long it takes to recover from the disaster (down time)


Strategies

  • Backup & Restore
    • High RPO (hours)
    • Need to spin up instances and restore volumes from snapshots in case of disaster => High RTO
    • Cheapest & easiest to manage
  • Pilot Light
    • Critical parts of the app are always running in the cloud (eg. continuous replication of data to another region)
    • Low RPO (minutes)
    • Critical systems are already up => Low RTO
    • Ideal when RPO should be in minutes and the solution should be inexpensive
    • The DB is critical so it is replicated continuously, but EC2 instances are spun up only when a disaster strikes
  • Warm Standby
    • A complete backup system is up and running at the minimum capacity. This system is quickly scaled to production capacity in case of a disaster.
    • Very low RPO & RTO (minutes)
    • Expensive
  • Multi-Site or Hot Site Approach
    • A backup system is running at full production capacity and the request can be routed to either the main or the backup system.
    • Multi-data center approach
    • Lowest RPO & RTO (minutes or seconds)
    • Very Expensive

Machine Learning

Rekognition

  • Find objects, people, text, scenes in images and videos using ML
  • Facial analysis and facial search to do user verification, people counting
  • Use cases
    • Content Moderation
    • Text Detection
    • Face Detection and Analysis (gender, age range, emotions…)
    • Face Search and Verification
    • Celebrity Recognition
    • Detect content that is inappropriate, unwanted, or offensive (image and videos)
    • Used in social media, broadcast media, advertising, and e-commerce situations to create a safer user experience

Transcribe

  • Automatically convert speech to text
  • Automatically remove Personally Identifiable Information (PII) using Redaction
  • Use cases:
    • transcribe customer service calls
    • automate closed captioning and subtitling

Polly

  • Turn text into lifelike speech using deep learning
  • Allowing you to create applications that talk

Translate

  • Natural and accurate language translation

Lex

  • Amazon Lex
    • same technology that powers Alexa
    • Automatic Speech Recognition (ASR) to convert speech to text
    • Natural Language Understanding to recognize the intent of text, callers
    • Helps build chatbots, call center bots
  • Amazon Connect
    • Receive calls, create contact flows, cloud-based virtual contact center
    • Can integrate with other CRM systems or AWS

Comprehend

  • For Natural Language Processing – NLP
  • Fully managed and serverless service
  • Uses machine learning to find insights and relationships in text

Comprehend Medical

  • Amazon Comprehend Medical detects and returns useful information in unstructured clinical text
  • Uses NLP to detect Protected Health Information (PHI)

SageMaker

  • Fully managed service for developers / data scientists to build ML models
  • Typically, difficult to do all the processes in one place + provision servers
  • Machine learning process (simplified): predicting your exam score

Forecast

  • Fully managed service that uses ML to deliver highly accurate forecasts
  • Example: predict the future sales of a raincoat
  • Use cases: Product Demand Planning, Financial Planning, Resource Planning

Kendra

  • Fully managed document search service powered by Machine Learning
  • Extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs…)

Personalize

  • Fully managed ML-service to build apps with real-time personalized recommendations
  • Same technology used by Amazon.com
  • Integrates into existing websites, applications, SMS, email marketing systems, …

Textract

  • Automatically extracts text, handwriting, and data from any scanned documents using AI and ML
  • Read and process any type of document (PDFs, images, …)

Networking

Route 53

  • A highly available, scalable, fully managed and authoritative DNS (customer can update DNS records)
  • Route 53 is also a Domain Registrar
  • Ability to check the health of your resources
  • The only AWS service which provides 100% availability SLA

Hosted Zone

  • A container for records that define how to route traffic to a domain and its subdomains

  • Hosted zone is queried to get the IP address from the hostname

    Two types

    • Public Hosted Zone
      • resolves public domain names
      • can be queried by anyone on the internet
    • Private Hosted Zone
      • resolves private domain names
      • can only be queried from within the VPC

Record Types

Each record contains:

  • Domain/subdomain Name – e.g., example.com

  • Record Type – e.g., A or AAAA

  • Value – e.g., 12.34.56.78

  • Routing Policy – how Route 53 responds to queries

  • TTL – amount of time the record is cached at DNS Resolvers

  • A – maps a hostname to IPv4

  • AAAA – maps a hostname to IPv6

  • CNAME – maps a hostname to another hostname

    • The target is a domain name which must have an A or AAAA record
    • Cannot point to root domains (Zone Apex). Ex: you can’t create a CNAME record for example.com, but you can for something.example.com
  • NS (Name Servers) - controls how traffic is routed for a domain

  • Alias - maps a hostname to an AWS resource(app.mydomain.com => blabla.amazonaws.com)

    • Native health check
    • AWS proprietary
    • Can point to root (zone apex) and non-root domains
    • Alias Record is of type A or AAAA (IPv4 / IPv6)
    • Automatically recognizes changes in the resource’s IP addresses
    • You can’t set the TTL
    • Targets can be:
      • Elastic Load Balancers
      • CloudFront Distributions
      • API Gateway
      • Elastic Beanstalk environments
      • S3 Websites
      • VPC Interface Endpoints
      • Global Accelerator accelerator
    • Target cannot be an EC2 DNS name

Routing Policies

Define how Route 53 responds to DNS queries

Simple

  • Route to one or more resources
  • If multiple values are returned, the client chooses one at random
  • No health checks (if returning multiple resources, some of them might be unhealthy)
  • When Alias is enabled, you can only specify one AWS resource as a target

Weighted

  • Control the % of the requests that go to each specific resource
  • Can be associated with Health Checks
  • Use cases: load balancing between regions, testing, new application versions…
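
A sketch of two weighted A records for the same name using boto3; the hosted zone ID, record name, and IPs are placeholders. Roughly 90% of queries get the "blue" answer and 10% the "green" one (weights are relative, not percentages):

```python
import boto3

r53 = boto3.client("route53")

for set_id, weight, ip in [("blue", 90, "203.0.113.10"), ("green", 10, "203.0.113.20")]:
    r53.change_resource_record_sets(
        HostedZoneId="Z0123456789EXAMPLE",  # hypothetical hosted zone
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": set_id,  # required for weighted records
                    "Weight": weight,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                },
            }]
        },
    )
```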

Failover (Active-Passive)

  • Primary & Secondary Records (if the primary application is down, route to secondary application)
  • A health check must be associated with the primary record; you can also associate a health check with the secondary
  • Used for Active-Passive failover strategy

Latency-based

  • Redirect to the resource that has the lowest network latency
  • Latency is based on traffic between users and AWS Regions
  • Can be associated with Health Checks (has a failover capability)

Geolocation

  • Routing based on the client's location
  • Specify location by Continent, Country
  • Should create a “Default” record (in case there’s no match on location)
  • Use cases: restrict content distribution & language preference
  • Can be associated with Health Checks

Geoproximity

  • Route traffic to your resources based on the geographic location of users and resources
  • Ability to shift more or less traffic to resources based on the defined bias
  • To change the size of the geographic region, specify bias values:
    • To expand (1 to 99) – more traffic to the resource
    • To shrink (-1 to -99) – less traffic to the resource
  • To use geoproximity routing you must use Route 53 Traffic Flow

Multi-value

  • Route traffic to multiple resources (max 8)
  • Health Checks (only healthy resources will be returned)
  • Multi-value is not a substitute for having an ELB; it is client-side load balancing
  • Unlike Simple routing, all the records returned are for healthy resources

Health Checks

  • HTTP Health Checks are only for public resources
  • Used for automated DNS failover
  • Three types:
    • Monitor an endpoint (application or other AWS resource)
      • Multiple global health checkers check the endpoint health
      • Must configure the application firewall to allow incoming requests from the IPs of Route 53 Health Checkers
      • Supported protocols: HTTP, HTTPS and TCP
    • Monitor other health checks (Calculated Health Checks)
      • Combine the results of multiple Health Checks into one (AND, OR, NOT)
      • Specify how many of the health checks need to pass to make the parent pass
      • Usage: perform maintenance to your website without causing all health checks to fail
    • Monitor CloudWatch Alarms (to perform health checks on private resources (Private Hosted Zones))
      • Route 53 health checkers are outside the VPC. They can’t access private endpoints (private VPC or on-premises resources).
      • Create a CloudWatch Metric and associate a CloudWatch Alarm to it, then create a Health Check that monitors the CloudWatch alarm.

API Gateway

  • Serverless REST APIs
  • Invoke Lambda functions using REST APIs (API gateway will proxy the request to lambda)
  • Supports WebSocket (stateful)
  • Cache API responses
  • Can be integrated with any HTTP endpoint in the backend or any AWS API

API Gateway – Integrations

Lambda Function

  • Invoke Lambda function
  • Easy way to expose a REST API backed by AWS Lambda

HTTP

  • Expose HTTP endpoints in the backend
  • Example: internal HTTP API on premise, Application Load Balancer
  • Why? Add rate limiting, caching, user authentication, API keys, etc.

AWS Service

  • Expose any AWS API through the API Gateway
  • Example: start an AWS Step Function workflow, post a message to SQS
  • Why? Add authentication, deploy publicly, rate control

Endpoint Types

  • Edge-Optimized (default):
    • For global clients
    • Requests are routed through the CloudFront edge locations (improves latency)
    • The API Gateway lives in only one region but it is accessible efficiently through edge locations
  • Regional
    • For clients within the same region
    • Could manually combine with your own CloudFront distribution for global deployment (this way you will have more control over the caching strategies and the distribution)
  • Private
    • Can only be accessed within your VPC using an Interface VPC endpoint (ENI)
    • Use resource policy to define access

Security

  • User Authentication through
    • IAM Roles (useful for internal applications)
    • Cognito (identity for external users – example mobile users)
    • Custom Authorizer (use a Lambda function to validate the token passed in the header and return an IAM policy that determines whether the user should be allowed to access the resource)
  • Custom Domain Name HTTPS security through integration with AWS Certificate Manager (ACM)

VPC

  • VPC = Virtual Private Cloud
  • You can have multiple VPCs in an AWS region (max. 5 per region – soft limit)
  • Because VPC is private, only the Private IPv4 ranges are allowed:
    • 10.0.0.0 – 10.255.255.255 (10.0.0.0/8)
    • 172.16.0.0 – 172.31.255.255 (172.16.0.0/12)
    • 192.168.0.0 – 192.168.255.255 (192.168.0.0/16)
  • Max. CIDR per VPC is 5, for each CIDR
    • Min. size is /28 (16 IP addresses)
    • Max. size is /16 (65536 IP addresses)

VPC – Subnet

  • AWS reserves 5 IP addresses (first 4 & last 1) in each subnet

Internet Gateway (IGW)

  • Allows resources (e.g., EC2 instances) in a VPC connect to the Internet
  • It scales horizontally and is highly available and redundant
  • One VPC can only be attached to one IGW and vice versa
  • Internet Gateways on their own do not allow Internet access
  • Route tables must also be edited!

Bastion Hosts

  • An EC2 instance running in a public subnet (accessible from the public internet) that allows users to SSH into instances in private subnets.
  • The Bastion Host security group must allow inbound access from the internet on port 22 from a restricted CIDR, for example the public CIDR of your corporation
  • Security Group of the EC2 Instance must allow the Security Group of the Bastion Host, or the private IP of the Bastion host

NAT Instance

  • NAT for Network Address Translation
  • Allows EC2 instances in private subnets to connect to the Internet
  • Must be launched in a public subnet
  • Must disable the EC2 setting Source/Destination Check, because the NAT instance forwards traffic that does not belong to it
  • Must have Elastic IP attached to it
  • Route Tables must be configured to route traffic from private subnets to the NAT Instance
  • Can be used as a Bastion Host
  • Disadvantages:
    • Not highly available or resilient out of the box. Need to create an ASG in multi-AZ + resilient user-data script
    • Internet traffic bandwidth depends on EC2 instance type


NAT Gateway

  • AWS-managed NAT, higher bandwidth, high availability, no administration
  • Pay per hour for usage and bandwidth
  • Preferred over NAT instances
  • NATGW is created in a specific Availability Zone, uses an Elastic IP
  • Can’t be used by EC2 instance in the same subnet (only from other subnets)
  • Can't be shared across VPCs (available only in one VPC)
  • Requires an IGW (Private Subnet => NATGW => IGW)
  • Created in a public subnet
  • 5 Gbps of bandwidth with automatic scaling up to 45 Gbps
  • No Security Groups to manage / required
  • Route Tables for private subnets must be configured to route internet-destined traffic to the NAT gateway


NAT Gateway with High Availability

  • NAT Gateway is resilient within a single Availability Zone
  • Must create multiple NAT Gateways in multiple AZs for fault-tolerance
  • No cross-AZ failover needed because if an AZ goes down, all of the instances in that AZ also go down.

Network Access Control List (NACL)

  • NACL are like a firewall which control traffic from and to subnets

  • One NACL per subnet, but a NACL can be attached to multiple subnets

  • New subnets are assigned the Default NACL

  • Default NACL allows all inbound & outbound requests

  • Newly created NACLs deny all inbound and outbound traffic by default until you add rules

  • The “* All Traffic Deny” rule ensures that if a packet doesn't match any of the other numbered rules, it's denied. You can't modify or remove this rule.

  • NACL Rules:

    • Based only on IP addresses
    • Rules number: 1-32766 (lower number has higher precedence)
    • First rule match will drive the decision
    • The last rule denies the request (only when no previous rule matches)
    • Each subnet in your VPC must be associated with a network ACL. If you don't explicitly associate a subnet with a network ACL, the subnet is automatically associated with the default network ACL.

NACL vs Security Group

  • NACL
    • Firewall for subnets
    • Supports both Allow and Deny rules
    • Stateless (both request and response will be evaluated against the NACL rules)
  • Security Group
    • Firewall for EC2 instances
    • Supports only Allow rules
    • Stateful (return traffic is automatically allowed, regardless of any rules)


NACL with Ephemeral Ports

  • For any two endpoints to establish a connection, they must use ports
  • Clients connect to a defined port, and expect a response on an ephemeral port
  • For example, suppose a client EC2 instance needs to connect to a DB instance. Since the ephemeral port can be randomly assigned from a range of ports, the Web Subnet's NACL must allow inbound traffic on that range of ports, and similarly the DB Subnet's NACL must allow outbound traffic on the same range of ports.


VPC Peering

  • Privately connect two VPCs using AWS network
  • Must not have overlapping CIDRs
  • VPC Peering connection is NOT transitive
  • Must update route tables in each VPC’s subnets to ensure requests destined to the peered VPC can be routed through the peering connection
  • You can create VPC Peering connection between VPCs in different AWS accounts/regions(cross account or cross region)
  • You can reference a security group in a peered VPC (works cross accounts – same region)


VPC Endpoints

  • Every AWS service is publicly exposed (public URL)
  • VPC Endpoints (powered by AWS PrivateLink) allows you to connect to AWS services using a private network instead of using the public Internet
  • They’re redundant and scale horizontally
  • They remove the need of IGW, NATGW, … to access AWS Services
  • Types of Endpoints :
    • Interface Endpoints (powered by PrivateLink)
      • Provisions an ENI (private IP address) as an entry point
      • Need to attach a security group to the interface endpoint to control access
      • Supports most AWS services
      • No need to update the route table
      • $ per hour + $ per GB of data processed
    • Gateway Endpoint
      • Provisions a gateway
      • Must be used as a target in a route table
      • Supports only S3 and DynamoDB
      • Free
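
A sketch of creating a Gateway Endpoint for S3 with boto3; the VPC ID, route table ID, and region are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0abc1234",                      # hypothetical VPC
    ServiceName="com.amazonaws.us-east-1.s3",  # S3 in the VPC's region
    RouteTableIds=["rtb-0def5678"],  # a route to S3 is added to these route tables
)
```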


VPC Flow Logs

  • Captures information about IP traffic going into your interfaces
  • Three levels:
    • VPC Flow Logs
    • Subnet Flow Logs
    • ENI Flow Logs
  • Can be configured to show accepted, rejected or all traffic
  • Flow logs data can be sent to S3 (bulk analytics) or CloudWatch Logs (near real-time via metric filter)
  • Query VPC flow logs using Athena in S3 or CloudWatch Logs Insights

IPv6 Support

  • IPv4 cannot be disabled for your VPC
  • Enable IPv6 to operate in dual-stack mode in which your EC2 instances will get at least a private IPv4 and a public IPv6. They can communicate using either IPv4 or IPv6 to the internet through an Internet Gateway.
  • If you cannot launch an EC2 instance in your subnet, it's not because it cannot acquire an IPv6 address (the space is very large); it's because there are no available IPv4 addresses in your subnet
  • Solution: add a new IPv4 CIDR to your subnet

Egress-only Internet Gateway

  • Allows instances in your VPC to initiate outbound connections over IPv6 while preventing inbound IPv6 connections to your private instances.
  • Similar to NAT Gateway but for IPv6
  • Must update Route Tables


PrivateLink

To open our applications up to other VPCs, we can either:

  • Open the VPC up to the Internet

    • Security considerations; everything in the public subnet is public
    • A lot more to manage
  • Use VPC Peering

    • You will have to create and manage many different peering relationships
  • PrivateLink: the best way to expose a service VPC to tens, hundreds, or thousands of customer VPCs

  • Doesn’t require VPC peering; no route tables, NAT gateways, internet gateways, etc

  • Requires a Network Load Balancer on the service VPC and an ENI on the customer VPC


Site-to-Site VPN

  • Easiest and most cost-effective way to connect a VPC to an on-premise data center
  • IPSec Encrypted connection through the public internet
  • Virtual Private Gateway (VGW): VPN concentrator on the VPC side of the VPN connection
  • Customer Gateway (CGW): Software application or physical device on customer side of the VPN connection
  • Enable Route Propagation for the Virtual Private Gateway in the route table that is associated with your subnets
  • If you need to ping EC2 instances from on-premises, make sure you add the ICMP protocol on the inbound rules of your security groups


AWS VPN CloudHub

  • Low-cost hub-and-spoke model for network connectivity between a VPC and multiple on-premise data centers (VPN only)
  • Every participating network can communicate with one another through the VPN connection
  • It operates over the public internet, but all traffic between the customer gateway and the AWS VPN CloudHub is encrypted.
  • To set it up, connect multiple VPN connections on the same VGW, setup dynamic routing and configure route tables


Direct Connect

  • Dedicated private connection from an on-premise data center to a VPC
  • Dedicated connection must be setup between your Data Center and AWS Direct Connect locations
  • You need to setup a Virtual Private Gateway on your VPC
  • Data in transit is not encrypted, but the connection is private (secure)
  • More stable and secure than Site-to-Site VPN
  • Access public & private resources on the same connection using Public & Private Virtual Interface (VIF) respectively
  • DIRECT CONNECT IS:
    • Fast
    • Secure
    • Reliable
    • Able to take massive throughput
    • Lower cost


Direct Connect Gateway

  • Used to setup a Direct Connect to multiple VPCs from your data center, possibly in different regions but same account
  • Using DX, we will create a Private VIF to the Direct Connect Gateway which will extend the VIF to Virtual Private Gateways in multiple VPCs (possibly across regions).


Connection Types

  • Dedicated Connection
    • A physical Ethernet connection associated with a single customer.
    • 1Gbps,10 Gbps and 100 Gbps capacity
  • Hosted Connection
    • A physical Ethernet connection that an AWS Direct Connect Partner provisions on behalf of a customer
    • 50 Mbps, 500 Mbps, up to 10 Gbps

Encryption

  • For encryption in flight, use AWS Direct Connect + VPN which provides an IPsec-encrypted private connection
  • Good for an extra level of security


Resiliency

  • Best way (redundant direct connect connections)


  • VPN connection as a backup
    • In case Direct Connect fails, you can set up a backup Direct Connect connection (expensive), or a Site-to-Site VPN connection

Transit Gateway

  • Transitive peering between thousands of VPCs and on-premise data centers using hub-and-spoke (star) topology
  • Works with Direct Connect Gateway, VPN Connection
  • Regional resource, can work cross-region
  • You can peer Transit Gateways across regions
  • Route Tables to control communication within the transitive network
  • Supports IP Multicast (not supported by any other AWS service)

Increasing BW of Site-to-Site VPN connection

  • ECMP = Equal-cost multi-path routing
  • To increase the bandwidth of the connection between Transit Gateway and corporate data center, create multiple site-to-site VPN connections, each with 2 tunnels (2 x 1.25 = 2.5 Gbps per VPN connection).


Share Direct Connect between multiple accounts

  • Share the Transit Gateway across accounts using Resource Access Manager (RAM) to connect VPCs in the same region but different accounts


Networking Costs

  • Use Private IP instead of Public IP for good savings and better network performance
  • Use same AZ for maximum savings (at the cost of high availability)
  • Traffic entering AWS is free
  • Traffic leaving an AWS region is paid


Minimizing egress traffic network cost

  • Egress traffic: outbound traffic (from AWS to outside)
  • Ingress traffic: inbound traffic - from outside to AWS (typically free)
  • Try to keep as much internet traffic within AWS to minimize costs
  • Direct Connect locations that are co-located in the same AWS Region result in lower costs for egress network traffic


Content delivery

CloudFront

  • Content Delivery Network (CDN)
  • Improves read performance, content is cached at the edge
  • Improves users experience
  • 216 Points of Presence globally (edge locations)

Origins

  • S3 bucket

    • For distributing files and caching them at the edge
    • Enhanced security with CloudFront Origin Access Control (OAC)
    • Origin Access Identity (OAI, old version) or Origin Access Control (OAC, new version) allows the S3 bucket to be accessed only by CloudFront
    • CloudFront can be used as an ingress (to upload files to S3)
  • Custom Origin (HTTP)

    • Application Load Balancer
    • EC2 instance
    • S3 website (must first enable the bucket as a static S3 website)
    • Any HTTP backend you want

CloudFront Geo Restriction

  • You can restrict who can access your distribution
    • Allowlist :
      • Allow your users to access your content only if they're in one of the countries on a list of approved countries.
    • Blocklist
      • Prevent your users from accessing your content if they're in one of the countries on a list of banned countries.
  • The “country” is determined using a 3rd party Geo-IP database
  • Use case: Copyright Laws to control access to content

Signed URL / Cookies

  • Used to make a CloudFront distribution private (distribute to a subset of users)
  • Signed URL ⇒ access to individual files
  • Signed Cookies ⇒ access to multiple files
  • Whenever we create a signed URL / cookie, we attach a policy specifying:
    • URL / Cookie Expiration (TTL)
    • IP ranges allowed to access the data
    • Trusted signers (which AWS accounts can create signed URLs)

Pricing

  • Price Class All: all regions (best performance)
  • Price Class 200: most regions (excludes the most expensive regions)
  • Price Class 100: only the least expensive regions

Global Accelerator

  • Leverage the AWS internal network to route to your application

  • 2 Anycast IPs are created for your application

  • The Anycast IPs send traffic directly to Edge Locations

  • The Edge locations send the traffic to your application

  • Endpoints can be public or private (can span multiple regions):

    • Elastic IP
    • EC2 instances
    • ALB
    • NLB
  • Disaster Recovery

    • Global Accelerator performs health checks for the application
    • Failover in less than 1 minute for unhealthy endpoints
  • Good for:

    • HTTP use cases that require static IP addresses or fast regional failover

Monitoring & Audit

CloudWatch

Serverless performance monitoring service

Metrics

  • CloudWatch provides metrics for every service in AWS
  • Metric is a variable to monitor (CPUUtilization, NetworkIn…)
  • Metrics belong to namespaces
  • Dimension is an attribute of a metric (instance id, environment, etc…)
  • Up to 10 dimensions per metric
  • Two types:
    • Default metrics
      • Provided out of the box; no additional configuration required
    • Custom metrics
      • These metrics need to be provided by your application or by the CloudWatch agent installed on the host (a push sketch follows below)
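
A sketch of pushing a custom metric with boto3; the namespace, metric name, and dimension values are illustrative:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",  # hypothetical custom namespace
    MetricData=[{
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0abc1234"}],  # hypothetical
        "Value": 73.5,
        "Unit": "Percent",
    }],
)
```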

EC2 Monitoring

  • Must run a CloudWatch agent on instance to push system metrics and logs to CloudWatch.
  • Instance role (IAM) must allow the instance to push logs to CloudWatch
  • EC2 instances have metrics every 5 minutes
  • With detailed monitoring (for a cost), you get metrics every 1 minute
  • Use detailed monitoring if you want to react faster to changes (eg. scale faster for your ASG)
  • Available metrics in CloudWatch:
    • CPU Utilization
    • Network Utilization
    • Disk Performance
    • Disk Reads/Writes
  • Custom metrics
    • Memory utilization (memory usage)
    • Disk swap utilization
    • Disk space utilization

CloudWatch Logs Agent

  • Old version of the agent
  • Can only send to CloudWatch Logs

CloudWatch Unified Agent

  • Collect additional system-level metrics such as RAM, processes, etc…
  • Collect logs to send to CloudWatch Logs
  • Centralized configuration using SSM Parameter Store
  • Collected directly on your Linux server / EC2 instance
    • CPU (active, guest, idle, system, user, steal)
    • Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
    • RAM (free, inactive, used, total, cached)
    • Netstat (number of TCP and UDP connections, net packets, bytes)
    • Processes (total, dead, blocked, idle, running, sleep)
    • Swap Space (free, used, used %)

Logs

  • Used to store application logs

  • Log Event: the record of what happened. It contains a timestamp and the data.

  • Log Stream: a collection of log events from the same source. Think of one continuous set of logs from a single instance.

  • Log Group: a collection of log streams. For example, you’d group all your Apache web server logs across hosts together.

  • Logs can be sent to:

    • S3 buckets (exports)
    • Kinesis Data Streams
    • Kinesis Data Firehose
    • Lambda functions
    • ElasticSearch

Metric Filters can be used to filter log data and use the count to trigger CloudWatch alarms. They apply only to log events ingested after the metric filter was created. Example filters:

  • find a specific IP in the logs
  • count occurrences of “ERROR” in the logs
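
A sketch of the “count ERROR occurrences” filter with boto3; the log group, filter name, and metric names are illustrative:

```python
import boto3

logs = boto3.client("logs")

logs.put_metric_filter(
    logGroupName="/my-app/production",  # hypothetical log group
    filterName="error-count",
    filterPattern="ERROR",              # match log events containing "ERROR"
    metricTransformations=[{
        "metricName": "ErrorCount",
        "metricNamespace": "MyApp",
        "metricValue": "1",             # emit 1 for each matching log event
    }],
)
```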

CloudWatch Logs Insights can be used to query logs (using a purpose-built query language) and add queries to CloudWatch Dashboards

Subscription Filter

  • To stream logs in real-time, apply a Subscription Filter on logs
  • Logs can take up to 12 hours to become available for exporting to S3 (not real-time)
  • To store logs in real time in S3, use a subscription filter to publish logs to KDF in real time which will then write the logs to S3.
  • Logs from multiple accounts and regions can be aggregated using subscription filters


Alarms

  • Alarms are used to trigger notifications for any metric
  • Various options (sampling, %, max, min, etc…)
  • Alarm States: OK, INSUFFICIENT_DATA, ALARM
  • Alarm Targets:
    • Stop, Terminate, Reboot, or Recover an EC2 Instance
    • Trigger Auto Scaling Action
    • Send notification to SNS
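
A sketch of a CPU alarm that notifies an SNS topic; the instance ID and topic ARN are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0abc1234"}],  # hypothetical instance
    Statistic="Average",
    Period=300,           # evaluate 5-minute averages...
    EvaluationPeriods=2,  # ...over 2 consecutive periods
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # notify SNS
)
```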

CloudWatch Insights and Operational Visibility

  • CloudWatch Container Insights
    • Collect, aggregate, summarize metrics and logs from containers
    • In Amazon EKS and Kubernetes, CloudWatch Insights is using a containerized version of the CloudWatch Agent to discover containers
  • CloudWatch Lambda Insights
    • Monitoring and troubleshooting solution for serverless applications running on AWS Lambda
    • Collects, aggregates, and summarizes system level metrics including CPU time, memory, disk, and network
  • CloudWatch Contributors Insights
    • Find “Top-N” Contributors through CloudWatch Logs
  • CloudWatch Application Insights
    • Automatic dashboard to troubleshoot your application and related AWS services

CloudTrail

  • Provides governance, compliance and audit for your AWS Account
  • CloudTrail is enabled by default!
  • Get a history of events / API calls made within your AWS Account
  • Can put logs from CloudTrail into CloudWatch Logs or S3
  • A trail can be applied to All Regions (default) or a single Region
  • If a resource is deleted in AWS, investigate CloudTrail first
  • Event retention: 90 days
  • To keep events beyond this period, log them to S3 and use Athena

CloudTrail Events

  • Management Events

    • Events of operations that modify AWS resources:
      • Creating a new IAM user
      • Deleting a subnet
    • Enabled by default
    • Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
  • Data Events

    • By default, data events are not logged (because high volume operations)
    • Events of operations that modify data:
      • Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject)
      • AWS Lambda function execution activity (the Invoke API)
  • CloudTrail Insights Events

    • Enable CloudTrail Insights to detect unusual activity in your account

      • inaccurate resource provisioning
      • hitting service limits
      • Bursts of AWS IAM actions
    • CloudTrail Insights analyzes normal management events to create a baseline and then continuously analyzes write events to detect unusual patterns. If that happens, CloudTrail generates Insights events that:

      • show anomalies in the CloudTrail console
      • can be logged to S3
      • can trigger an EventBridge event for automation

Config

  • Helps with auditing and recording compliance of your AWS resources

  • Record configurations changes over time

  • Evaluate compliance of resources using config rules

  • Does not prevent non-compliant actions from happening (no deny)

  • Questions that can be solved by AWS Config:

    • Is there unrestricted SSH access to my security groups?
    • Do my buckets have any public access?
    • How has my ALB configuration changed over time?
  • You can receive alerts (SNS notifications) for any changes

  • Remediation

    • automate remediation of non-compliant resources using SSM Automation Documents
      • AWS-Managed Automation Documents
      • Custom Automation Documents to invoke a Lambda function for automation
    • You can set Remediation Retries if the resource is still non-compliant after auto remediation

CloudWatch vs CloudTrail vs Config

  • CloudWatch

    • Performance monitoring (metrics, CPU, network, etc…) & dashboards
    • Events & Alerting
    • Log Aggregation & Analysis
  • CloudTrail

    • Record API calls made within your Account by everyone
    • Can define trails for specific resources
    • Global Service
  • Config

    • Record configuration changes
    • Evaluate resources against compliance rules
    • Get timeline of changes and compliance

Trusted Advisor

  • Analyze your AWS accounts and provides recommendation:
    • Cost Optimization
      • low utilization EC2 instances, EBS volumes, idle load balancers, etc.
      • Reserved instances & savings plans optimizations
    • Performance
      • High utilization EC2 instances, CloudFront CDN optimizations
      • EC2 to EBS throughput optimizations, Alias records recommendations
    • Security
      • MFA enabled on Root Account, IAM key rotation, exposed Access Keys
      • S3 Bucket Permissions for public access, security groups with unrestricted ports
    • Fault Tolerance
      • EBS snapshots age, Availability Zone Balance
      • ASG Multi-AZ, RDS Multi-AZ, ELB configuration, etc
    • Service Limits
      • Checks whether you are approaching a service limit and suggests raising it beforehand

Cost Explorer

  • Visualize, understand, and manage your AWS costs and usage over time
  • Create custom reports that analyze cost and usage data.
  • Analyze your data at a high level: total costs and usage across all accounts
  • Forecast usage up to 12 months based on previous usage

Access Management

Identity Access Management

  • Groups are collections of users and have policies attached to them
  • User can belong to multiple groups
  • Log in as an IAM user with admin permissions instead of the root account, even if you have root access.

Policies

  • Policies are JSON documents that outline permissions for users, groups or roles

  • Two types:

    • User based policies
      • IAM policies define which API calls should be allowed for a specific user
    • Resource based policies
      • Control access to an AWS resource
      • Grant the specified principal permission to perform actions on the resource and define under what conditions this applies
  • An IAM principal can access a resource if the user policy ALLOWS it OR the resource policy ALLOWS it AND there’s no explicit DENY.
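
For illustration, here is a minimal identity-based policy created with boto3; the bucket name and policy name are placeholders:

```python
import json

import boto3

iam = boto3.client("iam")

# Hypothetical policy: read-only access to a single S3 bucket.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

iam.create_policy(
    PolicyName="ExampleBucketReadOnly",
    PolicyDocument=json.dumps(policy_document),
)
```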

Roles

  • Some AWS services need to perform actions on your behalf
  • To do so, we assign permissions to AWS services with IAM Roles

Reporting Tools

  • Credentials Report: lists all the users and the status of their credentials (MFA, password rotation, etc.)
  • Access Advisor: shows the service permissions granted to a user and when those services were last accessed

Assume Role vs Resource-based Policy

  • When you assume an IAM Role, you give up your original permissions and take the permissions assigned to the role
  • When using a resource based policy, the principal doesn’t have to give up their permissions
  • Kinesis Data Streams uses IAM roles
  • SNS, SQS, Lambda, CloudWatch Logs, API Gateway ... use resource-based policies
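
A minimal sketch of assuming a role with STS (the role ARN and session name are placeholders); note how the returned temporary credentials, not your original ones, back the new client:

```python
import boto3

sts = boto3.client("sts")

# Assuming a role swaps your current permissions for the role's permissions.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/ExampleCrossAccountRole",
    RoleSessionName="example-session",
)["Credentials"]

# This client acts with the role's temporary credentials only.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```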

Permission Boundaries

  • Set the maximum permissions an IAM entity can get
  • Can be applied to users and roles (not groups)
  • Used to ensure some users can’t escalate their privileges (make themselves admin)

AWS Organizations

  • Global service
  • Manage multiple AWS accounts under an organization
    • The main account is the management account
    • Other accounts are member accounts
  • An AWS account can only be part of one organization
  • Consolidated Billing across all accounts (lower cost)
  • Pricing benefits from aggregated usage of AWS resources
  • API to automate AWS account creation (on demand account creation)
  • Establish Cross Account Roles for Admin purposes where the management account can assume an admin role in any of the member accounts

Organizational Units (OU)

  • Folders for grouping AWS accounts of an organization
  • Can be nested


Service Control Policies (SCP)

  • IAM-style policies applied to OUs or Accounts to restrict Users and Roles
  • They do not apply to the management account (full admin power)
  • Must have an explicit allow (does not allow anything by default – like IAM)
  • Explicit Deny has the highest precedence
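
As a sketch, here is an SCP that explicitly denies one action, attached via the Organizations API; the policy name, description, and denied action are illustrative:

```python
import json

import boto3

org = boto3.client("organizations")

# Explicit Deny wins over any Allow, so member accounts covered by this SCP
# can never call organizations:LeaveOrganization.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "organizations:LeaveOrganization",
            "Resource": "*",
        }
    ],
}

org.create_policy(
    Name="DenyLeaveOrganization",
    Description="Prevent member accounts from leaving the organization",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)
```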

Sharing Resources with AWS RAM

  • AWS Resource Access Manager (RAM) is a free service that allows you to share resources with other AWS accounts and within your organization
  • AWS RAM lets you easily share resources rather than creating duplicate copies in your different accounts
  • Shareable resources include:
    • VPC subnets
    • Transit Gateway
    • Route 53 Resolver
    • License Manager
    • Dedicated Host
    • etc ...

RAM vs. VPC Peering

  • When should you use VPC peering or RAM?
  • Are you sharing resources within the same region? Use RAM.
  • Are you sharing across regions? Use VPC peering.
  • If RAM isn’t available and VPC peering is, that’s still a great option!

Cross Account Role Access

  • As the number of AWS accounts you manage increases, you will need to set up cross-account access.
  • Duplicating IAM users across accounts creates a security vulnerability.
  • Cross-account role access gives you the ability to set up temporary access (via AssumeRole) that you can easily control.


SSO

  • Single Sign-On, now called IAM Identity Center

  • One login (single sign-on) for all your

    • AWS accounts in AWS Organizations
    • Business cloud applications (e.g., Salesforce, Box, Microsoft 365, …)
    • SAML2.0-enabled applications
  • Identity providers

    • Built-in identity store in IAM Identity Center
    • 3rd party: Active Directory (AD), OneLogin, Okta

Cognito

Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily. Amazon Cognito scales to millions of users and supports sign-in with social identity providers, such as Apple, Facebook, Google, and Amazon, and enterprise identity providers via SAML 2.0 and OpenID Connect.

Cognito User Pools (CUP)

  • User pools are directories of users that provide sign-up and sign-in options for your application users
  • Create a serverless database of users for your web & mobile apps
  • Integrate with API Gateway & Application Load Balancer
  • Multi-factor authentication (MFA)
  • Federated Identities: users from Facebook, Google, SAML…

Cognito Identity Pools (Federated Identity)

  • Provide AWS credentials to users so they can access AWS resources directly
  • Provides temporary credentials (using STS) to users so they can access AWS resources
  • Integrate with Cognito User Pools as an identity provider
  • Example use case: provide temporary access to write to an S3 bucket after authenticating the user via Facebook (using CUP identity federation), as sketched below
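
A minimal sketch of that flow; the identity pool ID and Facebook token are placeholders:

```python
import boto3

identity = boto3.client("cognito-identity")

POOL_ID = "us-east-1:00000000-0000-0000-0000-000000000000"  # placeholder
LOGINS = {"graph.facebook.com": "FACEBOOK_ACCESS_TOKEN"}     # placeholder token

# Exchange the external identity for a Cognito identity ID...
identity_id = identity.get_id(IdentityPoolId=POOL_ID, Logins=LOGINS)["IdentityId"]

# ...then for temporary AWS credentials issued via STS.
creds = identity.get_credentials_for_identity(
    IdentityId=identity_id,
    Logins=LOGINS,
)["Credentials"]

# creds["AccessKeyId"], creds["SecretKey"], creds["SessionToken"] can now
# back a boto3 client that writes to the S3 bucket directly.
```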

Cognito vs IAM: think Cognito when you see “hundreds of users”, ”mobile users”, or “authenticate with SAML”


AWS Directory Services

Managed Microsoft AD

  • This is the entire AD suite
  • You can easily build out AD in AWS
  • Login credentials are shared between on-premises AD and AWS Managed AD
  • Manage users on both directories (on-premises and AWS Managed AD)
  • Establish “trust” connections with your on-premises AD
  • Supports MFA

AD Connector

  • Creates a tunnel between AWS and your on-premises AD
  • Directory Gateway (proxy) to redirect requests to the on-premises AD, supports MFA
  • Users are managed on the on-premises AD

Simple AD

  • Provides a subset of the features offered by AWS Managed Microsoft AD, including the ability to manage user accounts and group memberships, create and apply group policies, securely connect to Amazon EC2 instances, and provide Kerberos-based single sign-on (SSO)
  • Standalone directory powered by Linux Samba Active Directory-compatible server
  • AD-compatible managed directory on AWS
  • Cannot be joined with on-premises AD

AWS Control Tower

  • Easy way to set up and govern a secure and compliant multi-account AWS environment based on best practices
  • AWS Control Tower uses AWS Organizations to create accounts
  • Benefits:
    • Automate the set up of your environment in a few clicks
    • Automate ongoing policy management using guardrails
    • Detect policy violations and remediate them
    • Monitor compliance through an interactive dashboard
  • Guardrails
    • Provides ongoing governance for your Control Tower environment (AWS Accounts)
    • Preventive Guardrail
      • Ensures accounts maintain governance by disallowing violating actions
      • Leverages SCPs (e.g., restrict Regions across all your accounts)
    • Detective Guardrail
      • Detects and alerts on noncompliant resources within all accounts
      • Leverages AWS Config rules (e.g., identify untagged resources)

Features and Terms to Know

  • Landing zone: Well-architected, multi-account environment based on compliance and security best practices
  • Guardrails: High-level rules providing continuous governance for the AWS environment
  • Account Factory: Configurable account template for standardizing pre-approved configs of new accounts
  • CloudFormation StackSet: Automated deployments of templates deploying repeated resources for governance
  • Shared accounts: Three accounts used by Control Tower created during landing zone creation

Parameters & Encryption

Key Management Service

  • Anytime you hear “encryption” for an AWS service, it’s most likely KMS
  • Regional service (keys are bound to a region)
  • AWS manages encryption keys for us
  • Provides encryption and decryption of data and manages keys required for it
  • Encrypted secrets can be stored in the code or environment variables
  • Encrypt up to 4KB of data per call (if data > 4 KB, use envelope encryption)
  • Integrated with IAM for authorization
  • Audit key usage with CloudTrail (to know who made calls to the KMS API)
  • Need to set both an IAM Policy & a Key Policy to allow a user or role to access a KMS key
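
A minimal sketch of the 4 KB limit and envelope encryption; the key alias is a placeholder:

```python
import boto3

kms = boto3.client("kms")
KEY_ID = "alias/my-app-key"  # placeholder alias

# Direct Encrypt/Decrypt only works for payloads up to 4 KB.
ciphertext = kms.encrypt(KeyId=KEY_ID, Plaintext=b"small secret")["CiphertextBlob"]
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]

# For larger data, use envelope encryption: KMS hands back a data key to
# encrypt the data locally, plus an encrypted copy of that key to store
# alongside the ciphertext.
data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec="AES_256")
local_key = data_key["Plaintext"]        # use locally, then discard
stored_key = data_key["CiphertextBlob"]  # persist next to the encrypted data
```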

KMS Keys

  • KMS Key is the new name for KMS Customer Master Key (CMK)
  • Symmetric (AES-256 keys)
    • Single encryption key that is used to Encrypt and Decrypt
    • AWS services that are integrated with KMS use Symmetric CMKs
    • You never get access to the KMS Key unencrypted (must call KMS API to use)
  • Asymmetric (RSA & ECC key pairs)
    • Public (Encrypt) and Private Key (Decrypt) pair
    • Used for Encrypt/Decrypt, or Sign/Verify operations
    • The public key is downloadable, but you can’t access the Private Key unencrypted
    • Use case: encryption outside of AWS by users who can’t call the KMS API

Three types of KMS Keys

  • AWS Owned Keys (free): SSE-S3, SSE-SQS, SSE-DDB (default key)
  • AWS Managed Key: free (aws/service-name, example: aws/rds or aws/ebs)
  • Customer managed keys created in KMS: $1 / month
  • Customer managed keys imported (must be symmetric key): $1 / month + pay for API call to KMS ($0.03 / 10000 calls)

Key Rotation

  • Automatic

    • AWS-managed KMS Key
      • automatic every 1 year
    • Customer-managed KMS Key
      • must be enabled
      • automatic every 1 year
  • Manual

    • Imported KMS Key
      • only manual rotation possible using alias

Key Policies

  • Control access to KMS keys, “similar” to S3 bucket policies
  • Cannot access KMS keys without a key policy
  • Default Key Policy
    • Created if you don’t provide a specific Key Policy
    • The default policy allows everyone in your account to access the key
  • Custom KMS Key Policy
    • Define users, roles that can access the KMS key
    • Define who can administer the key
    • Useful for cross-account access of your KMS key
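
As an illustration of a custom key policy for cross-account access (account IDs and key ID are placeholders; "default" is the only supported policy name):

```python
import json

import boto3

kms = boto3.client("kms")

key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # keep the key's own account in control (administration via IAM)
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {   # let a second account use (but not administer) the key
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::444455556666:root"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:DescribeKey"],
            "Resource": "*",
        },
    ],
}

kms.put_key_policy(
    KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",  # placeholder key ID
    PolicyName="default",
    Policy=json.dumps(key_policy),
)
```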

Cross-region Encrypted Snapshot Migration

  • Copy the snapshot to another region with re-encryption option using a new key in the new region (keys are bound to a region)

Cross-account Encrypted Snapshot Migration

  • Create a Snapshot, encrypted with your own KMS Key (Customer Managed Key)
  • Attach a KMS Key Policy to authorize cross-account access
  • Share the encrypted snapshot
  • (in target) Create a copy of the Snapshot, encrypt it with a new CMK in your account
  • Create a volume from the snapshot
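
A sketch of the re-encrypting copy step (snapshot ID, Region names, and key ARN are placeholders); the same `KmsKeyId` parameter covers the cross-Region case above:

```python
import boto3

# Run the copy from the destination Region; KMS keys are Region-bound,
# so the new key must live there.
ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    Encrypted=True,
    KmsKeyId="arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    Description="Re-encrypted copy",
)
```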

KMS Multi-Region Keys

  • Identical KMS keys in different AWS Regions that can be used interchangeably
  • Multi-Region keys have the same key ID, key material, automatic rotation
  • Encrypt in one Region and decrypt in other Regions
  • No need to re-encrypt or make cross-Region API calls
  • KMS Multi-Region keys are NOT global (Primary + Replicas)
  • Each Multi-Region key is managed independently
  • Use cases: global client-side encryption, encryption on Global DynamoDB, Global Aurora

AMI Sharing Process Encrypted via KMS

  • AMI in Source Account is encrypted with KMS Key from Source Account
  • Must modify the image attribute to add a Launch Permission which corresponds to the specified target AWS account
  • Must share the KMS Key used to encrypt the snapshot the AMI references with the target account / IAM Role
  • The IAM Role/User in the target account must have permissions to DescribeKey, ReEncrypt*, CreateGrant, and Decrypt
  • When launching an EC2 instance from the AMI, optionally the target account can specify a new KMS key in its own account to re-encrypt the volumes

CloudHSM

  • A hardware security module (HSM) is a physical computing device that safeguards and manages digital keys and performs encryption and decryption functions.
  • An HSM contains one or more secure cryptoprocessor chips
  • AWS provisions dedicated encryption hardware (Hardware Security Module)
  • Use when you want to manage the encryption keys entirely yourself
  • HSM device is stored in AWS (tamper resistant, FIPS 140-2 Level 3 compliance)
  • Supports both symmetric and asymmetric encryption
  • Good option to use with SSE-C encryption
  • CloudHSM clusters are spread across multiple AZs (high availability)
  • IAM permissions are required to perform CRUD operations on HSM cluster
  • CloudHSM Software is used to manage the keys and users (in KMS, everything is managed using IAM)

SSM Parameter Store

  • Secure storage for configuration and secrets
  • Optional Seamless Encryption using KMS
  • Serverless, scalable, durable, easy SDK
  • Security through IAM
  • Notifications with Amazon EventBridge
  • Integration with CloudFormation
  • Differences with Secrets Manager:
    • SSM Parameter Store is free (standard tier), Secrets Manager is not
    • Limit to the number of parameters you can store (10,000 in the standard tier)
    • No built-in secret rotation
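
A minimal sketch of writing and reading an encrypted parameter; the parameter name and value are placeholders:

```python
import boto3

ssm = boto3.client("ssm")

# SecureString values are encrypted with KMS under the hood.
ssm.put_parameter(
    Name="/my-app/dev/db-password",  # placeholder hierarchy
    Value="s3cr3t",
    Type="SecureString",
    Overwrite=True,
)

password = ssm.get_parameter(
    Name="/my-app/dev/db-password",
    WithDecryption=True,  # requires kms:Decrypt on the underlying key
)["Parameter"]["Value"]
```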

Parameter Tiers

  • Standard tier: up to 10,000 parameters, 4 KB max value size, no parameter policies, free
  • Advanced tier: up to 100,000 parameters, 8 KB max value size, parameter policies supported, charged per parameter

Parameter Policies

  • Only supported in the advanced tier
  • Assign policies to a parameter for additional features
    • Expire the parameter after some time (TTL)
    • Parameter expiration notification
    • Parameter change notification

Secrets Manager

  • Newer service, meant for storing secrets
  • Capability to force rotation of secrets every X days (not available in Parameter Store)
  • Automate generation of secrets on rotation (uses Lambda)
  • Secrets are encrypted using KMS
  • Mostly used for RDS (MySQL, PostgreSQL, Aurora) authentication
    • need to specify the username and password to access the database
    • link the secret to the database to allow for automatic rotation of database login info
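
A sketch of reading RDS credentials from a secret (the secret name is a placeholder; RDS secrets are stored as a JSON blob):

```python
import json

import boto3

secrets = boto3.client("secretsmanager")

raw = secrets.get_secret_value(SecretId="prod/my-app/rds")["SecretString"]
creds = json.loads(raw)

# creds["username"] / creds["password"] can now be used to open the
# database connection; rotation updates the secret without code changes.
```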

Secrets Manager – Multi-Region Secrets

  • Replicate Secrets across multiple AWS Regions
  • Secrets Manager keeps read replicas in sync with the primary Secret
  • Ability to promote a read replica Secret to a standalone Secret
  • Use cases: multi-region apps, disaster recovery strategies, multi-region DB

Certificate Manager

  • Easily provision, manage, and deploy TLS Certificates
  • Used to provide in-flight encryption for websites (HTTPS)
  • Supports both public and private TLS certificates
  • Free of charge for public TLS certificates
  • Automatic TLS certificate renewal
  • Load TLS certificates on
    • Elastic Load Balancers (CLB, ALB, NLB)
    • CloudFront Distributions
    • APIs on API Gateway
  • Cannot use ACM with EC2

Cloud Security

Web Application Firewall

  • Protects your application from common layer 7 web exploits such as SQL Injection and Cross-Site Scripting (XSS)
  • Layer 7 is HTTP (vs Layer 4 is TCP/UDP)
  • Can only be deployed on
    • Application Load Balancer
    • API Gateway
    • CloudFront
    • AppSync GraphQL API
    • Cognito User Pool
  • WAF contains Web ACLs (Access Control Lists) with rules to filter requests based on:
    • IP addresses
    • HTTP headers
    • HTTP body
    • URI strings
    • Size constraints (ex. max 5kb)
    • Geo-match (block countries)
    • Rate-based rules (to count occurrences of events per IP) for DDoS protection
  • Web ACLs are Regional, except for CloudFront
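
As a sketch, here is a regional Web ACL with a single rate-based rule (names, metric names, and the 2,000-requests-per-5-minutes limit are illustrative):

```python
import boto3

wafv2 = boto3.client("wafv2")

wafv2.create_web_acl(
    Name="example-rate-limit-acl",
    Scope="REGIONAL",  # use Scope="CLOUDFRONT" (in us-east-1) for CloudFront
    DefaultAction={"Allow": {}},
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "exampleRateLimitAcl",
    },
    Rules=[
        {
            "Name": "rate-limit-per-ip",
            "Priority": 1,
            # Block any single IP exceeding the request limit per 5 minutes.
            "Statement": {
                "RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}
            },
            "Action": {"Block": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "rateLimitPerIp",
            },
        }
    ],
)
```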

AWS Shield

  • DDoS: Distributed Denial of Service – many requests at the same time
  • AWS Shield Standard
    • Free DDoS protection service that is activated for every AWS customer
    • Provides protection from attacks such as
      • SYN/UDP Floods
      • Reflection attacks
      • and other layer 3/layer 4 attacks
  • AWS Shield Advanced
    • DDoS mitigation service ($3,000 per month per organization)
    • Protect against more sophisticated attack on
      • EC2 instances
      • Elastic Load Balancing (ELB)
      • CloudFront
      • Global Accelerator
      • Route 53
    • 24/7 access to the AWS DDoS Response Team (DRT)

Firewall Manager

  • Manage all the firewall rules in all accounts of an AWS Organization
  • Security policy: common set of security rules
    • WAF rules (Application Load Balancer, API Gateways, CloudFront)
    • AWS Shield Advanced (ALB, CLB, NLB, Elastic IP, CloudFront)
    • Security Groups for EC2, Application Load Balancer and ENI resources in VPC
    • AWS Network Firewall (VPC Level)
    • Amazon Route 53 Resolver DNS Firewall
  • Policies are created at the Region level

Security Hub

  • Security Hub is a service provided by Amazon Web Services (AWS) that gives users a comprehensive view of their security posture across their AWS accounts.
  • It provides a centralized dashboard that aggregates and prioritizes security findings from various AWS services such as
    • Amazon GuardDuty
    • AWS Config
    • Amazon Inspector, and others

GuardDuty

  • Intelligent Threat discovery to protect your AWS Account
  • Uses Machine Learning algorithms, anomaly detection, 3rd party data
  • No management required (just enable)
  • Input data includes:
    • CloudTrail Logs (unusual API calls, unauthorized deployments)
    • VPC Flow Logs (unusual internal traffic, unusual IP address)
    • DNS Logs (compromised EC2 instances sending encoded data within DNS queries)
    • EKS Audit Logs (suspicious activities and potential EKS cluster compromises)
  • Can setup EventBridge rules to be notified in case of findings
  • EventBridge rules can target AWS Lambda or SNS
  • Can protect against CryptoCurrency attacks (has a dedicated “finding” for it)
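
A sketch of the EventBridge wiring (rule name and SNS topic ARN are placeholders; the event pattern values are the ones GuardDuty emits):

```python
import json

import boto3

events = boto3.client("events")

# Match every GuardDuty finding...
events.put_rule(
    Name="guardduty-findings",
    EventPattern=json.dumps({
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
    }),
)

# ...and forward it to a pre-existing SNS topic for notification.
events.put_targets(
    Rule="guardduty-findings",
    Targets=[{
        "Id": "notify-sns",
        "Arn": "arn:aws:sns:us-east-1:111122223333:security-alerts",  # placeholder
    }],
)
```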


Inspector

  • Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS

  • Automated Security Assessments

  • For :

    • EC2 instances – using the Systems Manager (SSM) Agent running on the instances
    • Amazon ECR - Assessment of containers as they are pushed to ECR
    • Lambda Functions – identifies software vulnerabilities in function code and package dependencies
    • 2 Types of assessment
      • Network Assessments
        • Network configuration analysis to check for ports reachable from outside the VPC
        • Inspector agent is not required
      • Host Assessments
        • Vulnerable software (CVE), host hardening (CIS benchmarks), and security best practices
        • Inspector agent is required
    • Amazon Inspector is a vulnerability management service that continuously scans your AWS workloads for vulnerabilities. It is not an intrusion detection service.
  • Integration with AWS Security Hub

  • Send findings to Amazon EventBridge

  • Gives a risk score associated with all vulnerabilities for prioritization

Macie

  • Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS (e.g., in an S3 bucket).
  • Macie helps identify and alert you to sensitive data, such as personally identifiable information (PII)
  • Notifies through an EventBridge event

Network Firewall

  • Protect your entire Amazon VPC
  • From Layer 3 to Layer 7 protection
  • Any direction, you can inspect
    • VPC to VPC traffic
    • Outbound to internet
    • Inbound from internet
    • To / from Direct Connect & Site-to-Site VPN
  • Internally, the AWS Network Firewall uses the AWS Gateway Load Balancer
  • Rules can be centrally managed cross-account by AWS Firewall Manager to apply to many VPCs

HPC

High Performance Computing

  • Cloud is perfect for HPC

  • Cluster placement group for low latency inter-nodal communication

  • EC2 Enhanced Networking (SR-IOV)

    • Elastic Network Adapter (ENA)
      • Supported in both Linux & Windows
    • Elastic Fabric Adapter (EFA)
      • Enhanced for HPC
      • Supported in Linux only
      • Leverages Message Passing Interface (MPI) standard
      • Bypasses the underlying Linux OS to provide low-latency networking
  • Automation and Orchestration

    • AWS Batch
      • Used to run single jobs that span multiple EC2 instances (multi-node)
    • AWS Parallel Cluster
      • Open-source cluster management tool to deploy HPC on AWS
      • Configure with text files
      • Automate creation of VPC, Subnet, cluster type and instance types
      • Ability to enable EFA on the cluster
