AWS Certified Developer Associate hints. RDS, Aurora, ElastiCache, DynamoDB, SQS, SNS, Kinesis (part II)

RDS, Aurora and ElastiCache

Amazon RDS

Possible to use the following engines:

  • Postgres
  • MySQL
  • MariaDB
  • Oracle
  • Microsoft SQL Server

RDS provides:

  • Automated provisioning and OS patching
  • Continuous backups (1 to 35 days retention)
  • Monitoring dashboard
  • Read replicas
  • Multi AZ for Disaster Recovery
  • Maintenance windows for upgrades
  • Scaling
  • Storage backed by EBS
  • Automatic storage increase up to a configured maximum storage threshold
  • No SSH access

RDS Read Replicas

  • Up to 5 Read replicas
  • Within AZ, Cross AZ or Cross Region
  • Replication is ASYNC, eventually consistent
  • Replicas can be promoted to their own DB
  • App must update connection string to leverage read replicas

RDS Replicas Networking Costs

  • In AWS there is a network cost when data goes from one AZ to another
  • For RDS Read Replicas within the same region, you don’t pay the fee

RDS Multi AZ (Disaster Recovery)

  • Synchronous replication to a standby instance
  • One DNS name - automatic app failover to standby
  • Increase availability
  • Failover in case of loss of AZ, loss of network, or instance/storage failure
  • No manual intervention in apps
  • Not used for scaling
  • Read replicas can be setup as Multi AZ for Disaster Recovery

RDS Enabling Multi AZ

  • Zero downtime operation (no need to stop DB)
  • Just click on “modify” for the database

Amazon Aurora

  • Proprietary technology from AWS (not open source)
  • Global Aurora enables a multi-Region mode
  • Compatible with Postgres and MySQL
  • Aurora claims a 5x performance improvement over MySQL on RDS and 3x over Postgres on RDS
  • Storage grows automatically in increments of 10 GB, up to 128 TB
  • Can have up to 15 replicas (MySQL has 5), and the replication process is faster
  • Failover in Aurora is instantaneous
  • Aurora costs ~20% more than RDS

Aurora High Availability and Read Scaling

  • 6 copies of your data across 3 AZs
  • 4 copies out of 6 needed for writes
  • 3 copies out of 6 needed for reads
  • Self-healing with peer-to-peer replication
  • Storage is striped across 100s of volumes
  • One Aurora instance takes writes (the master)
  • Auto failover for the master in less than 30 seconds
  • Master + up to 15 Aurora Read Replicas
  • Support for Cross-Region Replication

Aurora DB Cluster

Aurora provides a writer endpoint; if the master fails, the application still talks to the same writer endpoint and requests are redirected to the new master.

Auto scaling of read replicas can be enabled; Aurora provides a reader endpoint that load-balances across them.

Core features:

  • Automatic failover
  • Backup and recovery
  • Isolation and security
  • Industry compliance
  • Push button scaling
  • Automated patching with no down time
  • Advanced monitoring
  • Routine maintenance
  • Backtrack: restore data to any point in time

RDS & Aurora Security

At-rest encryption:

  • Database master & replicas encrypted using AWS KMS - must be defined at launch time
  • If the master is not encrypted, the read replicas cannot be encrypted
  • To encrypt an un-encrypted database, go through a DB snapshot & restore as encrypted

In-flight encryption: TLS-ready.

IAM authentication: use IAM roles to connect instead of a username/password.

Security Groups: control which IP ranges / security groups can access the DB.

No SSH available.

Audit logs can be enabled and sent to CloudWatch logs for longer retention.

RDS Proxy

  • Fully managed database proxy for RDS
  • Allows apps to pool and share DB connections
  • Improves database efficiency
  • Serverless, autoscaling, highly available (multi-AZ)
  • Reduces RDS & Aurora failover time by up to 66%
  • Supports MySQL, PostgreSQL, MariaDB, MS SQL Server and Aurora
  • No code changes required
  • Can enforce IAM authentication for the DB
  • Never publicly accessible (must be accessed from a VPC)

ElastiCache

  • Possible to use Redis or Memcached
  • Caches are in-memory databases
  • Helps reduce the load on your database
  • Helps make your application stateless
  • AWS takes care of OS maintenance, patching, optimization, monitoring and failure recovery
  • Application code must be changed to read/write through the cache

Redis vs Memcached

Redis:

  • Primary endpoint and readers endpoint
  • Multi AZ with auto-failover
  • Read replicas to scale reads and high availability (up to 5)
  • Data durability using AOF persistence
  • Backup and restore
  • Supports sets and sorted sets

Memcached:

  • Multi-node for partitioning of data (sharding)
  • No high availability (no replication)
  • Non-persistent
  • No backup and restore
  • Multi-threaded architecture

ElastiCache Strategies

Lazy loading / Cache-aside / Lazy population (sketched below):

  • First ask the cache
  • If not found, read from the DB
  • Write the data to the cache so the next read is a hit
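
A minimal cache-aside sketch in Python, assuming a local Redis endpoint and a hypothetical `query_db` helper standing in for the real database call:

```python
import json

import redis  # pip install redis

cache = redis.Redis(host="localhost", port=6379)

def query_db(user_id):
    # Hypothetical stand-in for a real database query.
    return {"id": user_id, "name": "example"}

def get_user(user_id, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = cache.get(key)                           # 1. ask the cache first
    if cached is not None:
        return json.loads(cached)                     # cache hit
    user = query_db(user_id)                          # 2. miss: read from the DB
    cache.set(key, json.dumps(user), ex=ttl_seconds)  # 3. populate the cache
    return user
```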

Write Through:

  • Reads still hit the cache first
  • On every write, write to the DB and also write to the cache
  • Cache churn - some data written to the cache will never be read

Cache Eviction and Time-to-Live

Cache eviction can occur in three ways:

  • You delete the item explicitly from the cache
  • The cache memory is full and the least recently used items are evicted (LRU)
  • The item's time-to-live (TTL) expires

Amazon MemoryDB for Redis

  • Redis-compatible, durable, in-memory data service
  • Ultra-fast performance with over 160 million requests per second
  • Durable in-memory data storage with a Multi-AZ transactional log
  • Scales seamlessly from 10 GB to 100s of TBs of storage
  • Use cases: web and mobile apps, online gaming, media

AWS DynamoDB

  • Fully managed, highly available with replication across multiple AZs
  • NoSQL database - not a relational database
  • Scales to massive workloads, distributed database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent performance
  • Integrated with IAM
  • Enables event-driven programming with DynamoDB Streams
  • Low cost and auto-scaling capabilities
  • Standard & Infrequent Access Table classes
  • Maximum size of an item is 400 KB

DynamoDB - Primary Keys

Option 1: Partition Key (HASH):

  • Partition key must be unique for each item
  • Partition key must be diverse so that the data is distributed
  • Example: “user_id” for a users table

Option 2: Partition Key + Sort Key (HASH + RANGE):

  • The combination must be unique for each item
  • Data is grouped by partition key
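
A sketch of option 2 with boto3, using a hypothetical `GameScores` table keyed by `user_id` (partition key) and `game_id` (sort key):

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="GameScores",  # hypothetical table name
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "game_id", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand mode; or supply ProvisionedThroughput
)
```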

DynamoDB - Read/Write Capacity Modes

Control how you manage your table’s capacity (read/write throughput).

Provisioned Mode (default):

  • You specify the number of reads/writes per second
  • You need to plan capacity beforehand
  • Pay for provisioned read & write capacity units

On-Demand Mode:

  • Read/writes automatically scale up/down with your workloads
  • No capacity planning needed
  • Pay for what you use; more expensive

R/W Capacity Modes Provisioned:

  • Table must have provisioned read and write capacity units
  • Read capacity units (RCU) = throughput for read
  • Write capacity units (WCU) = throughput for write
  • Option to setup auto-scaling of throughput to meet demand
  • Throughput can be exceeded temporarily using burst capacity
  • If burst capacity has been consumed, you'll get a ProvisionedThroughputExceededException

Write Capacity Units (WCU):

  • One write capacity unit represents one write per second for an item up to 1 KB in size
  • If items are larger than 1 KB, more WCUs are consumed (size is rounded up to the next 1 KB)

Read Capacity Units (RCU):

Strongly consistent reads vs eventually consistent reads:

  • One read capacity unit (RCU) represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size
  • If items are larger than 4 KB, more RCUs are consumed (size is rounded up to the next 4 KB; worked sketch below)
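
The rounding rules are easy to get wrong; a small sketch of the unit arithmetic (plain Python, not an AWS API):

```python
import math

def wcu(writes_per_sec, item_kb):
    # One WCU = one write/sec of an item up to 1 KB; size rounds up to the next 1 KB.
    return writes_per_sec * math.ceil(item_kb / 1)

def rcu(reads_per_sec, item_kb, strongly_consistent=True):
    # One RCU = one strongly consistent read/sec (or two eventually consistent
    # reads/sec) of an item up to 4 KB; size rounds up to the next 4 KB.
    units = reads_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

print(wcu(10, 2))   # 10 writes/sec of 2 KB items -> 20 WCU
print(rcu(10, 4))   # 10 strong reads/sec of 4 KB items -> 10 RCU
print(rcu(16, 12, strongly_consistent=False))  # 16 eventual reads/sec of 12 KB -> 24 RCU
```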

DynamoDB - Partitions (Internal)

  • Data is stored in partitions
  • Partition keys go through a hashing algorithm that determines which partition each item goes to
  • WCU and RCU are spread evenly across partitions
  • If we exceed provisioned RCUs or WCUs, we get “ProvisionedThroughputExceededException”, because of “hot” partition keys or very large items

Solving throttling issues:

  • Exponential backoff (sketched below)
  • Distribute partition keys as much as possible
  • For RCU issues, we can use DynamoDB Accelerator (DAX)
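
A sketch of exponential backoff with jitter around a throttled call; `do_read` is a hypothetical function wrapping any DynamoDB read. Note that the AWS SDKs already apply this retry behavior by default:

```python
import random
import time

from botocore.exceptions import ClientError

def with_backoff(do_read, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return do_read()
        except ClientError as e:
            if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            # Wait 2^attempt * 100 ms plus random jitter, then retry.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
    raise RuntimeError("still throttled after retries")
```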

On-Demand Mode

  • Read/writes automatically scale up/down with your workload
  • No capacity planning needed
  • Unlimited WCU and RCU, no throttle, more expensive
  • You are charged for the reads/writes you use, in terms of RRUs and WRUs
  • Read request units (RRU) - throughput for reads
  • Write request units (WRU) - throughput for writes
  • Roughly 2.5x more expensive than provisioned capacity

DynamoDB basic operations

Modification.

PutItem:

  • Creates a new item or fully replaces an old one (same primary key)
  • Consumes WCUs

UpdateItem:

  • Edits an existing item's attributes, or adds a new item if it doesn't exist
  • Partial update

ConditionalWrite:

  • Accepts a write/update/delete only if conditions are met

Retrieval.

GetItem:

  • Read based on the primary key
  • Primary key can be HASH or HASH+RANGE
  • Eventually consistent reads by default
  • Strongly consistent reads consume more RCUs
  • ProjectionExpression can be specified to retrieve only certain attributes

Query:

  • KeyConditionExpression (= on the partition key; =, <, >, between, begins_with on the sort key)
  • FilterExpression (additional filtering on non-key attributes, applied after the query)
  • Returns the number of items specified in Limit, or up to 1 MB of data (example below)
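
A Query sketch with boto3, reusing the hypothetical `GameScores` table from above:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

resp = table.query(
    KeyConditionExpression=Key("user_id").eq("u1") & Key("game_id").begins_with("chess"),
    FilterExpression=Attr("score").gt(100),  # applied after the query; RCUs still consumed
    ProjectionExpression="game_id, score",
    Limit=25,
)
for item in resp["Items"]:
    print(item)
```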

Scan:

  • Reads the entire table
  • Returns up to 1 MB of data per call - use pagination to keep reading (sketch below)
  • Consumes a lot of RCUs
  • Limit the impact using the Limit parameter, or reduce the page size and pause between calls
  • For faster performance, use a parallel scan
  • Possible to use ProjectionExpression & FilterExpression
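
A paginated Scan sketch driven by `LastEvaluatedKey`, again on the hypothetical `GameScores` table:

```python
import boto3

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

scan_kwargs = {"Limit": 100}  # caps the RCU impact of each page
while True:
    page = table.scan(**scan_kwargs)
    for item in page["Items"]:
        print(item)  # hypothetical per-item processing
    if "LastEvaluatedKey" not in page:
        break  # no more pages
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```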

DeleteItem:

  • Delete individual item
  • Ability to perform a conditional delete

DeleteTable:

  • Deletes the whole table and all its items
  • Much quicker than calling DeleteItem on every item

DynamoDB - Batch operation

  • Allows you to reduce latency by reducing the number of API calls
  • Operations are done in parallel for better efficiency
  • Part of the batch can fail

BatchWriteItem:

  • Up to 25 PutItem and/or DeleteItem operations in one call
  • Up to 16 MB of data written, up to 400 KB of data per item
  • Can't update items (use UpdateItem for that)
  • UnprocessedItems is returned for failed write operations (retry sketch below)
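
A BatchWriteItem sketch that retries UnprocessedItems (hypothetical table, items and keys; real code should also back off between retries):

```python
import boto3

client = boto3.client("dynamodb")

request = {
    "GameScores": [  # hypothetical table, items and keys
        {"PutRequest": {"Item": {"user_id": {"S": "u1"}, "game_id": {"S": "chess"}}}},
        {"DeleteRequest": {"Key": {"user_id": {"S": "u2"}, "game_id": {"S": "go"}}}},
    ]
}
while request:
    resp = client.batch_write_item(RequestItems=request)
    # Anything DynamoDB couldn't process comes back and should be retried.
    request = resp.get("UnprocessedItems", {})
```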

BatchGetItem:

  • Returns items from one or more tables
  • Up to 100 items, up to 16 MB of data
  • Items are retrieved in parallel to minimize latency
  • UnprocessedKeys is returned for failed read operations

DynamoDB Conditional Writes:

  • Possible for PutItem, UpdateItem, DeleteItem and BatchWriteItem
  • Conditions: attribute_exists, attribute_not_exists, attribute_type, contains, size (example below)
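
A conditional-write sketch: insert only if the item doesn't already exist (hypothetical table and keys):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

try:
    table.put_item(
        Item={"user_id": "u1", "game_id": "chess", "score": 0},
        ConditionExpression="attribute_not_exists(user_id)",  # insert only if new
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("item already exists")
    else:
        raise
```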

DynamoDB - Indexes

Local Secondary Index (LSI):

  • Alternative sort key for your table (same Partition Key as that of base table)
  • The sort key consists of one scalar attribute (string, number or binary)
  • Up to 5 local secondary indexes per table
  • Must be defined at table creation time
  • Attribute projections can contain some or all the attributes of the base table

Global Secondary Index (GSI):

  • Alternative primary key (hash or hash + range) from the base table
  • Speed up queries on non-key attributes
  • The index key consists of scalar attributes (string, number, binary)
  • Attribute projections can contain some or all of the attributes of the base table
  • Must provision RCUs & WCUs for the index
  • Can be added/modified after table creation

Index and Throttling

Global secondary index:

  • If the writes are throttled on the GSI, then the main table will be throttled
  • Even if the WCU on the main tables are fine
  • Choose your GSI partition key carefully
  • Assign your WCU capacity carefully

Local secondary index:

  • uses the WCUs and RCUs of the main table
  • No special throttling considerations

DynamoDB - PartiQL

Supports some but not all SQL statements:

INSERT, UPDATE, SELECT, DELETE.

Also supports batch operations.
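
A PartiQL sketch with boto3's `execute_statement`, against the hypothetical `GameScores` table:

```python
import boto3

client = boto3.client("dynamodb")

resp = client.execute_statement(
    Statement="SELECT game_id, score FROM GameScores WHERE user_id = ?",
    Parameters=[{"S": "u1"}],
)
print(resp["Items"])
```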

DynamoDB - Optimistic Locking

  • DynamoDB has a feature called Conditional Writes
  • A strategy to ensure an item hasn’t changed before you update/delete it
  • Each item has an attribute that acts as a version number
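
An optimistic-locking sketch built on a conditional write; the `version` attribute name is an assumption, any numeric attribute works:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

def update_score(user_id, game_id, new_score, expected_version):
    try:
        table.update_item(
            Key={"user_id": user_id, "game_id": game_id},
            UpdateExpression="SET score = :s, #v = :new_version",
            ConditionExpression="#v = :expected",  # fail if another writer bumped the version
            ExpressionAttributeNames={"#v": "version"},
            ExpressionAttributeValues={
                ":s": new_score,
                ":new_version": expected_version + 1,
                ":expected": expected_version,
            },
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            raise RuntimeError("lost the race: re-read the item and retry")
        raise
```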

DynamoDB Accelerator DAX

  • Fully managed, highly available, seamless in-memory cache for DynamoDB
  • Microsecond latency for cached reads & queries
  • Doesn't require application logic modification
  • Solves the “hot key” problem (too many reads on a single key)
  • Default TTL is 5 minutes
  • Up to 10 nodes in the cluster
  • Multi-AZ (3 nodes minimum recommended for production)
  • Secure (encryption with KMS, IAM roles)
  • For aggregation results, ElastiCache can be used alongside DAX

DynamoDB Streams

  • Ordered stream of item-level modifications (create/update/delete) in a table
  • Stream records can be sent to Kinesis Data Streams, read by AWS Lambda, or read by Kinesis Client Library applications
  • Data retention of up to 24 hours

Possible to choose the information that will be written to the stream:

  • KEYS_ONLY - only the key attributes of the modified item
  • NEW_IMAGE - the entire item, as it appears after it was modified
  • OLD_IMAGE - the entire item, as it appeared before it was changed
  • NEW_AND_OLD_IMAGES - both the new and the old images of the item

DynamoDB streams are made of shards, just like Kinesis Data Streams.

You don’t provision shards, this is automated by AWS.

Records are not retroactively populated in a stream after enabling it.

Lambda integration

To work with Lambda we need to define an Event Source Mapping to read from DynamoDB Stream.

You need to ensure the Lambda function has the appropriate permissions.

Your Lambda function is invoked synchronously.
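
A minimal handler sketch; the field names follow the DynamoDB Streams event shape that the event source mapping delivers:

```python
def handler(event, context):
    # The event source mapping delivers batches of stream records.
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            print("new item:", record["dynamodb"].get("NewImage", {}))
        elif record["eventName"] == "MODIFY":
            print("changed item:", record["dynamodb"].get("NewImage", {}))
        elif record["eventName"] == "REMOVE":
            print("deleted item:", record["dynamodb"].get("OldImage", {}))
```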

DynamoDB Time to Live

  • Automatically delete items after an expiry timestamp
  • Doesn’t consume any WCUs
  • The TTL attribute must be a Number data type with Unix Epoch timestamp value
  • Expired items deleted within 48 hours of expiration
  • Expired items that haven't been deleted yet still appear in reads/queries/scans (filter them out if not wanted)
  • Expired items are deleted from both LSI and GSI
  • A delete operation for each expired item enters the DynamoDB stream
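
A TTL sketch: enable TTL on a hypothetical `expire_at` attribute, then write an item that expires in an hour:

```python
import time

import boto3

client = boto3.client("dynamodb")
table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

# Enable TTL on a Number attribute holding Unix epoch seconds.
client.update_time_to_live(
    TableName="GameScores",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"},
)

# This item becomes eligible for deletion one hour from now.
table.put_item(Item={
    "user_id": "u1",
    "game_id": "chess",
    "expire_at": int(time.time()) + 3600,
})
```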

DynamoDB Transactions

  • Coordinated, all-or-nothing operations (add/update/delete) on multiple items across one or more tables
  • Provide atomicity, consistency, isolation and durability
  • Read modes - eventual consistency, strong consistency, transactional
  • Write modes - standard, transactional
  • Consume 2x WCU & RCU

Two operations:

  • TransactGetItems - one or more GetItem operations
  • TransactWriteItems - one or more PutItem, UpdateItem or DeleteItem

Transactional writes consume 2x WCUs and transactional reads consume 2x RCUs. Worked examples:

3 transactional writes per second of 5 KB items => 3 × (5 KB / 1 KB) × 2 (transactional cost) = 30 WCU

5 transactional reads per second of 5 KB items => 5 × 2 (5 KB rounds up to 8 KB = 2 × 4 KB) × 2 (transactional cost) = 20 RCU
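
A TransactWriteItems sketch spanning two hypothetical tables; either both writes succeed or neither does:

```python
import boto3

client = boto3.client("dynamodb")

client.transact_write_items(TransactItems=[
    {
        "Put": {
            "TableName": "GameScores",  # hypothetical tables and keys
            "Item": {"user_id": {"S": "u1"}, "game_id": {"S": "chess"}},
        }
    },
    {
        "Update": {
            "TableName": "PlayerStats",
            "Key": {"user_id": {"S": "u1"}},
            "UpdateExpression": "SET games_played = games_played + :one",
            "ExpressionAttributeValues": {":one": {"N": "1"}},
        }
    },
])
```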

DynamoDB Write Types

  • Concurrent writes: the second write overwrites the first (last writer wins)
  • Conditional writes: accept a write/update only if a condition on the item's current value is met
  • Atomic writes: numeric updates (e.g. increase by 1 and increase by 2) are both applied
  • Batch writes: write/update several items at a time

DynamoDB Security

VPC endpoints are available to access DynamoDB without traversing the public internet.

Access is fully controlled by IAM.

Encryption at rest using AWS KMS.

Backup and restore with no performance impact, point in time recovery.

Global tables: multi-Region, multi-active, fully replicated, high performance.

DynamoDB - fine-grained access control

  • Using identity federation or Cognito Identity Pools, each user gets temporary AWS credentials
  • You can assign an IAM Role to the users with a Condition to limit their API access to DynamoDB
  • LeadingKeys - limit row-level access for users on the Primary Key
  • Attributes - limit specific attributes the user can see
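
A sketch of such a policy, shown as a Python dict for illustration (account ID, region and table name are hypothetical):

```python
# IAM policy expressed as a Python dict for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores",
        "Condition": {
            "ForAllValues:StringEquals": {
                # Row-level access: users may only touch items whose partition
                # key equals their own Cognito identity id.
                "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
            }
        },
    }],
}
```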

AWS Messaging, SQS, SNS & Kinesis

AWS SQS - Simple Queue Service

Producer:

  • Possible to have multiple producers
  • Possible to have multiple consumers
  • Unlimited number of messages
  • Default retention period is 4 days, max retention period is 14 days
  • Messages up to 256 KB
  • At least once delivery

Consumer:

  • Consumers delete messages from the queue after processing
  • A message is delivered to only one consumer at a time (while within the visibility timeout)
  • Consumers poll SQS for messages, receiving up to 10 messages at a time
  • Delete messages using the DeleteMessage API
  • Best-effort message ordering
  • We can scale by adding more consumers
  • Possible to scale with an ASG based on the CloudWatch metric ApproximateNumberOfMessagesVisible (consumer-loop sketch below)
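
A minimal consumer-loop sketch with boto3, assuming a hypothetical queue named `my-queue`:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]  # hypothetical queue

while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # poll up to 10 messages per call
        WaitTimeSeconds=20,      # long polling (covered below)
    )
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        # Delete only after successful processing; otherwise the message
        # becomes visible again once the visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```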

Security:

  • Encryption in flight HTTPS
  • At-rest encryption using KMS
  • Client-side encryption if the client wants to perform encryption/decryption
  • IAM policies
  • SQS Access policies

SQS queue access policies

  • Useful for cross-account access
  • Useful for allowing other services (e.g. S3) to publish event notifications to an SQS queue

SQS Message visibility timeout

  • By default the visibility timeout is 30 seconds
  • If a message is not processed within the visibility timeout, it becomes visible again and may be processed twice
  • If a consumer knows it needs more time, it can call the ChangeMessageVisibility API to extend the timeout

SQS Dead letter queue

  • If the consumer fails to process a message within the visibility timeout, the message goes back to the queue
  • We can set a threshold for how many times a message can go back to the queue
  • After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue
  • DLQ of FIFO queue must be a FIFO queue
  • DLQ of a standard queue must be a standard queue

Redrive to source

  • Feature to help consume messages in the DLQ to understand what is wrong with them
  • When our code is fixed, we can retrieve the messages from the DLQ back into the source queue in batches without writing custom code

SQS Delay Queue

  • Delay messages so consumers don't see them immediately; up to 15 minutes
  • Default is 0 seconds
  • Can set default at queue level
  • Can override the default on send using the DelaySeconds parameter

SQS Long Polling

  • When consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
  • This is called long polling
  • Long polling decreases the number of API calls made to SQS while increasing efficiency and reducing the latency of your application
  • The wait time can be between 1 and 20 seconds (20 seconds preferable)
  • Long polling is preferable to short polling
  • Long polling can be enabled at the queue level or per API call using WaitTimeSeconds (the queue attribute is ReceiveMessageWaitTimeSeconds; sketch below)
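
A sketch of enabling long polling at the queue level (hypothetical queue name):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]  # hypothetical queue

# Queue-level default: every ReceiveMessage call waits up to 20 seconds.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"ReceiveMessageWaitTimeSeconds": "20"},
)
```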

SQS Extended client

Message size limit is 256 KB; how do you send a large message?

Use the SQS Extended Client (a Java library).

The large message is stored in S3, and a small metadata message pointing to it is sent to the queue.

SQS - API

  • CreateQueue (MessageRetentionPeriod)
  • DeleteQueue
  • PurgeQueue => removes all message from the queue
  • SendMessage(DelaySeconds),
  • ReceiveMessage
  • DeleteMessage
  • MaxNumberOfMessage, default is 1, max 10
  • ReceiveMessageWaitTimeSeconds: Long polling
  • ChangeMessageVisibility: change the message timeout

Batch APIs exist for SendMessage, DeleteMessage and ChangeMessageVisibility; they help decrease costs.

SQS FIFO Queue

  • FIFO = first in first out (ordering)
  • Limited throughput: 300 msg/s without batching, 3,000 msg/s with batching
  • Exactly-once send capability (duplicates are removed)
  • Messages are processed in order by consumer

SQS FIFO - Deduplication:

  • De-duplication interval is 5 minutes
  • Two de-duplication methods: content-based (SHA-256 hash of the message body), or an explicit MessageDeduplicationId

SQS FIFO - Message Grouping:

  • If you specify the same MessageGroupId for all messages in an SQS FIFO queue, you can only have one consumer, and all messages are in order
  • To get ordering at the level of a subset of messages, specify different values for MessageGroupId
  • Messages that share a common message group ID are ordered within the group
  • Each group ID can have a different consumer
  • Ordering across groups is not guaranteed (send sketch below)
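
A FIFO send sketch with a group ID and an explicit deduplication ID (hypothetical queue and values):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="orders.fifo")["QueueUrl"]  # hypothetical FIFO queue

sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": 42}',
    MessageGroupId="customer-123",         # ordering is guaranteed within this group
    MessageDeduplicationId="order-42-v1",  # or enable content-based deduplication
)
```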

Amazon SNS - Simple Notification Service

  • What if you want to send one message to many receivers?
  • SNS uses the pub/sub pattern
  • The event producer only sends messages to one SNS topic
  • As many event receivers (subscriptions) as we want can listen to the SNS topic's notifications
  • Each subscriber to the topic will get all the messages (unless message filtering is used)
  • Up to 12,500,000 subscribers per topic
  • 100,000 topics limit

SNS can receive from:

  • CloudWatch alarms
  • ASG
  • CloudFormation
  • AWS budgets
  • S3
  • AWS DMS
  • Lambda
  • DynamoDB
  • RDS events

SNS has two ways to publish:

  1. Topic publish (using the SDK):
  • Create a topic
  • Create a subscription
  • Publish to the topic
  2. Direct publish (for mobile apps SDK):
  • Create a platform application
  • Create a platform endpoint
  • Publish to the platform endpoint
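
A topic-publish sketch with boto3 (hypothetical topic name and email address):

```python
import boto3

sns = boto3.client("sns")

# create_topic is idempotent: it returns the existing ARN if the topic exists.
topic_arn = sns.create_topic(Name="my-topic")["TopicArn"]  # hypothetical topic

sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")

sns.publish(
    TopicArn=topic_arn,
    Subject="deployment finished",
    Message="build 1.2.3 is live",
)
```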

Amazon SNS Security:

  • In-flight encryption HTTPS API
  • At-rest encryption using KMS
  • Client-side encryption
  • IAM policies
  • SNS access policies

Amazon SNS and SQS: Fan-out Pattern

  • Push one message to SNS; it is received by all subscribed SQS queues
  • Fully decoupled, no data loss
  • SQS allows for: data persistence, delayed processing and retries of work
  • Ability to add more SQS subscribers over time
  • Make sure your SQS queue access policy allows SNS to write
  • Cross-region delivery is supported
  • Scale from one event to multiple subscribers
  • Allows publishing to Kinesis Data Firehose
  • Possible to have FIFO topics

Message filtering:

  • JSON policy used to filter messages sent to SNS topic subscriptions
  • If a subscription doesn't have a filter policy, it receives every message
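
A filter-policy sketch: the subscription below would only receive messages whose `state` attribute is "placed" (ARNs are hypothetical):

```python
import json

import boto3

sns = boto3.client("sns")

# Only messages whose "state" message attribute equals "placed" reach this queue.
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:orders",        # hypothetical ARNs
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:placed-orders",
    Attributes={"FilterPolicy": json.dumps({"state": ["placed"]})},
)
```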

Kinesis

  • Capture, process and store data streams (Kinesis Data Streams)
  • Load data streams into AWS data stores (Kinesis Data Firehose)
  • Analyze data streams with SQL or Apache Flink (Kinesis Data Analytics)
  • Capture, process, and store video streams (Kinesis Video Streams)

Kinesis Data Streams

  • Made of shards
  • Shards have to be provisioned ahead of time (in provisioned mode)
  • Producers: applications, clients, SDK, KPL, Kinesis Agent produce records
  • A record is made of a partition key and a data blob (up to 1 MB)
  • 1 MB/sec or 1,000 msg/sec per shard for writes
  • Consumers: apps, Lambda, Kinesis Data Firehose, Kinesis Data Analytics
  • 2 MB/sec per shard shared across all consumers, or 2 MB/sec per shard per consumer in enhanced fan-out mode
  • Retention from 1 day to 365 days
  • Ability to reprocess (replay) data
  • Once data is inserted into Kinesis, it can't be deleted (immutability)
  • Data that shares the same partition key goes to the same shard (ordering)

Provisioned mode:

  • You choose the number of shards provisioned, scale manually or using API
  • Each shard gets 1 MB/s in (or 1,000 records per second)
  • Each shard gets 2MB out (classic or enhanced fan-out consumer)
  • You pay per shard provisioned per hour

On-demand mode:

  • No need to provision or manage capacity
  • Default capacity provisioned (4MB/s in or 4000 records per second)
  • Scales automatically based on observed throughput peak during last 30 days
  • Pay per stream per hour & data in/out per GB

Security:

  • IAM policies
  • Encryption in flight HTTPS
  • Encryption at rest using KMS
  • You can implement encryption/decryption of data on the client side (client-side encryption)
  • VPC endpoints
  • CloudTrail logs all API calls

Kinesis Producers

  • Put data records into data streams
  • A data record consists of: a sequence number (unique per partition key within a shard), a partition key, and a data blob (up to 1 MB)
  • Producers: AWS SDK, Kinesis Producer Library (KPL: C++, Java, batching, compression, retries), Kinesis Agent
  • Write throughput: 1 MB/sec or 1,000 records/sec per shard
  • PutRecord API writes a single record
  • Use batching with the PutRecords API to reduce costs and increase throughput (sketch below)
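
A producer sketch showing both APIs (hypothetical stream name; records sharing a partition key land on the same shard):

```python
import boto3

kinesis = boto3.client("kinesis")

# Single record: records sharing a partition key always land on the same shard.
kinesis.put_record(
    StreamName="my-stream",  # hypothetical stream
    PartitionKey="user-123",
    Data=b'{"event": "click"}',
)

# Batched writes (up to 500 records per call) reduce per-request overhead and cost.
kinesis.put_records(
    StreamName="my-stream",
    Records=[
        {"PartitionKey": "user-123", "Data": b'{"event": "click"}'},
        {"PartitionKey": "user-456", "Data": b'{"event": "view"}'},
    ],
)
```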

Kinesis Consumers

  • Get data records from data streams and process them
  • AWS Lambda, Kinesis Analytics, Kinesis Firehose, Custom consumer (SDK) - classic or Enhanced fan-out, Kinesis client library

Shared (classic) fan out consumer:

  • 2 MB/sec per shard shared across all consumers (pull model)
  • Max 5 GetRecords API calls/sec per shard
  • Latency ~200 ms
  • Minimizes costs
  • Uses the GetRecords API
  • Returns up to 10 MB or 10,000 records per call

Enhanced fan-out consumer:

  • 2 MB/sec per consumer per shard (push model)
  • Latency ~70 ms
  • Higher costs
  • Kinesis pushes data to consumers over HTTP/2 (SubscribeToShard API)
  • Soft limit of 5 consumer applications per data stream

Kinesis Client Library (KCL)

  • A Java library that helps read records from a Kinesis data stream, with distributed applications sharing the read workload
  • Each shard is to be read by only one KCL instance
  • 4 shards = max 4 KCL instances
  • 6 shards = max 6 KCL instances
  • Progress is checkpointed into DynamoDB (needs IAM permissions)
  • Workers track each other and share the work amongst shards using DynamoDB
  • KCL can run on EC2, Beanstalk or on-premises
  • Records are read in order at the shard level
  • KCL v1 supports only the shared consumer mode; v2 supports both modes

Kinesis Data Streams operations

Shard splitting:

  • Used to increase the stream capacity
  • Used to divide a “hot” shard
  • The old shard is closed and will be deleted once its data expires
  • Increases capacity and cost
  • Not possible to split a shard into more than two in a single operation

Merging shards:

  • Decreases the stream capacity and saves costs
  • Can be used to group two shards with low traffic
  • Old shards are closed and will be deleted once their data expires
  • Can't merge more than two shards in a single operation

Kinesis Data Firehose

  • Producers: everything that can produce to Kinesis Data Streams, plus Kinesis Data Streams itself, Amazon CloudWatch and AWS IoT
  • Records up to 1 MB
  • Data can be transformed (e.g. with a Lambda function)
  • Batches writes to destinations
  • AWS destinations: S3, Amazon Redshift (COPY through S3), Amazon OpenSearch
  • 3rd-party destinations: Datadog, MongoDB, New Relic, etc.
  • Custom HTTP endpoint destinations
  • Once the data is sent to a destination, it's possible to send all data to S3 as a backup, or only the failed events
  • Pay for the data going through Firehose
  • Near real time (60 seconds minimum latency for non-full batches, or a minimum of 1 MB of data at a time)

Kinesis Data Analytics

For SQL applications:

  • Sources: Kinesis Data Streams, Kinesis Data Firehose
  • Possible to join with reference data from S3
  • Possible to send output to Kinesis Data Streams or Kinesis Data Firehose
  • Automatic scaling
  • Pay for actual consumption rate

For Apache Flink:

  • Uses Apache Flink to process and analyze streaming data
  • Sources: Kinesis Data Streams and Amazon MSK