AWS Certified Developer Associate hints. RDS, Aurora, ElastiCache, DynamoDB, SQS, SNS, Kinesis (part II)

RDS, Aurora and ElastiCache

Amazon RDS

Possible to use the following engines:

  • Postgres
  • MySQL
  • MariaDB
  • Oracle
  • Microsoft SQL Server

RDS provides:

  • Automated provisioning and OS patching
  • Continuous backups (1 to 35 days retention)
  • Monitoring dashboard
  • Read replicas
  • Multi AZ for Disaster Recovery
  • Maintenance windows for upgrades
  • Scaling
  • Storage backed by EBS
  • Automatic storage increase up to a configured maximum storage threshold
  • No SSH access

RDS Read Replicas

  • Up to 5 Read replicas
  • Within AZ, Cross AZ or Cross Region
  • Replication is ASYNC, eventually consistent
  • Replicas can be promoted to their own DB
  • App must update connection string to leverage read replicas

RDS Replicas Networking Costs

  • In AWS there is a network cost when data goes from one AZ to another
  • For RDS Read Replicas within the same region, you don’t pay the fee

RDS Multi AZ (Disaster Recovery)

  • Synchronous replication to a standby instance
  • One DNS name - automatic app failover to standby
  • Increase availability
  • Failover in case of loss of AZ, loss of network, or instance/storage failure
  • No manual intervention in apps
  • Not used for scaling
  • Read replicas can be setup as Multi AZ for Disaster Recovery

RDS Enabling Multi AZ

  • Zero downtime operation (no need to stop DB)
  • Just click on “modify” for the database

Amazon Aurora

  • Proprietary technology from AWS (not open source)
  • Global Aurora enables a multi-Region mode
  • Compatible with Postgres and MySQL
  • Aurora claims a 5x performance improvement over MySQL on RDS and 3x over Postgres on RDS
  • Storage grows automatically in increments of 10 GB, up to 128 TB
  • Can have up to 15 replicas (MySQL has 5), and the replication process is faster
  • Failover in Aurora is instantaneous
  • Aurora costs ~20% more than RDS

Aurora High Availability and Read Scaling

  • 6 copies of your data across 3 AZs
  • 4 copies out of 6 needed for writes
  • 3 copies out of 6 needed for reads
  • Self-healing with peer-to-peer replication
  • Storage is striped across 100s of volumes
  • One Aurora instance takes writes (the master)
  • Auto failover for the master in less than 30 seconds
  • Master + up to 15 Aurora Read Replicas
  • Support for Cross-Region Replication

Aurora DB Cluster

Aurora provides a writer endpoint; if the master fails, the application still talks to the same writer endpoint and requests are redirected to the new master.

Auto scaling of read replicas can be enabled; Aurora provides a reader endpoint that load-balances across them.

Core features:

  • Automatic failover
  • Backup and recovery
  • Isolation and security
  • Industry compliance
  • Push button scaling
  • Automated patching with no down time
  • Advanced monitoring
  • Routine maintenance
  • Backtrack: restore data to any point in time

RDS & Aurora Security

At-rest encryption:

  • Database master & replicas encrypted using AWS KMS - must be defined at launch time
  • If the master is not encrypted, the read replicas cannot be encrypted
  • To encrypt an un-encrypted database, go through a DB snapshot & restore as encrypted

In-flight encryption: TLS-ready.

IAM authentication: use IAM roles to connect instead of a username/password.

Security Groups: control which IP ranges / security groups can access the DB.

No SSH available.

Audit logs can be enabled and sent to CloudWatch logs for longer retention.

RDS Proxy

  • Fully managed database proxy for RDS
  • Allows apps to pool and share DB connections
  • Improves database efficiency
  • Serverless, autoscaling, highly available (multi-AZ)
  • Reduces RDS & Aurora failover time by up to 66%
  • Supports MySQL, PostgreSQL, MariaDB, MS SQL Server and Aurora
  • No code changes required
  • Can enforce IAM authentication for the DB
  • Never publicly accessible (must be accessed from a VPC)

ElastiCache

  • Possible to use Redis or Memcached
  • Caches are in-memory databases
  • Helps reduce the load on your database
  • Helps make your application stateless
  • AWS takes care of OS maintenance, patching, optimization, monitoring and failure recovery
  • Application code must be changed to read/write through the cache

Redis vs Memcached

Redis:

  • Primary endpoint and readers endpoint
  • Multi AZ with auto-failover
  • Read replicas to scale reads and high availability (up to 5)
  • Data durability using AOF persistence
  • Backup and restore
  • Supports sets and sorted sets

Memcached:

  • Multi-node for partitioning of data (sharding)
  • No high availability (no replication)
  • Non-persistent
  • No backup and restore
  • Multi-threaded architecture

ElastiCache Strategies

Lazy loading / Cache-aside / Lazy population (sketched below):

  • First ask the cache
  • If not found, read from the DB
  • Write the data to the cache so the next read is a hit
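
A minimal cache-aside sketch in Python, assuming a local Redis endpoint and a hypothetical `query_db` helper standing in for the real database call:

```python
import json

import redis  # pip install redis

cache = redis.Redis(host="localhost", port=6379)

def query_db(user_id):
    # Hypothetical stand-in for a real database query.
    return {"id": user_id, "name": "example"}

def get_user(user_id, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = cache.get(key)                           # 1. ask the cache first
    if cached is not None:
        return json.loads(cached)                     # cache hit
    user = query_db(user_id)                          # 2. miss: read from the DB
    cache.set(key, json.dumps(user), ex=ttl_seconds)  # 3. populate the cache
    return user
```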

Write Through:

  • Reads still hit the cache first
  • On every write, write to the DB and also write to the cache
  • Cache churn - some data written to the cache will never be read

Cache Eviction and Time-to-Live

Cache eviction can occur in three ways:

  • You delete the item explicitly from the cache
  • The cache memory is full and the least recently used items are evicted (LRU)
  • The item's time-to-live (TTL) expires

Amazon MemoryDB for Redis

  • Redis-compatible, durable, in-memory data service
  • Ultra-fast performance with over 160 million requests per second
  • Durable in-memory data storage with a Multi-AZ transactional log
  • Scales seamlessly from 10 GB to 100s of TBs of storage
  • Use cases: web and mobile apps, online gaming, media

AWS DynamoDB

  • Fully managed, highly available with replication across multiple AZs
  • NoSQL database - not a relational database
  • Scales to massive workloads, distributed database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent performance
  • Integrated with IAM
  • Enables event-driven programming with DynamoDB Streams
  • Low cost and auto-scaling capabilities
  • Standard & Infrequent Access Table classes
  • Maximum size of an item is 400 KB

DynamoDB - Primary Keys

Option 1: Partition Key (HASH):

  • Partition key must be unique for each item
  • Partition key must be diverse so that the data is distributed
  • Example: “user_id” for a users table

Option 2: Partition Key + Sort Key (HASH + RANGE):

  • The combination must be unique for each item
  • Data is grouped by partition key
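
A sketch of option 2 with boto3, using a hypothetical `GameScores` table keyed by `user_id` (partition key) and `game_id` (sort key):

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="GameScores",  # hypothetical table name
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "game_id", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand mode; or supply ProvisionedThroughput
)
```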

DynamoDB - Read/Write Capacity Modes

Control how you manage your table’s capacity (read/write throughput).

Provisioned Mode (default):

  • You specify the number of reads/writes per second
  • You need to plan capacity beforehand
  • Pay for provisioned read & write capacity units

On-Demand Mode:

  • Read/writes automatically scale up/down with your workloads
  • No capacity planning needed
  • Pay for what you use; more expensive

R/W Capacity Modes Provisioned:

  • Table must have provisioned read and write capacity units
  • Read capacity units (RCU) = throughput for read
  • Write capacity units (WCU) = throughput for write
  • Option to setup auto-scaling of throughput to meet demand
  • Throughput can be exceeded temporarily using burst capacity
  • If burst capacity has been consumed, you'll get a ProvisionedThroughputExceededException

Write Capacity Units (WCU):

  • One write capacity unit represents one write per second for an item up to 1 KB in size
  • If items are larger than 1 KB, more WCUs are consumed (size is rounded up to the next 1 KB)

Read Capacity Units (RCU):

Strongly consistent reads vs eventually consistent reads:

  • One read capacity unit (RCU) represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size
  • If items are larger than 4 KB, more RCUs are consumed (size is rounded up to the next 4 KB; worked sketch below)
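
The rounding rules are easy to get wrong; a small sketch of the unit arithmetic (plain Python, not an AWS API):

```python
import math

def wcu(writes_per_sec, item_kb):
    # One WCU = one write/sec of an item up to 1 KB; size rounds up to the next 1 KB.
    return writes_per_sec * math.ceil(item_kb / 1)

def rcu(reads_per_sec, item_kb, strongly_consistent=True):
    # One RCU = one strongly consistent read/sec (or two eventually consistent
    # reads/sec) of an item up to 4 KB; size rounds up to the next 4 KB.
    units = reads_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

print(wcu(10, 2))   # 10 writes/sec of 2 KB items -> 20 WCU
print(rcu(10, 4))   # 10 strong reads/sec of 4 KB items -> 10 RCU
print(rcu(16, 12, strongly_consistent=False))  # 16 eventual reads/sec of 12 KB -> 24 RCU
```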

DynamoDB - Partitions (Internal)

  • Data is stored in partitions
  • Partition keys go through a hashing algorithm that determines which partition each item goes to
  • WCU and RCU are spread evenly across partitions
  • If we exceed provisioned RCUs or WCUs, we get “ProvisionedThroughputExceededException”, because of “hot” partition keys or very large items

Solving throttling issues:

  • Exponential backoff (sketched below)
  • Distribute partition keys as much as possible
  • For RCU issues, we can use DynamoDB Accelerator (DAX)
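
A sketch of exponential backoff with jitter around a throttled call; `do_read` is a hypothetical function wrapping any DynamoDB read. Note that the AWS SDKs already apply this retry behavior by default:

```python
import random
import time

from botocore.exceptions import ClientError

def with_backoff(do_read, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return do_read()
        except ClientError as e:
            if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            # Wait 2^attempt * 100 ms plus random jitter, then retry.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
    raise RuntimeError("still throttled after retries")
```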

On-Demand Mode

  • Read/writes automatically scale up/down with your workload
  • No capacity planning needed
  • Unlimited WCU and RCU, no throttle, more expensive
  • You are charged for the reads/writes you use, in terms of RRUs and WRUs
  • Read request units (RRU) - throughput for reads
  • Write request units (WRU) - throughput for writes
  • Roughly 2.5x more expensive than provisioned capacity

DynamoDB basic operations

Modification.

PutItem:

  • Creates a new item or fully replaces an old one (same primary key)
  • Consumes WCUs

UpdateItem:

  • Edits an existing item's attributes, or adds a new item if it doesn't exist
  • Partial update

ConditionalWrite:

  • Accepts a write/update/delete only if conditions are met

Retrieval.

GetItem:

  • Read based on the primary key
  • Primary key can be HASH or HASH+RANGE
  • Eventually consistent reads by default
  • Strongly consistent reads consume more RCUs
  • ProjectionExpression can be specified to retrieve only certain attributes

Query:

  • KeyConditionExpression (= on the partition key; =, <, >, between, begins_with on the sort key)
  • FilterExpression (additional filtering on non-key attributes, applied after the query)
  • Returns the number of items specified in Limit, or up to 1 MB of data (example below)
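
A Query sketch with boto3, reusing the hypothetical `GameScores` table from above:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

resp = table.query(
    KeyConditionExpression=Key("user_id").eq("u1") & Key("game_id").begins_with("chess"),
    FilterExpression=Attr("score").gt(100),  # applied after the query; RCUs still consumed
    ProjectionExpression="game_id, score",
    Limit=25,
)
for item in resp["Items"]:
    print(item)
```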

Scan:

  • Reads the entire table
  • Returns up to 1 MB of data per call - use pagination to keep reading (sketch below)
  • Consumes a lot of RCUs
  • Limit the impact using the Limit parameter, or reduce the page size and pause between calls
  • For faster performance, use a parallel scan
  • Possible to use ProjectionExpression & FilterExpression
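
A paginated Scan sketch driven by `LastEvaluatedKey`, again on the hypothetical `GameScores` table:

```python
import boto3

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

scan_kwargs = {"Limit": 100}  # caps the RCU impact of each page
while True:
    page = table.scan(**scan_kwargs)
    for item in page["Items"]:
        print(item)  # hypothetical per-item processing
    if "LastEvaluatedKey" not in page:
        break  # no more pages
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```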

DeleteItem:

  • Delete individual item
  • Ability to perform a conditional delete

DeleteTable:

  • Deletes the whole table and all its items
  • Much quicker than calling DeleteItem on every item

DynamoDB - Batch operation

  • Allows you to reduce latency by reducing the number of API calls
  • Operations are done in parallel for better efficiency
  • Part of the batch can fail

BatchWriteItem:

  • Up to 25 PutItem and/or DeleteItem operations in one call
  • Up to 16 MB of data written, up to 400 KB of data per item
  • Can't update items (use UpdateItem for that)
  • UnprocessedItems is returned for failed write operations (retry sketch below)
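
A BatchWriteItem sketch that retries UnprocessedItems (hypothetical table, items and keys; real code should also back off between retries):

```python
import boto3

client = boto3.client("dynamodb")

request = {
    "GameScores": [  # hypothetical table, items and keys
        {"PutRequest": {"Item": {"user_id": {"S": "u1"}, "game_id": {"S": "chess"}}}},
        {"DeleteRequest": {"Key": {"user_id": {"S": "u2"}, "game_id": {"S": "go"}}}},
    ]
}
while request:
    resp = client.batch_write_item(RequestItems=request)
    # Anything DynamoDB couldn't process comes back and should be retried.
    request = resp.get("UnprocessedItems", {})
```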

BatchGetItem:

  • Returns items from one or more tables
  • Up to 100 items, up to 16 MB of data
  • Items are retrieved in parallel to minimize latency
  • UnprocessedKeys is returned for failed read operations

DynamoDB Conditional Writes:

  • Possible for PutItem, UpdateItem, DeleteItem and BatchWriteItem
  • Conditions: attribute_exists, attribute_not_exists, attribute_type, contains, size (example below)
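
A conditional-write sketch: insert only if the item doesn't already exist (hypothetical table and keys):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

try:
    table.put_item(
        Item={"user_id": "u1", "game_id": "chess", "score": 0},
        ConditionExpression="attribute_not_exists(user_id)",  # insert only if new
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("item already exists")
    else:
        raise
```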

DynamoDB - Indexes

Local Secondary Index (LSI):

  • Alternative sort key for your table (same Partition Key as that of base table)
  • The sort key consists of one scalar attribute (string, number or binary)
  • Up to 5 local secondary indexes per table
  • Must be defined at table creation time
  • Attribute projections can contain some or all the attributes of the base table

Global Secondary Index (GSI):

  • Alternative primary key (hash or hash + range) from the base table
  • Speed up queries on non-key attributes
  • The index key consists of scalar attributes (string, number, binary)
  • Attribute projections can contain some or all of the attributes of the base table
  • Must provision RCUs & WCUs for the index
  • Can be added/modified after table creation

Index and Throttling

Global secondary index:

  • If the writes are throttled on the GSI, then the main table will be throttled
  • Even if the WCU on the main tables are fine
  • Choose your GSI partition key carefully
  • Assign your WCU capacity carefully

Local secondary index:

  • uses the WCUs and RCUs of the main table
  • No special throttling considerations

DynamoDB - PartiQL

Supports some but not all SQL statements:

INSERT, UPDATE, SELECT, DELETE.

Also supports batch operations.
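
A PartiQL sketch with boto3's `execute_statement`, against the hypothetical `GameScores` table:

```python
import boto3

client = boto3.client("dynamodb")

resp = client.execute_statement(
    Statement="SELECT game_id, score FROM GameScores WHERE user_id = ?",
    Parameters=[{"S": "u1"}],
)
print(resp["Items"])
```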

DynamoDB - Optimistic Locking

  • DynamoDB has a feature called Conditional Writes
  • A strategy to ensure an item hasn’t changed before you update/delete it
  • Each item has an attribute that acts as a version number
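
An optimistic-locking sketch built on a conditional write; the `version` attribute name is an assumption, any numeric attribute works:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

def update_score(user_id, game_id, new_score, expected_version):
    try:
        table.update_item(
            Key={"user_id": user_id, "game_id": game_id},
            UpdateExpression="SET score = :s, #v = :new_version",
            ConditionExpression="#v = :expected",  # fail if another writer bumped the version
            ExpressionAttributeNames={"#v": "version"},
            ExpressionAttributeValues={
                ":s": new_score,
                ":new_version": expected_version + 1,
                ":expected": expected_version,
            },
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            raise RuntimeError("lost the race: re-read the item and retry")
        raise
```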

DynamoDB Accelerator DAX

  • Fully managed, highly available, seamless in-memory cache for DynamoDB
  • Microsecond latency for cached reads & queries
  • Doesn't require application logic modification
  • Solves the “hot key” problem (too many reads on a single key)
  • Default TTL is 5 minutes
  • Up to 10 nodes in the cluster
  • Multi-AZ (3 nodes minimum recommended for production)
  • Secure (encryption with KMS, IAM roles)
  • For aggregation results, ElastiCache can be used alongside DAX

DynamoDB Streams

  • Ordered stream of item-level modifications (create/update/delete) in a table
  • Stream records can be sent to Kinesis Data Streams, read by AWS Lambda, or read by Kinesis Client Library applications
  • Data retention of up to 24 hours

Possible to choose the information that will be written to the stream:

  • KEYS_ONLY - only the key attributes of the modified item
  • NEW_IMAGE - the entire item, as it appears after it was modified
  • OLD_IMAGE - the entire item, as it appeared before it was changed
  • NEW_AND_OLD_IMAGES - both the new and the old images of the item

DynamoDB streams are made of shards, just like Kinesis Data Streams.

You don’t provision shards, this is automated by AWS.

Records are not retroactively populated in a stream after enabling it.

Lambda integration

To work with Lambda we need to define an Event Source Mapping to read from DynamoDB Stream.

You need to ensure the Lambda function has the appropriate permissions.

Your Lambda function is invoked synchronously.
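
A minimal handler sketch; the field names follow the DynamoDB Streams event shape that the event source mapping delivers:

```python
def handler(event, context):
    # The event source mapping delivers batches of stream records.
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            print("new item:", record["dynamodb"].get("NewImage", {}))
        elif record["eventName"] == "MODIFY":
            print("changed item:", record["dynamodb"].get("NewImage", {}))
        elif record["eventName"] == "REMOVE":
            print("deleted item:", record["dynamodb"].get("OldImage", {}))
```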

DynamoDB Time to Live

  • Automatically delete items after an expiry timestamp
  • Doesn’t consume any WCUs
  • The TTL attribute must be a Number data type with Unix Epoch timestamp value
  • Expired items deleted within 48 hours of expiration
  • Expired items that haven't been deleted yet still appear in reads/queries/scans (filter them out if not wanted)
  • Expired items are deleted from both LSI and GSI
  • A delete operation for each expired item enters the DynamoDB stream
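
A TTL sketch: enable TTL on a hypothetical `expire_at` attribute, then write an item that expires in an hour:

```python
import time

import boto3

client = boto3.client("dynamodb")
table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table

# Enable TTL on a Number attribute holding Unix epoch seconds.
client.update_time_to_live(
    TableName="GameScores",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"},
)

# This item becomes eligible for deletion one hour from now.
table.put_item(Item={
    "user_id": "u1",
    "game_id": "chess",
    "expire_at": int(time.time()) + 3600,
})
```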

DynamoDB Transactions

  • Coordinated, all-or-nothing operations (add/update/delete) on multiple items across one or more tables
  • Provide atomicity, consistency, isolation and durability
  • Read modes - eventual consistency, strong consistency, transactional
  • Write modes - standard, transactional
  • Consume 2x WCU & RCU

Two operations:

  • TransactGetItems - one or more GetItem operations
  • TransactWriteItems - one or more PutItem, UpdateItem or DeleteItem

Transactional writes consume 2x WCUs and transactional reads consume 2x RCUs. Worked examples:

3 transactional writes per second of 5 KB items => 3 × (5 KB / 1 KB) × 2 (transactional cost) = 30 WCU

5 transactional reads per second of 5 KB items => 5 × 2 (5 KB rounds up to 8 KB = 2 × 4 KB) × 2 (transactional cost) = 20 RCU
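
A TransactWriteItems sketch spanning two hypothetical tables; either both writes succeed or neither does:

```python
import boto3

client = boto3.client("dynamodb")

client.transact_write_items(TransactItems=[
    {
        "Put": {
            "TableName": "GameScores",  # hypothetical tables and keys
            "Item": {"user_id": {"S": "u1"}, "game_id": {"S": "chess"}},
        }
    },
    {
        "Update": {
            "TableName": "PlayerStats",
            "Key": {"user_id": {"S": "u1"}},
            "UpdateExpression": "SET games_played = games_played + :one",
            "ExpressionAttributeValues": {":one": {"N": "1"}},
        }
    },
])
```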

DynamoDB Write Types

  • Concurrent writes: the second write overwrites the first (last writer wins)
  • Conditional writes: accept a write/update only if a condition on the item's current value is met
  • Atomic writes: numeric updates (e.g. increase by 1 and increase by 2) are both applied
  • Batch writes: write/update several items at a time

DynamoDB Security

VPC endpoints are available to access DynamoDB without traversing the public internet.

Access is fully controlled by IAM.

Encryption at rest using AWS KMS.

Backup and restore with no performance impact, point in time recovery.

Global tables: multi-Region, multi-active, fully replicated, high performance.

DynamoDB - fine-grained access control

  • Using identity federation or Cognito Identity Pools, each user gets temporary AWS credentials
  • You can assign an IAM Role to the users with a Condition to limit their API access to DynamoDB
  • LeadingKeys - limit row-level access for users on the Primary Key
  • Attributes - limit specific attributes the user can see
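
A sketch of such a policy, shown as a Python dict for illustration (account ID, region and table name are hypothetical):

```python
# IAM policy expressed as a Python dict for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores",
        "Condition": {
            "ForAllValues:StringEquals": {
                # Row-level access: users may only touch items whose partition
                # key equals their own Cognito identity id.
                "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
            }
        },
    }],
}
```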

AWS Messaging, SQS, SNS & Kinesis

AWS SQS - Simple Queue Service

Producer:

  • Possible to have multiple producers
  • Possible to have multiple consumers
  • Unlimited number of messages
  • Default retention period is 4 days, max retention period is 14 days
  • Messages up to 256 KB
  • At least once delivery

Consumer:

  • Consumers delete messages from the queue after processing
  • A message is delivered to only one consumer at a time (while within the visibility timeout)
  • Consumers poll SQS for messages, receiving up to 10 messages at a time
  • Delete messages using the DeleteMessage API
  • Best-effort message ordering
  • We can scale by adding more consumers
  • Possible to scale with an ASG based on the CloudWatch metric ApproximateNumberOfMessagesVisible (consumer-loop sketch below)
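
A minimal consumer-loop sketch with boto3, assuming a hypothetical queue named `my-queue`:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]  # hypothetical queue

while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # poll up to 10 messages per call
        WaitTimeSeconds=20,      # long polling (covered below)
    )
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        # Delete only after successful processing; otherwise the message
        # becomes visible again once the visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```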

Security:

  • Encryption in flight HTTPS
  • At-rest encryption using KMS
  • Client-side encryption if the client wants to perform encryption/decryption
  • IAM policies
  • SQS Access policies

SQS queue access policies

  • Useful for cross-account access
  • Useful for allowing other services (e.g. S3) to publish event notifications to an SQS queue

SQS Message visibility timeout

  • By default the visibility timeout is 30 seconds
  • If a message is not processed within the visibility timeout, it becomes visible again and may be processed twice
  • If a consumer knows it needs more time, it can call the ChangeMessageVisibility API to extend the timeout

SQS Dead letter queue

  • If the consumer fails to process a message within the visibility timeout, the message goes back to the queue
  • We can set a threshold for how many times a message can go back to the queue
  • After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue
  • DLQ of FIFO queue must be a FIFO queue
  • DLQ of a standard queue must be a standard queue

Redrive to source

  • Feature to help consume messages in the DLQ to understand what is wrong with them
  • When our code is fixed, we can retrieve the messages from the DLQ back into the source queue in batches without writing custom code

SQS Delay Queue

  • Delay messages so consumers don't see them immediately; up to 15 minutes
  • Default is 0 seconds
  • Can set default at queue level
  • Can override the default on send using the DelaySeconds parameter

SQS Long Polling

  • When consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
  • This is called long polling
  • Long polling decreases the number of API calls made to SQS while increasing efficiency and reducing the latency of your application
  • The wait time can be between 1 and 20 seconds (20 seconds preferable)
  • Long polling is preferable to short polling
  • Long polling can be enabled at the queue level or per API call using WaitTimeSeconds (the queue attribute is ReceiveMessageWaitTimeSeconds; sketch below)
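
A sketch of enabling long polling at the queue level (hypothetical queue name):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]  # hypothetical queue

# Queue-level default: every ReceiveMessage call waits up to 20 seconds.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"ReceiveMessageWaitTimeSeconds": "20"},
)
```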

SQS Extended client

Message size limit is 256 KB; how do you send a large message?

Use the SQS Extended Client (a Java library).

The large message is stored in S3, and a small metadata message pointing to it is sent to the queue.

SQS - API

  • CreateQueue (MessageRetentionPeriod)
  • DeleteQueue
  • PurgeQueue => removes all message from the queue
  • SendMessage(DelaySeconds),
  • ReceiveMessage
  • DeleteMessage
  • MaxNumberOfMessage, default is 1, max 10
  • ReceiveMessageWaitTimeSeconds: Long polling
  • ChangeMessageVisibility: change the message timeout

Batch APIs exist for SendMessage, DeleteMessage and ChangeMessageVisibility; they help decrease costs.

SQS FIFO Queue

  • FIFO = first in first out (ordering)
  • Limited throughput: 300 msg/s without batching, 3,000 msg/s with batching
  • Exactly-once send capability (duplicates are removed)
  • Messages are processed in order by consumer

SQS FIFO - Deduplication:

  • De-duplication interval is 5 minutes
  • Two de-duplication methods: content-based (SHA-256 hash of the message body), or an explicit MessageDeduplicationId

SQS FIFO - Message Grouping:

  • If you specify the same MessageGroupId for all messages in an SQS FIFO queue, you can only have one consumer, and all messages are in order
  • To get ordering at the level of a subset of messages, specify different values for MessageGroupId
  • Messages that share a common message group ID are ordered within the group
  • Each group ID can have a different consumer
  • Ordering across groups is not guaranteed (send sketch below)
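
A FIFO send sketch with a group ID and an explicit deduplication ID (hypothetical queue and values):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="orders.fifo")["QueueUrl"]  # hypothetical FIFO queue

sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": 42}',
    MessageGroupId="customer-123",         # ordering is guaranteed within this group
    MessageDeduplicationId="order-42-v1",  # or enable content-based deduplication
)
```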

Amazon SNS - Simple Notification Service

  • What if you want to send one message to many receivers?
  • SNS uses the pub/sub pattern
  • The event producer only sends messages to one SNS topic
  • As many event receivers (subscriptions) as we want can listen to the SNS topic's notifications
  • Each subscriber to the topic will get all the messages (unless message filtering is used)
  • Up to 12,500,000 subscribers per topic
  • 100,000 topics limit

SNS can receive from:

  • CloudWatch alarms
  • ASG
  • CloudFormation
  • AWS budgets
  • S3
  • AWS DMS
  • Lambda
  • DynamoDB
  • RDS events

SNS has two ways to publish:

  1. Topic publish (using the SDK):
  • Create a topic
  • Create a subscription
  • Publish to the topic
  2. Direct publish (for mobile apps SDK):
  • Create a platform application
  • Create a platform endpoint
  • Publish to the platform endpoint
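
A topic-publish sketch with boto3 (hypothetical topic name and email address):

```python
import boto3

sns = boto3.client("sns")

# create_topic is idempotent: it returns the existing ARN if the topic exists.
topic_arn = sns.create_topic(Name="my-topic")["TopicArn"]  # hypothetical topic

sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")

sns.publish(
    TopicArn=topic_arn,
    Subject="deployment finished",
    Message="build 1.2.3 is live",
)
```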

Amazon SNS Security:

  • In-flight encryption HTTPS API
  • At-rest encryption using KMS
  • Client-side encryption
  • IAM policies
  • SNS access policies

Amazon SNS and SQS: Fan-out Pattern

  • Push one message to SNS; it is received by all subscribed SQS queues
  • Fully decoupled, no data loss
  • SQS allows for: data persistence, delayed processing and retries of work
  • Ability to add more SQS subscribers over time
  • Make sure your SQS queue access policy allows SNS to write
  • Cross-region delivery is supported
  • Scale from one event to multiple subscribers
  • Allows publishing to Kinesis Data Firehose
  • Possible to have FIFO topics

Message filtering:

  • JSON policy used to filter messages sent to SNS topic subscriptions
  • If a subscription doesn't have a filter policy, it receives every message
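
A filter-policy sketch: the subscription below would only receive messages whose `state` attribute is "placed" (ARNs are hypothetical):

```python
import json

import boto3

sns = boto3.client("sns")

# Only messages whose "state" message attribute equals "placed" reach this queue.
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:orders",        # hypothetical ARNs
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:placed-orders",
    Attributes={"FilterPolicy": json.dumps({"state": ["placed"]})},
)
```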

Kinesis

  • Capture, process and store data streams (Kinesis Data Streams)
  • Load data streams into AWS data stores (Kinesis Data Firehose)
  • Analyze data streams with SQL or Apache Flink (Kinesis Data Analytics)
  • Capture, process, and store video streams (Kinesis Video Streams)

Kinesis Data Streams

  • Made of shards
  • Shards have to be provisioned ahead of time (in provisioned mode)
  • Producers: applications, clients, SDK, KPL, Kinesis Agent produce records
  • A record is made of a partition key and a data blob (up to 1 MB)
  • 1 MB/sec or 1,000 msg/sec per shard for writes
  • Consumers: apps, Lambda, Kinesis Data Firehose, Kinesis Data Analytics
  • 2 MB/sec per shard shared across all consumers, or 2 MB/sec per shard per consumer in enhanced fan-out mode
  • Retention from 1 day to 365 days
  • Ability to reprocess (replay) data
  • Once data is inserted into Kinesis, it can't be deleted (immutability)
  • Data that shares the same partition key goes to the same shard (ordering)

Provisioned mode:

  • You choose the number of shards provisioned, scale manually or using API
  • Each shard gets 1 MB/s in (or 1,000 records per second)
  • Each shard gets 2MB out (classic or enhanced fan-out consumer)
  • You pay per shard provisioned per hour

On-demand mode:

  • No need to provision or manage capacity
  • Default capacity provisioned (4MB/s in or 4000 records per second)
  • Scales automatically based on observed throughput peak during last 30 days
  • Pay per stream per hour & data in/out per GB

Security:

  • IAM policies
  • Encryption in flight HTTPS
  • Encryption at rest using KMS
  • You can implement encryption/decryption of data on the client side (client-side encryption)
  • VPC endpoints
  • CloudTrail logs all API calls

Kinesis Producers

  • Put data records into data streams
  • A data record consists of: a sequence number (unique per partition key within a shard), a partition key, and a data blob (up to 1 MB)
  • Producers: AWS SDK, Kinesis Producer Library (KPL: C++, Java, batching, compression, retries), Kinesis Agent
  • Write throughput: 1 MB/sec or 1,000 records/sec per shard
  • PutRecord API writes a single record
  • Use batching with the PutRecords API to reduce costs and increase throughput (sketch below)
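
A producer sketch showing both APIs (hypothetical stream name; records sharing a partition key land on the same shard):

```python
import boto3

kinesis = boto3.client("kinesis")

# Single record: records sharing a partition key always land on the same shard.
kinesis.put_record(
    StreamName="my-stream",  # hypothetical stream
    PartitionKey="user-123",
    Data=b'{"event": "click"}',
)

# Batched writes (up to 500 records per call) reduce per-request overhead and cost.
kinesis.put_records(
    StreamName="my-stream",
    Records=[
        {"PartitionKey": "user-123", "Data": b'{"event": "click"}'},
        {"PartitionKey": "user-456", "Data": b'{"event": "view"}'},
    ],
)
```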

Kinesis Consumers

  • Get data records from data streams and process them
  • AWS Lambda, Kinesis Analytics, Kinesis Firehose, Custom consumer (SDK) - classic or Enhanced fan-out, Kinesis client library

Shared (classic) fan out consumer:

  • 2 MB/sec per shard shared across all consumers (pull model)
  • Max 5 GetRecords API calls/sec per shard
  • Latency ~200 ms
  • Minimizes costs
  • Uses the GetRecords API
  • Returns up to 10 MB or 10,000 records per call

Enhanced fan-out consumer:

  • 2 MB/sec per consumer per shard (push model)
  • Latency ~70 ms
  • Higher costs
  • Kinesis pushes data to consumers over HTTP/2 (SubscribeToShard API)
  • Soft limit of 5 consumer applications per data stream

Kinesis Client Library (KCL)

  • A Java library that helps read records from a Kinesis data stream, with distributed applications sharing the read workload
  • Each shard is to be read by only one KCL instance
  • 4 shards = max 4 KCL instances
  • 6 shards = max 6 KCL instances
  • Progress is checkpointed into DynamoDB (needs IAM permissions)
  • Workers track each other and share the work amongst shards using DynamoDB
  • KCL can run on EC2, Beanstalk or on-premises
  • Records are read in order at the shard level
  • KCL v1 supports only the shared consumer mode; v2 supports both modes

Kinesis Data Streams operations

Shard splitting:

  • Used to increase the stream capacity
  • Used to divide a “hot” shard
  • The old shard is closed and will be deleted once its data expires
  • Increases capacity and cost
  • Not possible to split a shard into more than two in a single operation

Merging shards:

  • Decreases the stream capacity and saves costs
  • Can be used to group two shards with low traffic
  • Old shards are closed and will be deleted once their data expires
  • Can't merge more than two shards in a single operation

Kinesis Data Firehose

  • Producers: everything that can produce to Kinesis Data Streams, plus Kinesis Data Streams itself, Amazon CloudWatch and AWS IoT
  • Records up to 1 MB
  • Data can be transformed (e.g. with a Lambda function)
  • Batches writes to destinations
  • AWS destinations: S3, Amazon Redshift (COPY through S3), Amazon OpenSearch
  • 3rd-party destinations: Datadog, MongoDB, New Relic, etc.
  • Custom HTTP endpoint destinations
  • Once the data is sent to a destination, it's possible to send all data to S3 as a backup, or only the failed events
  • Pay for the data going through Firehose
  • Near real time (60 seconds minimum latency for non-full batches, or a minimum of 1 MB of data at a time)

Kinesis Data Analytics

For SQL applications:

  • Sources: Kinesis Data Streams, Kinesis Data Firehose
  • Possible to join with reference data from S3
  • Possible to send output to Kinesis Data Streams or Kinesis Data Firehose
  • Automatic scaling
  • Pay for actual consumption rate

For Apache Flink:

  • Uses Apache Flink to process and analyze streaming data
  • Sources: Kinesis Data Streams and Amazon MSK