🚨 August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink.
Snapshot Manager - Amazon Kinesis Data Analytics for Apache Flink does the following:
- takes a new snapshot of a running Kinesis Data Analytics for Apache Flink Application
- gets a count of application snapshots
- checks if the count is more than the required number of snapshots
- deletes the oldest snapshots so that only the required number remain
It is deployed as an AWS Lambda function and scheduled with an Amazon EventBridge rule, e.g. once a day or once a week.
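The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in this repository: the function name `run_snapshot_cycle`, the timestamp-based snapshot name, and the `max_checks` limit are assumptions made for the example. The `client` argument is assumed to behave like `boto3.client("kinesisanalyticsv2")`.

```python
import time
from datetime import datetime, timezone


def run_snapshot_cycle(client, app_name, retain, wait_seconds=15, max_checks=20):
    """Take a snapshot, wait for it to finish, then prune the oldest ones.

    `client` is assumed to look like boto3.client("kinesisanalyticsv2").
    Returns the names of the snapshots that were deleted.
    """
    # Step 1: take a new snapshot (the naming scheme is hypothetical).
    snapshot_name = "snapshot-" + datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    client.create_application_snapshot(
        ApplicationName=app_name, SnapshotName=snapshot_name
    )

    # Poll until the snapshot leaves the CREATING state or we give up.
    status = "CREATING"
    for _ in range(max_checks):
        status = client.describe_application_snapshot(
            ApplicationName=app_name, SnapshotName=snapshot_name
        )["SnapshotDetails"]["SnapshotStatus"]
        if status != "CREATING":
            break
        time.sleep(wait_seconds)
    if status != "READY":
        raise RuntimeError(f"Snapshot {snapshot_name} ended in state {status}")

    # Step 2: collect all snapshots, following pagination tokens.
    snapshots = []
    kwargs = {"ApplicationName": app_name}
    while True:
        page = client.list_application_snapshots(**kwargs)
        snapshots.extend(page.get("SnapshotSummaries", []))
        if "NextToken" not in page:
            break
        kwargs["NextToken"] = page["NextToken"]

    # Step 3: nothing to do if we are within the retention limit.
    if len(snapshots) <= retain:
        return []

    # Step 4: delete the oldest snapshots beyond the retention limit.
    snapshots.sort(key=lambda s: s["SnapshotCreationTimestamp"])
    deleted = []
    for snap in snapshots[: len(snapshots) - retain]:
        client.delete_application_snapshot(
            ApplicationName=app_name,
            SnapshotName=snap["SnapshotName"],
            SnapshotCreationTimestamp=snap["SnapshotCreationTimestamp"],
        )
        deleted.append(snap["SnapshotName"])
    return deleted
```

Inside a Lambda handler this might be called as `run_snapshot_cycle(boto3.client("kinesisanalyticsv2"), app_name, retain=30)`. Passing the client in as an argument keeps the retention logic testable without AWS credentials.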
Contents:
- Architecture
- Process flow diagram
- Prerequisites
- AWS service requirements
- Deployment instructions using AWS console
The figure below shows the architecture of Snapshot Manager.
The figure below shows the process flow of Snapshot Manager.
- Python 3.7
- An IDE, e.g. PyCharm
- Access to an AWS account
- A running Kinesis Data Analytics for Apache Flink Application
The following AWS services are required to deploy this starter kit:
- 1 AWS Lambda Function
- 1 Amazon SNS Topic
- 1 Amazon DynamoDB Table
- 1 IAM role with 4 policies
- 1 Amazon EventBridge rule (formerly CloudWatch Events)
- Create an SNS topic and subscribe the required e-mail address(es)
- Create a DynamoDB table
  - Table name = snapshot_manager_status
  - Primary partition key: name = app_name, type = String
  - Primary sort key: name = snapshot_manager_run_id, type = Number
  - Provisioned read capacity units = 5
  - Provisioned write capacity units = 5
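For readers who prefer scripting over the console, the table settings above could be expressed with boto3 along these lines. This is only a sketch: the helper names `build_table_definition` and `create_status_table` are made up for illustration, and `dynamodb_client` is assumed to behave like `boto3.client("dynamodb")`.

```python
def build_table_definition():
    """Return create_table arguments matching the console settings above."""
    return {
        "TableName": "snapshot_manager_status",
        # Only key attributes need to be declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "app_name", "AttributeType": "S"},
            {"AttributeName": "snapshot_manager_run_id", "AttributeType": "N"},
        ],
        # HASH = partition key, RANGE = sort key.
        "KeySchema": [
            {"AttributeName": "app_name", "KeyType": "HASH"},
            {"AttributeName": "snapshot_manager_run_id", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }


def create_status_table(dynamodb_client):
    """Create the status table; the client is assumed to be a boto3 DynamoDB client."""
    return dynamodb_client.create_table(**build_table_definition())
```

A call such as `create_status_table(boto3.client("dynamodb", region_name="us-east-1"))` would then create the table in the chosen region.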
- Create the following IAM policies
  - IAM policy with name iam_policy_dynamodb using this sample
  - IAM policy with name iam_policy_sns using this sample
  - IAM policy with name iam_policy_kinesisanalytics using this sample
  - IAM policy with name iam_policy_cloudwatch_logs using this sample
- Create an IAM role for Lambda with name snapshot_manager_iam_role and attach the above policies
- Deploy snapshot_manager function
  - Function name = snapshot_manager
  - Runtime = Python 3.7
  - IAM role = Select snapshot_manager_iam_role created above
  - Function code = Copy the contents from amazon_kinesis_data_analytics_for_apache_flink_snapshot_manager.py
  - Under General configuration:
    - Timeout = e.g. 5 minutes
    - Memory = e.g. 128 MB
  - Environment variables = as defined in the following table

    | Key | Value | Description |
    | --- | --- | --- |
    | aws_region | us-east-1 | AWS region |
    | app_name | Application Name | Application name of Kinesis Data Analytics for Apache Flink |
    | snapshot_manager_ddb_table_name | snapshot_manager_status | Name of the DynamoDB table used to track the status |
    | primary_partition_key_name | app_name | Primary partition key name |
    | primary_sort_key_name | snapshot_manager_run_id | Primary sort key name |
    | sns_topic_arn | SNS Topic ARN | SNS Topic ARN |
    | number_of_older_snapshots_to_retain | 30 | The number of most recent snapshots to be retained |
    | snapshot_creation_wait_time_seconds | 15 | Time gap in seconds between consecutive checks to get the status of snapshot creation |
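As an illustration of how a Lambda function might consume the environment variables listed above, here is a small sketch. The helper name `load_config` is hypothetical and the actual function code may read the variables differently; the key names are taken from the environment variable list.

```python
import os

# Keys whose values are numeric and should be converted to int.
_INT_KEYS = (
    "number_of_older_snapshots_to_retain",
    "snapshot_creation_wait_time_seconds",
)

# Keys that are kept as plain strings.
_STR_KEYS = (
    "aws_region",
    "app_name",
    "snapshot_manager_ddb_table_name",
    "primary_partition_key_name",
    "primary_sort_key_name",
    "sns_topic_arn",
)


def load_config(env=None):
    """Read Snapshot Manager settings from a mapping (os.environ by default).

    Raises KeyError if a variable is missing and ValueError if a numeric
    variable cannot be parsed, so misconfiguration fails fast.
    """
    env = os.environ if env is None else env
    config = {key: env[key] for key in _STR_KEYS}
    for key in _INT_KEYS:
        config[key] = int(env[key])
    return config
```

Accepting an optional mapping instead of reading `os.environ` directly makes it easy to exercise the parsing locally before deploying.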
- Go to Amazon EventBridge and create a rule
  - Name = SnapshotManagerEventRule
  - Description = EventBridge Rule to invoke Snapshot Manager Lambda
  - Define pattern = Schedule with desired fixed rate e.g. 6 Hours
  - Select targets
    - Target = Lambda function
    - Function = Previously created Lambda function snapshot_manager
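The same schedule could also be created programmatically. The sketch below assumes `events_client` behaves like `boto3.client("events")`; the function name `schedule_snapshot_manager` and the target `Id` value are illustrative choices, not part of this repository.

```python
def schedule_snapshot_manager(events_client, lambda_arn, rate="rate(6 hours)"):
    """Create a scheduled rule and point it at the Snapshot Manager Lambda.

    `events_client` is assumed to look like boto3.client("events").
    Returns the ARN of the rule.
    """
    rule_name = "SnapshotManagerEventRule"
    rule = events_client.put_rule(
        Name=rule_name,
        Description="EventBridge Rule to invoke Snapshot Manager Lambda",
        ScheduleExpression=rate,  # e.g. "rate(6 hours)" or "rate(1 day)"
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # The Id is an arbitrary label for the target within this rule.
        Targets=[{"Id": "snapshot_manager", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

When the rule is created through the console, the permission that lets EventBridge invoke the function is added automatically. A scripted setup would additionally need to grant it, for example with the Lambda `add_permission` API and `events.amazonaws.com` as the principal.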
This sample code is made available under the MIT-0 license. See the LICENSE file.