In this module you'll use Amazon Simple Storage Service (S3), AWS Lambda, and Amazon DynamoDB to process data from JSON files. Objects created in the Amazon S3 bucket will trigger an AWS Lambda function to process the new file. The Lambda function will read the data and populate records into an Amazon DynamoDB table.
Our producer is a sensor attached to a unicorn, Shadowfax, who is currently taking a passenger on a Wild Ryde. Each minute, this sensor aggregates data from the previous minute, including the distance the unicorn traveled and the maximum and minimum magic points and hit points readings. These readings are stored in data files which are uploaded daily to Amazon S3.
The Amazon S3 bucket has an event notification configured to trigger the AWS Lambda function that will retrieve the file, process it, and populate the Amazon DynamoDB table.
Use the console or CLI to create an S3 bucket. Keep in mind, your bucket's name must be globally unique. We recommend using a name like `wildrydes-uploads-yourname`.
Step-by-step instructions (expand for details)
- From the AWS Console click Services then select S3 under Storage.
- Click +Create Bucket.
- Provide a globally unique name for your bucket such as `wildrydes-uploads-yourname`.
- Select a region for your bucket.
- Use the default values and click Next through the rest of the sections and click Create Bucket on the review section.
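If you'd rather use the CLI, a minimal sketch of the same step might look like this (the bucket name and region are placeholders; substitute your own values):

```bash
# Create the upload bucket (bucket names are global, so pick your own suffix)
aws s3 mb s3://wildrydes-uploads-yourname --region us-east-1

# Confirm the bucket exists
aws s3 ls | grep wildrydes-uploads-yourname
```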
Use the Amazon DynamoDB console to create a new DynamoDB table. Call your table `UnicornSensorData` and give it a Partition key called `Name` of type String and a Sort key called `StatusTime` of type Number. Use the defaults for all other settings.
After you've created the table, note the Amazon Resource Name (ARN) for use in the next section.
Step-by-step instructions (expand for details)
- From the AWS Management Console, choose Services then select DynamoDB under Databases.
- Choose Create table.
- Enter `UnicornSensorData` for the Table name.
- Enter `Name` for the Partition key and select String for the key type.
- Tick the Add sort key checkbox. Enter `StatusTime` for the Sort key and select Number for the key type.
- Check the Use default settings box and choose Create.
- Scroll to the bottom of the Overview section of your new table and note the ARN. You will use this in the next section.
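A rough CLI equivalent, assuming the default provisioned throughput of 5 read and 5 write capacity units:

```bash
# Create the table with Name (string) as the partition key and StatusTime (number) as the sort key
aws dynamodb create-table \
  --table-name UnicornSensorData \
  --attribute-definitions AttributeName=Name,AttributeType=S AttributeName=StatusTime,AttributeType=N \
  --key-schema AttributeName=Name,KeyType=HASH AttributeName=StatusTime,KeyType=RANGE \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

# Retrieve the table ARN for the IAM policy in the next section
aws dynamodb describe-table \
  --table-name UnicornSensorData \
  --query Table.TableArn --output text
```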
Use the IAM console to create a new role. Give it a name like `WildRydesFileProcessorRole` and select AWS Lambda for the role type. Attach the managed policy called `AWSLambdaBasicExecutionRole` to this role in order to grant permissions for your function to log to Amazon CloudWatch Logs.
You'll need to grant this role permissions to access both the S3 bucket and Amazon DynamoDB table created in the previous sections:
- Create an inline policy allowing the role access to the `dynamodb:BatchWriteItem` action for the Amazon DynamoDB table you created in the previous section.
- Create an inline policy allowing the role access to the `s3:GetObject` action for the S3 bucket you created in the first section.
Step-by-step instructions (expand for details)
- From the AWS Console, click on Services and then select IAM in the Security, Identity & Compliance section.
- Select Roles from the left navigation and then click Create new role.
- Select AWS Lambda for the role type from AWS Service Role.
  Note: Selecting a role type automatically creates a trust policy for your role that allows AWS services to assume this role on your behalf. If you were creating this role using the CLI, AWS CloudFormation, or another mechanism, you would specify a trust policy directly.
- Begin typing `AWSLambdaBasicExecutionRole` in the Filter text box and check the box next to that policy.
- Click Next Step.
- Enter `WildRydesFileProcessorRole` for the Role Name.
- Click Create role.
- Type `WildRydesFileProcessorRole` into the filter box on the Roles page and click the role you just created.
- On the Permissions tab, expand the Inline Policies section and click the link to create a new inline policy.
- Ensure Policy Generator is selected and click Select.
- Select Amazon DynamoDB from the AWS Service dropdown.
- Select BatchWriteItem from the Actions list.
- Type the ARN of the DynamoDB table you created in the previous section in the Amazon Resource Name (ARN) field. The ARN is in the format:
  `arn:aws:dynamodb:REGION:ACCOUNT_ID:table/UnicornSensorData`
  For example, if you've deployed to US East (N. Virginia) and your account ID is 123456789012, your table ARN would be:
  `arn:aws:dynamodb:us-east-1:123456789012:table/UnicornSensorData`
  To find your AWS account ID in the AWS Management Console, click on Support in the navigation bar in the upper-right, and then click Support Center. Your currently signed-in account ID appears in the upper-right corner below the Support menu.
- Click Add Statement.
- Select Amazon S3 from the AWS Service dropdown.
- Select GetObject from the Actions list.
- Type the ARN of the S3 bucket you created in the first section in the Amazon Resource Name (ARN) field. The ARN is in the format:
  `arn:aws:s3:::YOUR_BUCKET_NAME_HERE/*`
  For example, if you've named your bucket `wildrydes-uploads-johndoe`, your bucket ARN would be:
  `arn:aws:s3:::wildrydes-uploads-johndoe/*`
- Click Add Statement.
- Click Next Step then Apply Policy.
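If you're scripting this instead of clicking through the console, a rough CLI sketch might look like the following. The inline policy name, table ARN, and bucket name are placeholders; substitute the values from your own account:

```bash
# Trust policy letting Lambda assume the role
aws iam create-role \
  --role-name WildRydesFileProcessorRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'

# Managed policy for CloudWatch Logs access
aws iam attach-role-policy \
  --role-name WildRydesFileProcessorRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

# Inline policy granting access to the table and bucket created earlier
aws iam put-role-policy \
  --role-name WildRydesFileProcessorRole \
  --policy-name WildRydesFileProcessorPolicy \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "dynamodb:BatchWriteItem",
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/UnicornSensorData"
      },
      {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::wildrydes-uploads-yourname/*"
      }
    ]
  }'
```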
Use the console to create a new Lambda function called `WildRydesFileProcessor` that will be triggered whenever a new object is created in the bucket created in the first section.
Use the provided index.js example implementation for your function code by copying and pasting the contents of that file into the Lambda console's editor. Ensure you create an environment variable with the key `TABLE_NAME` and the value `UnicornSensorData`.
Make sure you configure your function to use the `WildRydesFileProcessorRole` IAM role you created in the previous section.
Step-by-step instructions (expand for details)
- Click on Services then select Lambda in the Compute section.
- Click Create function.
- Click on Author from scratch.
- Enter `WildRydesFileProcessor` in the Name field.
- Select `WildRydesFileProcessorRole` from the Existing Role dropdown.
- Click on Create function.
- Click on Triggers then click + Add trigger.
- Click on the dotted outline and select S3. Select `wildrydes-uploads-yourname` from Bucket, Object Created (All) from Event type, and tick the Enable trigger checkbox.
- Click Submit.
- Click Configuration.
- Select Node.js 6.10 for the Runtime.
- Leave the default of `index.handler` for the Handler field.
- Copy and paste the code from index.js into the code entry area.
- Expand Environment variables under the code entry area.
- In Environment variables, enter an environment variable with key `TABLE_NAME` and value `UnicornSensorData`.
- Scroll down to Basic settings and set Timeout to 5 minutes to accommodate large files.
- Optionally enter a description under Timeout.
- Scroll to the top and click Save (not Save and test, since we haven't configured a test event).
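For reference, roughly equivalent CLI calls for the configuration and trigger steps might look like this, assuming the function already exists; the region, account ID, and bucket name are placeholders:

```bash
# Set the environment variable and a 5 minute timeout on the function
aws lambda update-function-configuration \
  --function-name WildRydesFileProcessor \
  --environment "Variables={TABLE_NAME=UnicornSensorData}" \
  --timeout 300

# Allow S3 to invoke the function, then wire up the bucket notification
aws lambda add-permission \
  --function-name WildRydesFileProcessor \
  --statement-id s3-trigger \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::wildrydes-uploads-yourname

aws s3api put-bucket-notification-configuration \
  --bucket wildrydes-uploads-yourname \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:WildRydesFileProcessor",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'
```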
- Using either the AWS Management Console or the AWS Command Line Interface, copy the provided data/shadowfax-2016-02-12.json data file to the Amazon S3 bucket created in the first section. You can either download this file via your web browser and upload it using the AWS Management Console, or use the AWS CLI to copy it directly:
  `aws s3 cp s3://wildrydes-data-processing/data/shadowfax-2016-02-12.json s3://YOUR_BUCKET_NAME_HERE`
- Click on Services then select DynamoDB in the Database section.
- Click on UnicornSensorData.
- Click on the Items tab and verify that the table has been populated with the items from the data file.
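You can also spot-check the table from the CLI, for example:

```bash
# Sample a few items to confirm the function wrote data to the table
aws dynamodb scan \
  --table-name UnicornSensorData \
  --max-items 5
```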
When you see items from the JSON file in the table, you can move on to the next module: Real-time Data Streaming.
- Enhance the implementation to gracefully handle lines with malformed JSON. Edit the data file to include a malformed line and verify the function is still able to process the file. Consider how you would handle unprocessable lines in a production implementation.
- Inspect the Amazon CloudWatch Logs stream associated with the Lambda function and note the duration the function executes. Increase the provisioned write throughput of the DynamoDB table and copy the file to the bucket once again as a new object. Check the logs once more and note the lower duration. A CLI sketch for both experiments follows this list.
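A minimal sketch of those two experiments from the CLI, assuming you've downloaded the data file locally; the object keys and throughput values are only examples:

```bash
# Append a deliberately malformed line and upload it under a new key
echo 'this line is not valid JSON' >> shadowfax-2016-02-12.json
aws s3 cp shadowfax-2016-02-12.json s3://wildrydes-uploads-yourname/shadowfax-2016-02-12-malformed.json

# Raise the table's write throughput, then re-upload the file as a new object
aws dynamodb update-table \
  --table-name UnicornSensorData \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=50
aws s3 cp shadowfax-2016-02-12.json s3://wildrydes-uploads-yourname/shadowfax-2016-02-12-rerun.json

# Compare the Duration values in the function's REPORT log lines
aws logs filter-log-events \
  --log-group-name /aws/lambda/WildRydesFileProcessor \
  --filter-pattern REPORT
```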