
Module 1: File Processing

In this module you'll use Amazon Simple Storage Service (S3), AWS Lambda, and Amazon DynamoDB to process data from JSON files. Objects created in the Amazon S3 bucket will trigger an AWS Lambda function to process the new file. The Lambda function will read the data and populate records into an Amazon DynamoDB table.

Architecture Overview

Architecture

Our producer is a sensor attached to a unicorn - Shadowfax - currently taking a passenger on a Wild Ryde. This sensor aggregates data every minute, including the distance the unicorn traveled and the maximum and minimum magic points and hit points readings from the previous minute. These readings are stored in data files that are uploaded to Amazon S3 on a daily basis.

The Amazon S3 bucket has an event notification configured to trigger the AWS Lambda function that will retrieve the file, process it, and populate the Amazon DynamoDB table.
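
For reference, the trigger you'll configure in section 4 corresponds to an S3 notification configuration like the sketch below (the bucket and function names match this module's examples; REGION and ACCOUNT_ID are placeholders, and S3 must already have permission to invoke the function, which the console's trigger setup handles for you):

    # Sketch only: wire an S3 bucket to invoke a Lambda function on object creation
    aws s3api put-bucket-notification-configuration \
        --bucket wildrydes-uploads-yourname \
        --notification-configuration '{
          "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:REGION:ACCOUNT_ID:function:WildRydesFileProcessor",
            "Events": ["s3:ObjectCreated:*"]
          }]
        }'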

Implementation Instructions

1. Create an Amazon S3 bucket

Use the console or CLI to create an S3 bucket. Keep in mind, your bucket's name must be globally unique. We recommend using a name like wildrydes-uploads-yourname.
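
If you prefer the AWS CLI, a single command creates the bucket (the bucket name below is an example, and the region is your choice):

    # Create the bucket; names must be globally unique
    aws s3 mb s3://wildrydes-uploads-yourname --region us-east-1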

Step-by-step instructions (expand for details)

  1. From the AWS Console click Services then select S3 under Storage.

  2. Click + Create Bucket.

  3. Provide a globally unique name for your bucket such as wildrydes-uploads-yourname.

  4. Select a region for your bucket.

    Create bucket screenshot

  5. Use the default values, click Next through the remaining sections, and click Create Bucket on the review page.

2. Create an Amazon DynamoDB Table

Use the Amazon DynamoDB console to create a new DynamoDB table. Call your table UnicornSensorData and give it a Partition key called Name of type String and a Sort key called StatusTime of type Number. Use the defaults for all other settings.

After you've created the table, note the Amazon Resource Name (ARN) for use in the next section.
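
As a CLI alternative, the following sketch creates the same table (the read/write capacity values are arbitrary defaults); the command's TableDescription output includes the table ARN:

    # Create the table with Name (String) partition key and StatusTime (Number) sort key
    aws dynamodb create-table \
        --table-name UnicornSensorData \
        --attribute-definitions AttributeName=Name,AttributeType=S AttributeName=StatusTime,AttributeType=N \
        --key-schema AttributeName=Name,KeyType=HASH AttributeName=StatusTime,KeyType=RANGE \
        --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5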

Step-by-step instructions (expand for details)

  1. From the AWS Management Console, choose Services then select DynamoDB under Databases.

  2. Choose Create table.

  3. Enter UnicornSensorData for the Table name.

  4. Enter Name for the Partition key and select String for the key type.

  5. Tick the Add sort key checkbox. Enter StatusTime for the Sort key and select Number for the key type.

  6. Check the Use default settings box and choose Create.

    Create table screenshot

  7. Scroll to the bottom of the Overview section of your new table and note the ARN. You will use this in the next section.

3. Create an IAM role for your Lambda function

Use the IAM console to create a new role. Give it a name like WildRydesFileProcessorRole and select AWS Lambda for the role type. Attach the managed policy called AWSLambdaBasicExecutionRole to this role in order to grant permissions for your function to log to Amazon CloudWatch Logs.
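
If you're working from the CLI instead, the equivalent is roughly the following sketch: create the role with a trust policy that lets Lambda assume it, then attach the managed policy:

    # Create the role; the trust policy allows the Lambda service to assume it
    aws iam create-role \
        --role-name WildRydesFileProcessorRole \
        --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "Service": "lambda.amazonaws.com" },
            "Action": "sts:AssumeRole"
          }]
        }'

    # Attach the managed policy for CloudWatch Logs permissions
    aws iam attach-role-policy \
        --role-name WildRydesFileProcessorRole \
        --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole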

You'll need to grant this role permissions to access both the S3 bucket and the Amazon DynamoDB table created in the previous sections (a CLI sketch of both policies follows this list):

  • Create an inline policy allowing the role access to the dynamodb:BatchWriteItem action for the Amazon DynamoDB table you created in the previous section.

  • Create an inline policy allowing the role access to the s3:GetObject action for the S3 bucket you created in the first section.
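
Both inline policies can also be attached from the CLI, as in the sketch below (the policy names DynamoDBWriteAccess and S3ReadAccess are illustrative; substitute your region, account ID, and bucket name):

    # Inline policy: allow batch writes to the UnicornSensorData table
    aws iam put-role-policy \
        --role-name WildRydesFileProcessorRole \
        --policy-name DynamoDBWriteAccess \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Action": "dynamodb:BatchWriteItem",
            "Resource": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/UnicornSensorData"
          }]
        }'

    # Inline policy: allow reads of objects in the upload bucket
    aws iam put-role-policy \
        --role-name WildRydesFileProcessorRole \
        --policy-name S3ReadAccess \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME_HERE/*"
          }]
        }'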

Step-by-step instructions (expand for details)

  1. From the AWS Console, click on Services and then select IAM in the Security, Identity & Compliance section.

  2. Select Roles from the left navigation and then click Create new role.

  3. Select AWS Service Role, then choose AWS Lambda for the role type.

    Note: Selecting a role type automatically creates a trust policy for your role that allows AWS services to assume this role on your behalf. If you were creating this role using the CLI, AWS CloudFormation or another mechanism, you would specify a trust policy directly.

  4. Begin typing AWSLambdaBasicExecutionRole in the Filter text box and check the box next to that policy.

  5. Click Next Step.

  6. Enter WildRydesFileProcessorRole for the Role Name.

  7. Click Create role.

  8. Type WildRydesFileProcessorRole into the filter box on the Roles page and click the role you just created.

  9. On the Permissions tab, expand the Inline Policies section and click the link to create a new inline policy.

    Inline policies screenshot

  10. Ensure Policy Generator is selected and click Select.

  11. Select Amazon DynamoDB from the AWS Service dropdown.

  12. Select BatchWriteItem from the Actions list.

  13. Type the ARN of the DynamoDB table you created in the previous section in the Amazon Resource Name (ARN) field. The ARN is in the format of:

    arn:aws:dynamodb:REGION:ACCOUNT_ID:table/UnicornSensorData
    

    For example, if you've deployed to US East (N. Virginia) and your account ID is 123456789012, your table ARN would be:

    arn:aws:dynamodb:us-east-1:123456789012:table/UnicornSensorData
    

    To find your AWS account ID number in the AWS Management Console, click on Support in the navigation bar in the upper-right, and then click Support Center. Your currently signed in account ID appears in the upper-right corner below the Support menu.

    Policy generator screenshot

  14. Click Add Statement.

    Policy screenshot

  15. Select Amazon S3 from the AWS Service dropdown.

  16. Select GetObject from the Actions list.

  17. Type the ARN of the S3 bucket you created in the first section in the Amazon Resource Name (ARN) field. The ARN is in the format of:

    arn:aws:s3:::YOUR_BUCKET_NAME_HERE/*
    

    For example, if you've named your bucket wildrydes-uploads-johndoe, your bucket ARN would be:

    arn:aws:s3:::wildrydes-uploads-johndoe/*
    

    Policy generator screenshot

  18. Click Add Statement.

    Policy screenshot

  19. Click Next Step then Apply Policy.

4. Create a Lambda function for processing

Use the console to create a new Lambda function called WildRydesFileProcessor that will be triggered whenever a new object is created in the bucket created in the first section.

Use the provided index.js example implementation for your function code by copying and pasting the contents of that file into the Lambda console's editor. Ensure you create an environment variable with the key TABLE_NAME and the value UnicornSensorData.

Make sure you configure your function to use the WildRydesFileProcessorRole IAM role you created in the previous section.
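
If you'd rather create the function from the CLI, a sketch like the following works, assuming you've zipped the provided index.js into a file named function.zip (the file name is an example). You would still need to add the S3 trigger separately; the console steps below handle that wiring for you:

    # Create the function with the role, environment variable, and 5-minute timeout
    aws lambda create-function \
        --function-name WildRydesFileProcessor \
        --runtime nodejs6.10 \
        --handler index.handler \
        --role arn:aws:iam::ACCOUNT_ID:role/WildRydesFileProcessorRole \
        --environment "Variables={TABLE_NAME=UnicornSensorData}" \
        --timeout 300 \
        --zip-file fileb://function.zip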

Step-by-step instructions (expand for details)

  1. Click on Services then select Lambda in the Compute section.

  2. Click Create function.

  3. Click on Author from scratch.

  4. Enter WildRydesFileProcessor in the Name field.

  5. Select WildRydesFileProcessorRole from the Existing Role dropdown.

    Create Lambda function screenshot

  6. Click on Create function.

  7. Click on Triggers, then click + Add trigger.

  8. Click on the dotted outline and select S3. Select wildrydes-uploads-yourname from Bucket, Object Created (All) from Event type, and tick the Enable trigger checkbox.

    Create Lambda function screenshot

  9. Click Submit.

  10. Click Configuration.

  11. Select Node.js 6.10 for the Runtime.

  12. Leave the default of index.handler for the Handler field.

  13. Copy and paste the code from index.js into the code entry area.

  14. Expand the Environment variables section below the code entry area.

  15. In Environment variables, enter an environment variable with key TABLE_NAME and value UnicornSensorData.

    Lambda environment variable screenshot

  16. Scroll down to Basic settings and set Timeout to 5 minutes to accommodate large files.

    Create Lambda function screenshot

  17. Optionally, enter a description in the Description field below Timeout.

  18. Scroll to the top and click Save (not Save and test, since we haven't configured a test event).

Implementation Validation

  1. Using either the AWS Management Console or AWS Command Line Interface, copy the provided data/shadowfax-2016-02-12.json data file to the Amazon S3 bucket created in the first section.

    You can either download this file via your web browser and upload it using the AWS Management Console, or you can use the AWS CLI to copy it directly:

    aws s3 cp s3://wildrydes-data-processing/data/shadowfax-2016-02-12.json s3://YOUR_BUCKET_NAME_HERE

  2. Click on Services then select DynamoDB in the Database section.

  3. Click on UnicornSensorData.

  4. Click on the Items tab and verify that the table has been populated with the items from the data file.

    DynamoDB items screenshot
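
You can also spot-check the table from the CLI; a scan limited to a few items should return records from the data file:

    # Fetch a small sample of items to confirm the table was populated
    aws dynamodb scan --table-name UnicornSensorData --max-items 5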

When you see items from the JSON file in the table, you can move onto the next module: Real-time Data Streaming.

Extra Credit

  • Enhance the implementation to gracefully handle lines with malformed JSON. Edit the file to include a malformed line and verify the function is able to process the file. Consider how you would handle unprocessable lines in a production implementation.
  • Inspect the Amazon CloudWatch Logs stream associated with the Lambda function and note the function's execution duration. Increase the provisioned write throughput of the DynamoDB table and copy the file to the bucket once again as a new object. Check the logs once more and note the lower duration.