Transform records from a Kinesis Firehose stream from Avro encoded JSON to JSON.

NYPL/firehose-avro-to-json-transformer

Kinesis Firehose Avro to Json Transformer Lambda

This Python application is responsible for Avro-decoding events immediately before ingestion into the BIC. Originally developed for the Data Warehouse, this is deployed as an AWS Lambda ("AvroToJsonTransformer-qa" and "AvroToJsonTransformer-production"). In essence, the code does the following:

  • Decodes the incoming batch of records using the corresponding Avro schema, which is determined based on the name of the incoming Kinesis stream
  • Converts each record into a hash with recordId, result: 'Ok', and data containing a base64-encoded JSON or CSV serialization of the record
  • Returns processed records in this format: { records: [ { recordId: '[record id]', result: 'Ok', data: 'eyJmb28iOiJiYXIifQ....' }, ... ] }
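
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual handler: `transform_event`, `fake_decode`, and `sample_event` are hypothetical names, the Avro decoding step is replaced by a pluggable `decode` callable (the stand-in here simply parses JSON), and only the JSON output case is shown.

```python
import base64
import json
from typing import Any, Callable, Dict


def transform_event(
    event: Dict[str, Any],
    decode: Callable[[bytes], Dict[str, Any]],
) -> Dict[str, Any]:
    """Build a Firehose transformation response from an incoming event."""
    output = []
    for record in event["records"]:
        # Firehose delivers each record's payload base64 encoded.
        raw = base64.b64decode(record["data"])
        # Hypothetical stand-in for decoding with the stream's Avro schema.
        decoded = decode(raw)
        # Serialize compactly to JSON, then base64 encode for the response.
        payload = json.dumps(decoded, separators=(",", ":")).encode("utf-8")
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload).decode("utf-8"),
            }
        )
    return {"records": output}


def fake_decode(raw: bytes) -> Dict[str, Any]:
    """Demonstration only: treats the payload as JSON rather than Avro."""
    return json.loads(raw)


sample_event = {
    "records": [
        {"recordId": "1", "data": base64.b64encode(b'{"foo":"bar"}').decode("utf-8")}
    ]
}
response = transform_event(sample_event, fake_decode)
```

Passing the decoder in as an argument keeps the response-shaping logic testable without an Avro library or schema on hand.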

Running Locally

Use the AWS SAM CLI (sam) to run the Lambda on arbitrary Firehose events. To process a Firehose event containing 3 CircTrans records and print out the result:

sam local invoke --profile nypl-digital-dev -t config/sam.qa.yml -e sample/firehose-CircTrans-3-records-encoded.json

The sample folder contains sample Firehose events and their expected outcomes after Lambda event handling, so you can test the efficacy of your code with various schemas.
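
When comparing a Lambda result against an expected outcome, it can help to decode the base64 `data` field of each returned record first. The helper below is a small sketch under the assumption that the records were serialized as JSON (not CSV); `decode_output_records` and `example_response` are hypothetical names used only for illustration.

```python
import base64
import json
from typing import Any, Dict, List


def decode_output_records(response: Dict[str, Any]) -> List[Any]:
    """Return the JSON payload carried in each record's base64 `data` field."""
    return [
        json.loads(base64.b64decode(record["data"]))
        for record in response["records"]
    ]


# Example response in the format the Lambda returns.
example_response = {
    "records": [
        {"recordId": "1", "result": "Ok", "data": "eyJmb28iOiJiYXIifQ=="}
    ]
}
decoded = decode_output_records(example_response)
```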

You can also use the python-lambda-local package for local development. To do so, you will need to create a JSON file containing the environment variables the Lambda expects.
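
As a rough sketch, the environment file can be generated with a few lines of Python. python-lambda-local reads a flat JSON object mapping variable names to values (passed with its `-e` option). The variable names and the `env.json` filename below are placeholders, not the ones this Lambda actually reads.

```python
import json
from pathlib import Path

# Hypothetical variable names and values; replace them with the
# environment variables the Lambda actually expects.
env_vars = {
    "ENVIRONMENT": "qa",
    "LOG_LEVEL": "debug",
}

# Write a flat JSON object of name/value pairs.
env_path = Path("env.json")
env_path.write_text(json.dumps(env_vars, indent=2))
```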

Git workflow

This repo uses the Main-QA-Production git workflow.

main has the latest and greatest commits, qa has what's in our QA environment, and production has what's in our production environment.

Ideal Workflow

  • Cut a feature branch off of main
  • Commit changes to your feature branch
  • File a pull request against main and assign a reviewer (who must be an owner)
    • Include relevant updates to pyproject.toml and README
    • In order for the PR to be accepted, it must pass all unit tests, have no lint issues, and update the CHANGELOG (or contain the Skip-Changelog label in GitHub)
  • After the PR is accepted, merge into main
  • Merge main > qa
  • Deploy app to QA on GitHub and confirm it works
  • Merge qa > production
  • Deploy app to production on GitHub and confirm it works

Test Coverage

Use the Python coverage package to measure test coverage:

coverage run -m pytest

To see exactly which lines are missing test coverage:

coverage report -m

Linting

This codebase uses Black as the Python code formatter.

To format the codebase as a whole:

make lint
