This Python application is responsible for Avro-decoding events immediately before ingestion into the BIC. Originally developed for the Data Warehouse, this is deployed as an AWS Lambda ("AvroToJsonTransformer-qa" and "AvroToJsonTransformer-production"). In essence, the code does the following:
- Decodes the incoming batch of records using the corresponding Avro schema, which is determined by the name of the incoming Kinesis stream
- Converts said records into a hash with recordId, result: 'Ok', and data containing a JSON or CSV serialization of the record, which is also base64 encoded
- Returns processed records in this format:
{ records: [ { recordId: '[record id]', result: 'Ok', data: 'eyJmb28iOiJiYXIifQ....' }, ... ] }
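The steps above can be sketched roughly as follows. This is a minimal illustration, not this repo's actual handler: `stream_name_from_arn` and `transform_records` are hypothetical names, the `decode` argument is a stand-in for the Avro decoding step, and the event fields used here (`sourceKinesisStreamArn`, `records[].recordId`, `records[].data`) assume the standard Firehose transformation event shape. Only the JSON serialization is shown.

```python
import base64
import json


def stream_name_from_arn(arn):
    """Return the stream name from a Kinesis stream ARN.

    Example ARN: arn:aws:kinesis:us-east-1:123456789012:stream/CircTrans-qa
    """
    return arn.split("/")[-1]


def transform_records(event, decode):
    """Build a Firehose-style response from a batch of encoded records.

    `decode` is a hypothetical callable (stream_name, raw_bytes) -> dict
    standing in for the Avro decoding step. How a stream name maps to a
    schema is an assumption left to that callable.
    """
    stream_name = stream_name_from_arn(event["sourceKinesisStreamArn"])
    output = []
    for record in event["records"]:
        # Incoming Firehose data is base64 encoded.
        raw = base64.b64decode(record["data"])
        decoded = decode(stream_name, raw)
        # Serialize to JSON, then base64 encode again for the response.
        payload = json.dumps(decoded).encode("utf-8")
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload).decode("utf-8"),
            }
        )
    return {"records": output}
```

Passing the decoder in as an argument keeps the sketch independent of any particular Avro library, so the response-building logic can be exercised on its own.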
Use the AWS SAM CLI to run the Lambda locally on arbitrary Firehose events. To process a Firehose event containing 3 CircTrans records and print out the result:
sam local invoke --profile nypl-digital-dev -t config/sam.qa.yml -e sample/firehose-CircTrans-3-records-encoded.json
The sample folder contains sample Firehose events and their expected outcomes after Lambda event handling, so you can test the efficacy of your code with various schemas.
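Because the data field of each returned record is base64 encoded, it can help to decode it before comparing against an expected outcome. The helper below is a small sketch for that purpose. `decode_result_data` is a hypothetical name, and it assumes the JSON serialization (not CSV) and the response shape described above.

```python
import base64
import json


def decode_result_data(result):
    """Return the parsed JSON payload of each record in a Lambda result."""
    return [
        json.loads(base64.b64decode(record["data"]))
        for record in result["records"]
    ]


if __name__ == "__main__":
    sample = {
        "records": [
            {"recordId": "1", "result": "Ok", "data": "eyJmb28iOiJiYXIifQ=="}
        ]
    }
    print(decode_result_data(sample))
```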
You can also use the python-lambda-local package for local development. To use it, you will need to create a JSON file containing the required environment variables.
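One way to produce such a file is sketched below. The variable name and value are placeholders (the variables this Lambda actually reads are not listed here), and `write_env_file` is a hypothetical helper, not part of python-lambda-local.

```python
import json

# Placeholder entry: replace with the environment variables the handler reads.
EXAMPLE_ENV = {
    "EXAMPLE_VARIABLE": "example-value",
}


def write_env_file(path, variables):
    """Write a flat JSON object of environment variables to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(variables, f, indent=2)


if __name__ == "__main__":
    write_env_file("env.json", EXAMPLE_ENV)
```

The resulting file can then, as far as we know, be passed to python-lambda-local through its `-e` / `--environment-variables` option, together with the handler file and an event from the sample folder. Check `python-lambda-local --help` for the exact options in your installed version.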
This repo uses the Main-QA-Production git workflow.
main has the latest and greatest commits, qa has what's in our QA environment, and production has what's in our production environment.
- Cut a feature branch off of main
- Commit changes to your feature branch
- File a pull request against main and assign a reviewer (who must be an owner)
  - Include relevant updates to pyproject.toml and README
  - In order for the PR to be accepted, it must pass all unit tests, have no lint issues, and update the CHANGELOG (or contain the Skip-Changelog label in GitHub)
- After the PR is accepted, merge into main
- Merge main > qa
- Deploy app to QA on GitHub and confirm it works
- Merge qa > production
- Deploy app to production on GitHub and confirm it works
Use the Python coverage package to measure test coverage:
coverage run -m pytest
To see exactly which lines are missing test coverage:
coverage report -m
This codebase uses Black as the Python linter and formatter.
To format the codebase as a whole:
make lint