DE-101: Re-platform application to Python #11

Merged 20 commits on Jul 23, 2024
14 changes: 0 additions & 14 deletions .c8rc

This file was deleted.

33 changes: 33 additions & 0 deletions .github/workflows/run-unit-tests.yml
@@ -0,0 +1,33 @@
name: Run Python unit tests

on: { pull_request: { types: [ labeled, unlabeled, opened, reopened, synchronize ] } }

jobs:
  changelog:
    name: Updates changelog
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: dangoslen/changelog-enforcer@v3
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Set up Python 3.12
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r devel_requirements.txt
          pip install -r requirements.txt

      - name: Run linter and test suite
        run: |
          make lint
          make test
8 changes: 8 additions & 0 deletions .gitignore
@@ -2,7 +2,15 @@ node_modules

coverage

.aws-sam/
.coverage
.DS_Store
.vscode/
__pycache__/
.terraform.lock.hcl
.terraform
.pytest_cache
dist.zip
*env/
*.py[cod]
*$py.class
1 change: 0 additions & 1 deletion .nvmrc

This file was deleted.

1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12.0
34 changes: 0 additions & 34 deletions .travis.yml

This file was deleted.

21 changes: 21 additions & 0 deletions Makefile
@@ -0,0 +1,21 @@
.DEFAULT: help

help:
	@echo "make help"
	@echo "    display this help statement"
	@echo "make run"
	@echo "    run the application in devel"
	@echo "make test"
	@echo "    run associated test suite with pytest"
	@echo "make lint"
	@echo "    lint project files using the black linter"

run:
	export ENVIRONMENT=devel; \
	python -c 'import lambda_function; lambda_function.lambda_handler(None, None)'

test:
	pytest tests -W ignore::DeprecationWarning

lint:
	black ./ --check --exclude="(env/)|(tests/)"
66 changes: 23 additions & 43 deletions README.md
@@ -1,68 +1,48 @@
# Kinesis Firehose Avro to Json Transformer Lambda
[![Build Status](https://travis-ci.org/NYPL/firehose-avro-to-json-transformer.svg?branch=main)](https://travis-ci.org/NYPL/firehose-avro-to-json-transformer)

This app reads from Firehose Kinesis streams, decodes the records using the appropriate Avro schema based on the stream name, and returns the resulting records as either JSON or CSV (base64 encoded). This app is responsible for decoding records immediately before ingest into the [BIC](https://github.com/NYPL/BIC).

## Version
> v1.0.1

## Installation

Install all Node dependencies via NPM

```console
nvm use
npm install
```
This Python application is responsible for Avro-decoding events immediately before ingestion into the [BIC](https://github.com/NYPL/BIC). Originally developed for the Data Warehouse, this is deployed as an AWS Lambda (["AvroToJsonTransformer-qa"](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/AvroToJsonTransformer-qa?tab=configuration) and ["AvroToJsonTransformer-production"](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/AvroToJsonTransformer-production?tab=configuration)). In essence, the code does the following:
- Decodes the incoming batch of records using the corresponding Avro schema, which is determined based on the name of the incoming Kinesis stream
- Converts said records into a hash with `recordId`, `result: 'Ok'`, and `data` containing a JSON or CSV serialization of the record, which is also base64 encoded
- Returns processed records in this format: `{ records: [ { recordId: '[record id]', result: 'Ok', data: 'eyJmb28iOiJiYXIifQ....' }, ... ] }`
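
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `transform_event` and `decode_record` are hypothetical names, and `decode_record` parses JSON as a stand-in for the real Avro decoding, which needs the schema for the incoming stream.

```python
import base64
import json


def decode_record(raw_bytes):
    # Assumption: stand-in decoder. The real Lambda Avro-decodes the bytes
    # with the schema matching the incoming stream; here we just parse JSON.
    return json.loads(raw_bytes)


def transform_event(event, decoder=decode_record):
    """Build a Firehose transformation response from an incoming event."""
    output = []
    for record in event["records"]:
        # Firehose delivers each record's payload base64 encoded.
        raw = base64.b64decode(record["data"])
        decoded = decoder(raw)
        # Serialize compactly, then base64 encode for the response.
        payload = json.dumps(decoded, separators=(",", ":")).encode("utf-8")
        output.append(
            {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload).decode("utf-8"),
            }
        )
    return {"records": output}


if __name__ == "__main__":
    sample = {
        "records": [
            {
                "recordId": "1",
                "data": base64.b64encode(b'{"foo":"bar"}').decode("utf-8"),
            }
        ]
    }
    print(transform_event(sample))
```

Passing the decoder as a parameter keeps the response-building logic testable without a schema lookup.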

## Running Locally

Use the [sam cli](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) to run the lambda on arbitrary firehose events. To process a firehose event containing 3 CircTrans records and print out the result:
Use the [sam cli](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) to run the Lambda on arbitrary Firehose events. To process a Firehose event containing 3 CircTrans records and print out the result:

```
sam local invoke --profile nypl-digital-dev -t sam.qa.yml -e sample/firehose-CircTrans-3-records-encoded.json
sam local invoke --profile nypl-digital-dev -t config/sam.qa.yml -e sample/firehose-CircTrans-3-records-encoded.json
```

## Contributing
The [sample](./sample) folder contains sample Firehose events and their expected outcomes after Lambda event handling, so you can test the efficacy of your code with various schemas.

With Python, you also have the option of using the [python-lambda-local](https://pypi.org/project/python-lambda-local/) package for local development! You will need to create a JSON file with env variables to use said package.
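
As a rough sketch of that JSON file, the snippet below writes the variables from `config/devel.yaml` as a flat JSON object. The file name `env.json` and the helper `write_env_file` are hypothetical, and the flat name/value layout is an assumption about what the package expects, so check its documentation before relying on it.

```python
import json
from pathlib import Path

# Values copied from config/devel.yaml in this PR. The flat layout and the
# default file name "env.json" are assumptions made for illustration.
ENV_VARS = {
    "ENVIRONMENT": "devel",
    "NYPL_DATA_API_BASE_URL": "https://qa-platform.nypl.org/api/v0.1/current-schemas/",
}


def write_env_file(path="env.json", variables=ENV_VARS):
    """Write a flat JSON object of environment variables and return its path."""
    target = Path(path)
    target.write_text(json.dumps(variables, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_env_file()}")
```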

## Contributing / Deployment

This repo uses the ["PRs Target Main, Merge to Deployment Branches" git workflow](https://github.com/NYPL/engineering-general/blob/main/standards/git-workflow.md#prs-target-main-merge-to-deployment-branches):
- Cut PRs from `main`
- Merge `main` > `qa`
- Merge `main` > `production`

## Deployment

This app is deployed via Travis-CI using terraform. Code in `qa` is pushed to AvroToJsonTransformer-qa. Code in `production` is pushed to AvroToJsonTransformer-production.

## Tests
This app is deployed via Travis-CI using Terraform. Code in `qa` is pushed to ["AvroToJsonTransformer-qa"](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/AvroToJsonTransformer-qa?tab=configuration). Code in `production` is pushed to ["AvroToJsonTransformer-production"](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/AvroToJsonTransformer-production?tab=configuration).

To run all tests found in `./test/`:

```console
npm run test
## Test Coverage
Use the Python [coverage package](https://coverage.readthedocs.io/en/7.6.0/) to measure test coverage:
```

To run a specific test for the given filename:

```console
npm run test [filename].test.js
coverage run -m pytest
```

### Test Coverage

This repo uses c8 to compute test coverage (because [Istanbul](https://github.com/istanbuljs/nyc) doesn't appear to support ESM at writing). Coverage reports are included at the end of `npm test`. For a detailed line-by-line breakdown, view the HTML report:

```console
npm run coverage-report
open coverage/index.html
To see exactly which lines are missing test coverage:
```
coverage report -m
```

### Linting

This codebase uses [Standard JS](https://www.npmjs.com/package/standard) as the JavaScript linter.
## Linting

To check for linting errors:
This codebase uses [Black](https://github.com/psf/black) as the Python linter.

```console
npm run lint
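To check the codebase as a whole for formatting issues (`make lint` runs Black with `--check`, so it reports problems without rewriting files):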
```
make lint
```
5 changes: 5 additions & 0 deletions config/devel.yaml
@@ -0,0 +1,5 @@
---
PLAINTEXT_VARIABLES:
  ENVIRONMENT: devel
  NYPL_DATA_API_BASE_URL: https://qa-platform.nypl.org/api/v0.1/current-schemas/
...
4 changes: 0 additions & 4 deletions config/production.env

This file was deleted.

4 changes: 0 additions & 4 deletions config/qa.env

This file was deleted.

7 changes: 3 additions & 4 deletions sam.qa.yml → config/sam.qa.yml
@@ -6,12 +6,11 @@ Resources:
  AvroToJsonTransformer:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs14.x
      CodeUri: .
      Handler: lambda_function.lambda_handler
      Runtime: python3.12
      Timeout: 10
      Environment:
        Variables:
          NYPL_DATA_API_BASE_URL: https://qa-platform.nypl.org/api/v0.1/
          SCHEMA_NAME: CircTrans
          SCHEMA_PATH: current-schemas/
          LOG_LEVEL: debug
1 change: 0 additions & 1 deletion context.json

This file was deleted.

10 changes: 10 additions & 0 deletions deployment_script.sh
@@ -0,0 +1,10 @@
#!/bin/zsh

rm -f -r ./package
rm -f deployment-package.zip
pip3.9 install --target ./package -r requirements.txt
cd package
zip -r ../deployment-package.zip .
cd ..
zip deployment-package.zip lambda_function.py
zip deployment-package.zip record_processor.py
5 changes: 5 additions & 0 deletions devel_requirements.txt
@@ -0,0 +1,5 @@
black
nypl-py-utils[avro-client,config-helper]==1.2.0
pybase64
python-csv
python-io
1 change: 0 additions & 1 deletion event_sources.json

This file was deleted.

120 changes: 0 additions & 120 deletions index.js

This file was deleted.
