Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(event_source): add Kinesis Firehose Data Transformation data class #3029

Merged
merged 52 commits into from
Sep 14, 2023

Conversation

roger-zhangg
Copy link
Member

@roger-zhangg roger-zhangg commented Aug 30, 2023

Issue number:#2440

Summary

Support Kinesis Firehose Response Record data class in event_sources.

This PR is still in draft status, much to discuss including

  • Dataclasses are mostly used for read only Dicts, this request need to construct dict before using the data classes, which is different. Need to review on the current approach
  • Discuss what functionality should be supported. E.g (data.from_str, data.from_binary)

Changes

Please provide a summary of what's being changed
Added three data_classes and their constructor that supports Kinesis Firehose Response Record

User experience

Please share what the user experience looks like before and after this change

BEFORE

import base64

print('Loading function')


def lambda_handler(event, context):
    output = []

    for record in event['records']:
        print(record['recordId'])
        payload = base64.b64decode(record['data']).decode('utf-8')

        # Do custom processing on the payload here

        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(payload.encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)

    print('Successfully processed {} records.'.format(len(event['records'])))

    return {'records': output}

AFTER

import base64

from aws_lambda_powertools.utilities.data_classes import (
    KinesisFirehoseDataTransformationRecord,
    KinesisFirehoseDataTransformationResponse,
)
from aws_lambda_powertools.utilities.serialization import base64_from_json
from aws_lambda_powertools.utilities.typing import LambdaContext


def lambda_handler(event: dict, context: LambdaContext):
    result = KinesisFirehoseDataTransformationResponse()

    for record in event["records"]:
        print(record["recordId"])
        payload = base64.b64decode(record["data"]).decode("utf-8")
        ## do all kind of stuff with payload
        ## generate data to return
        transformed_data = {"tool_used": "powertools_dataclass", "original_payload": payload}

        processed_record = KinesisFirehoseDataTransformationRecord(
            record_id=record["recordId"],
            data=base64_from_json(transformed_data),
        )
        result.add_record(processed_record)

    # return transformed records
    return result.asdict()

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@boring-cyborg boring-cyborg bot added the tests label Aug 30, 2023
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 30, 2023
@github-actions github-actions bot added the feature New feature or functionality label Aug 30, 2023
@roger-zhangg roger-zhangg changed the title feat(event_sources): Kinesis Firehose Response Record data class feat(data-classes): Kinesis Firehose Response Record data class Aug 30, 2023
@roger-zhangg
Copy link
Member Author

roger-zhangg commented Aug 30, 2023

from here

class DictWrapper(Mapping):
"""Provides a single read only access to a wrapper dict"""

This DictWrapper is designed to be read-only and used by all data-classes. Currently all data-classes in powertools are used to parse events. But for this issue, we are using data-classes to create the event. And this use case doesn't exist in powertools data-classes yet.
In this case I can't set value directly to data class expect init the class with data parameter. As you can see in my current code,

def KinesisFirehoseResponceRecordFactory(
record_id: str,
result: Union[FirehoseStateOk, FirehoseStateDropped, FirehoseStateFailed],
data: str,
metadata: Optional[KinesisFirehoseResponseRecordMetadata] = None,
json_deserializer: Optional[Callable] = None,
) -> KinesisFirehoseResponceRecord:
pass_data = {
"recordId": record_id,
"result": result,
"data": base64.b64encode(data.encode("utf-8")).decode("utf-8"),
}
if metadata:
data["metadata"] = metadata
return KinesisFirehoseResponceRecord(data=pass_data, json_deserializer=json_deserializer)

I'm using a factory method to create each of these data classes. But I don't think this is a good solution. So it's better we can discuss to figure out a better way to create these data classes

From my perspective I can think of three possible solution of to support this

  • Add a setter method to DictWrapper and init the class as a data class
# possible UX
result = KinesisFirehoseResponce({})
for record in event.records:
        data = record.data_as_json
        processed_record = KinesisFirehoseResponceRecord(
            record_id=record.record_id,
            result=FirehoseStateOk,
        )
        processed_record.data_from_json(data)
        result.add_record(processed_record)
return result
  • Add a new Wrapper class to support this specific case for creating events in lambda
# possible UX
        processed_record = KinesisFirehoseResponceRecord(
            record_id=record.record_id,
            result=FirehoseStateOk,
            data=data,
        )
  • Do nothing, just keep the current data approach
# possible UX
data={
    "record_id":record.record_id,
    "result":FirehoseStateOk,
    "data":data_raw,
}
record = KinesisFirehoseResponceRecord(data=data)

Copy link
Contributor

@leandrodamascena leandrodamascena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @roger-zhangg! I'm reviewing the code, your comments, and reading the AWS documentation for Kinesis Firehose Response, but in the meantime, I think you should fix these issues.

Please mark this PR as "ready to review".

@roger-zhangg roger-zhangg marked this pull request as ready for review August 30, 2023 20:46
@roger-zhangg roger-zhangg requested a review from a team as a code owner August 30, 2023 20:46
@codecov-commenter
Copy link

codecov-commenter commented Aug 30, 2023

Codecov Report

Patch coverage: 69.56% and project coverage change: -0.23% ⚠️

Comparison is base (f673368) 96.36% compared to head (6f95e8f) 96.13%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3029      +/-   ##
===========================================
- Coverage    96.36%   96.13%   -0.23%     
===========================================
  Files          185      186       +1     
  Lines         8064     8131      +67     
  Branches      1509     1525      +16     
===========================================
+ Hits          7771     7817      +46     
- Misses         236      252      +16     
- Partials        57       62       +5     
Files Changed Coverage Δ
...s/utilities/data_classes/kinesis_firehose_event.py 83.19% <66.66%> (-15.22%) ⬇️
aws_lambda_powertools/utilities/serialization.py 81.81% <81.81%> (ø)
...mbda_powertools/utilities/data_classes/__init__.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@leandrodamascena
Copy link
Contributor

  • Add a setter method to DictWrapper and init the class as a data class
# possible UX
result = KinesisFirehoseResponce({})
for record in event.records:
        data = record.data_as_json
        processed_record = KinesisFirehoseResponceRecord(
            record_id=record.record_id,
            result=FirehoseStateOk,
        )
        processed_record.data_from_json(data)
        result.add_record(processed_record)
return result

I like this UX, easy and clean to create records and add these in the response. I'm just wondering if we really need DictWrapper. As you said, the DictWrapper class is very much meant for parsing events and getting elements, what do you think about just keeping this class as @dataclasses?

@leandrodamascena
Copy link
Contributor

Please review the class name, there is a typo: Responce instead of Response.

Thanks

@roger-zhangg
Copy link
Member Author

Thanks for the suggestion, I'll refactor to remove the DictWrapper and keep this UX

@heitorlessa
Copy link
Contributor

heitorlessa commented Aug 31, 2023 via email

Copy link
Contributor

@leandrodamascena leandrodamascena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some few comments.

@boring-cyborg boring-cyborg bot added the documentation Improvements or additions to documentation label Sep 1, 2023
@heitorlessa
Copy link
Contributor

coming at this later tonight... had to switch to #3091 as it seemed a potential customer issue

@heitorlessa
Copy link
Contributor

heitorlessa commented Sep 14, 2023

Three quick things only and we can merge <3

  • It’s missing “Examples” in docstring (for the ones you believe a customer will be mostly exposed to, 1 example suffice)
  • During result validation, you’re creating a list with valid strings for every iteration - you can use a class variable (class definition) and use a tuple (fixed array size for memory footprint)
  • result="Ok" is the default now, we can remove it to keep it clean. Then in the docs, we can have a tabbed example with the non-ok examples so they know when to use Dropped and ProcessingFailed.

@pull-request-size pull-request-size bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 14, 2023
@heitorlessa
Copy link
Contributor

Item 1 and 2 are done.


Docstring experience

image

API reference experience

image

@heitorlessa
Copy link
Contributor

heitorlessa commented Sep 14, 2023

As soon as the docs contain both the default (ok), Dropped, and ProcessingFailed, along with key parts highlighted (what to import, where standalone function is used, result.add_record, asdict), and any comments or code annotation... it's good to merge ;)

cc @leandrodamascena

Gonna to go to bed (10:33pm)

@leandrodamascena leandrodamascena self-requested a review September 14, 2023 21:18
@sonarcloud
Copy link

sonarcloud bot commented Sep 14, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.5% 0.5% Duplication

@heitorlessa
Copy link
Contributor

heitorlessa commented Sep 14, 2023

Docs final touches sent

  • Use code annotations to provide links, rich formatting, for comments where you can provide more contextual information. Whereas comments would be limited to a single line (cognitive load)
  • Always link to the official docs for authorative source
  • Provide information upfront to ease customers learning curve in complex topics - in this case, it'd take a while for customers to understand the nuances between the 3 modes (OK, DROP, FAILED)
  • Use realistic examples - always put yourself in the customers shoes, specially when they might not be developers. In the exception example, we want to make it true that you could drop an unwanted record :)
  • The failed example now uses a code annotation to explain how Kinesis Firehose handles failure. Adding comments would be a distraction, and even if we were to explain it'd be hard to keep it up to date if it changes. Instead, we go straight to the point and add a link for more details.
  • Don't be afraid of using newlines and removing multi-line comments. Whenever there's more than one line of comment, maybe there's a better way of phrasing it, or use code annotation.

@heitorlessa
Copy link
Contributor

Now I can go to sleep hahaha. Thank you SO MUCH both of you for the great work here ;)

@leandrodamascena leandrodamascena merged commit 455836c into aws-powertools:develop Sep 14, 2023
14 checks passed
@troyswanson
Copy link
Contributor

Thank you all! This is a very exciting feature for Powertools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commons documentation Improvements or additions to documentation feature New feature or functionality size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Kinesis Firehose Response Record data class
5 participants