-
Notifications
You must be signed in to change notification settings - Fork 401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: Kinesis Firehose Response Record data class #2440
Comments
Thanks for opening your first issue here! We'll come back to you as soon as we can. |
Been playing with a custom implementation for my project. Thought I would share it here: (This uses Python 3.9 so the type alias syntax is a little different than current.)
from typing import Union, Optional, Callable
from dataclasses import dataclass
from base64 import standard_b64encode
from aws_lambda_powertools.utilities.data_classes import KinesisFirehoseEvent
KinesisFirehoseResponseRecord = Union[
"KinesisFirehoseResponseRecordOk",
"KinesisFirehoseResponseRecordDropped",
"KinesisFirehoseResponseRecordFailed",
]
@dataclass
class KinesisFirehoseEventProcessor:
event: KinesisFirehoseEvent
def process(self, fn: Callable[..., "KinesisFirehoseProcessedRecord"]):
response_records: list[KinesisFirehoseResponseRecord] = list()
for record in self.event.records:
try:
processed_record = fn(
record=record,
invocation_id=self.event.invocation_id,
delivery_stream_arn=self.event.delivery_stream_arn,
source_kinesis_stream_arn=self.event.source_kinesis_stream_arn,
region=self.event.region,
)
response_record = KinesisFirehoseResponseRecordOk(
record_id=record.record_id,
data=processed_record.data,
metadata=processed_record.metadata,
)
except KinesisFirehoseRecordProcessingDropped:
response_record = KinesisFirehoseResponseRecordDropped(
record_id=record.record_id
)
except KinesisFirehoseRecordProcessingFailed:
response_record = KinesisFirehoseResponseRecordFailed(
record_id=record.record_id
)
response_records.append(response_record)
return KinesisFirehoseResponse(records=response_records)
@dataclass
class KinesisFirehoseProcessedRecord:
data: str
metadata: Optional["KinesisFirehoseResponseRecordMetadata"] = None
@dataclass
class KinesisFirehoseResponse:
records: list["KinesisFirehoseResponseRecord"]
def to_dict(self):
return {"records": [record.to_dict() for record in self.records]}
@dataclass
class KinesisFirehoseResponseRecordMetadata:
partition_keys: Optional[dict[str, str]]
def to_dict(self):
r = dict()
if self.partition_keys is not None:
r["partitionKeys"] = self.partition_keys
return r
@dataclass
class KinesisFirehoseResponseRecordOk:
record_id: str
data: str
metadata: Optional[KinesisFirehoseResponseRecordMetadata] = None
@property
def data_b64encoded(self) -> bytes:
return standard_b64encode(self.data.encode())
def to_dict(self):
r = {
"recordId": self.record_id,
"result": "Ok",
"data": self.data_b64encoded,
"metadata": dict(),
}
if self.metadata is not None:
r["metadata"] = self.metadata.to_dict()
return r
@dataclass
class KinesisFirehoseResponseRecordFailed:
record_id: str
def to_dict(self):
return {"recordId": self.record_id, "result": "ProcessingFailed"}
@dataclass
class KinesisFirehoseResponseRecordDropped:
record_id: str
def to_dict(self):
return {"recordId": self.record_id, "result": "Dropped"}
class KinesisFirehoseRecordProcessingFailed(Exception):
...
class KinesisFirehoseRecordProcessingDropped(Exception):
... Example implementation:
import pytest
from aws_lambda_powertools.utilities.data_classes import KinesisFirehoseEvent
@pytest.fixture
def kinesis_firehose_event() -> KinesisFirehoseEvent:
"""
record1: {"text":"hello world"}
record2: {"text":"foo bar"}
"""
return KinesisFirehoseEvent(
{
"invocationId": "invoked123",
"deliveryStreamArn": "aws:lambda:events",
"region": "us-west-2",
"records": [
{
"data": "eyJ0ZXh0IjoiaGVsbG8gd29ybGQifQ==",
"recordId": "record1",
"approximateArrivalTimestamp": 1686589530000,
"kinesisRecordMetadata": {
"shardId": "shardId-000000000000",
"partitionKey": "4d1ad2b9-2 4f8-4b9d-a088-76e9947c317a",
"approximateArrivalTimestamp": "2023-06-12T17:05:30.000Z",
"sequenceNumber": "49546986683135544286507457936321625675700192471156785154", # noqa: E501
"subsequenceNumber": "",
},
},
{
"data": "eyJ0ZXh0IjoiZm9vIGJhciJ9",
"recordId": "record2",
"approximateArrivalTimestamp": 1686589530000,
"kinesisRecordMetadata": {
"shardId": "shardId-000000000001",
"partitionKey": "4d1ad2b9-24f8-4b9d-a088-76e9947c318a",
"approximateArrivalTimestamp": "2023-06-12T17:05:30.000Z",
"sequenceNumber": "49546986683135544286507457936321625675700192471156785155", # noqa: E501
"subsequenceNumber": "",
},
},
],
}
)
from json import dumps
from aws_lambda_powertools.utilities.data_classes.kinesis_firehose_event import (
KinesisFirehoseEvent,
KinesisFirehoseRecord,
)
from myproj.dataclasses.kinesis_firehose import (
KinesisFirehoseEventProcessor,
KinesisFirehoseProcessedRecord,
KinesisFirehoseResponseRecordOk,
KinesisFirehoseResponseRecordFailed,
KinesisFirehoseProcessingFailed,
)
def test_kinesis_firehose_processor(kinesis_firehose_event: KinesisFirehoseEvent):
def fn(record: KinesisFirehoseRecord, **kwargs) -> KinesisFirehoseProcessedRecord:
data = record.data_as_json.copy()
data["len"] = len(data["text"])
data_as_json = dumps(data, separators=(",", ":"))
return KinesisFirehoseProcessedRecord(data=data_as_json)
processor = KinesisFirehoseEventProcessor(kinesis_firehose_event)
response = processor.process(fn)
assert isinstance(response.records[0], KinesisFirehoseResponseRecordOk)
assert response.records[0].record_id == "record1"
assert response.records[0].data == '{"text":"hello world","len":11}'
assert isinstance(response.records[1], KinesisFirehoseResponseRecordOk)
assert response.records[1].record_id == "record2"
assert response.records[1].data == '{"text":"foo bar","len":7}' |
Hi @troyswanson thank you for opening this! Since the response object can be quite complex, I agree that we could benefit with adding those classes to our dataclasses. For reference, here's the Go types (https://github.com/aws/aws-lambda-go/blob/main/events/firehose.go#L28-L49) I can see that you already have some code too. I would love if you could submit a PR for this! What do you think? |
@rubenfonseca I can check with my company to see if I can get some time to contrib this back as an official PR. In the meantime, if someone else is able to essentially copy/paste the code that I added to this issue, I would be fine with that too! |
Hi @troyswanson! We'll be adding this new class to our EventSource utility, and when we release it, we'll give you credit in our release notes for your great job in sending us nearly finished code. Thank you so much |
@leandrodamascena Cool, thank you! The code that I included has a batch processor built in, but it would probably be more appropriate to add that functionality into the https://docs.powertools.aws.dev/lambda/python/latest/utilities/batch/ tooling that already exists. I admittedly don't have experience with that part of the Powertools library. Anyways, I'm excited to see your implementation! |
Hi @troyswanson If you think it would be helpful, we would like to see you submit a RFC specifically for supporting Kinesis Firehose in batch processing. This would allow us to focus on the main issue in this feature request, which is supporting the Kinesis Firehose Response Record data class. Leandro assigned this task to me and I'm working to add the support, Current target is to submit a pull request by the end of next week. I'll ping you when there an update and you can review before we merge it. We appreciate your feedback and we will keep you updated on our progress. |
@roger-zhangg Thanks for the update! |
Hey @troyswanson I'm excited to share the PR I've opened #3029. We would love to see you review it and share your comments. Thanks! |
|
This is now released under 2.25.0 version! |
Use case
Constructing response objects for use in Kinesis Firehose transformation functions.
This is a continuation of #1059 which describes the event object as well as the response object. The implementation for that issue can be found at #1540, but that does not include the response object.
Solution/User Experience
A data class that can be populated during the execution of a function that will be properly formed as a response to a
KinesisFirehoseEvent
invocation.Rough idea
Note: ☝🏼 I'm not sure if this is not an exhaustive list of options that can be returned
Alternative solutions
Previously, I've used basic dictionaries for this, but it would be nice to have a more structured data class to use.
The Go example in the Dynamic Partitioning in Kinesis Data Firehose has the concept of a
KinesisFirehoseResponse
in their events package.I believe it would be possible to re-use the
KinesisFirehoseEvent
data class from theutilities.data_classes
module, but this seems like it is more geared for the event invocation object as opposed to the response object.Acknowledgment
The text was updated successfully, but these errors were encountered: