Bug: The KinesisFirehoseRecordMetadata model is typed incorrectly and forces a pydantic ValidationError. #3237

Closed
ahitchin opened this issue Oct 22, 2023 · 4 comments · Fixed by #3275
Assignees
Labels
bug Something isn't working event_sources Event Source Data Class utility parser Parser (Pydantic) utility


ahitchin commented Oct 22, 2023

Expected Behaviour

1. Overview

If you are using a data processor Lambda Function for a Kinesis Firehose Delivery Stream, you should be able to use the KinesisFirehoseModel model to validate events without encountering a pydantic ValidationError.

Current Behaviour

1. Overview

When you validate a Kinesis Firehose Delivery Stream event with the KinesisFirehoseModel model, validation fails because the Kinesis record metadata's subsequenceNumber attribute (records[].kinesisRecordMetadata.subsequenceNumber) is declared with the wrong type. As a result, pydantic raises a ValidationError for a well-formed event.

Specifically, KinesisFirehoseModel validates its nested structures in turn, and the failure occurs when it validates a record's metadata with the KinesisFirehoseRecordMetadata model: sub-sequence numbers arrive as integers, but the model declares subsequenceNumber as a string.

2. ValidationError Output

pydantic_core._pydantic_core.ValidationError: 1 validation error for KinesisFirehoseModel
records.0.kinesisRecordMetadata.subsequenceNumber
  Input should be a valid string [type=string_type, input_value=0, input_type=int]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
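
To make the failure mode concrete without installing anything, here is a minimal, dependency-free sketch that mimics a strict per-field type check. It is not pydantic and not the Powertools model; `FIELDS_BEFORE`, `FIELDS_AFTER`, and `strict_errors` are hypothetical names invented for illustration. It only shows that an integer `0` is rejected when the field is declared as `str` and accepted when it is declared as `int`.

```python
from typing import Any, Dict

# Hypothetical field tables: the typing before and after the proposed fix.
FIELDS_BEFORE: Dict[str, type] = {"sequenceNumber": str, "subsequenceNumber": str}
FIELDS_AFTER: Dict[str, type] = {"sequenceNumber": str, "subsequenceNumber": int}


def strict_errors(payload: Dict[str, Any], fields: Dict[str, type]) -> Dict[str, str]:
    """Return {field: message} for every value whose type does not match exactly.

    No coercion is attempted, similar to the error above, where the
    integer 0 is not converted to the string "0".
    """
    errors: Dict[str, str] = {}
    for name, expected in fields.items():
        value = payload.get(name)
        # Compare the concrete type (bool is a subclass of int, so isinstance is too loose).
        if type(value) is not expected:
            errors[name] = f"expected {expected.__name__}, got {type(value).__name__}"
    return errors


metadata = {"sequenceNumber": "0" * 56, "subsequenceNumber": 0}
print(strict_errors(metadata, FIELDS_BEFORE))  # one error, for subsequenceNumber
print(strict_errors(metadata, FIELDS_AFTER))   # no errors
```

With the "before" table the only reported field is `subsequenceNumber`, which matches the single error location in the pydantic output.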

Code snippet

#!/usr/bin/env python

from aws_lambda_powertools.utilities.parser.models import KinesisFirehoseModel


def main() -> None:
    """Code snippet that shows the previously mentioned validation error."""
    # Create a kinesis firehose delivery stream event
    firehose_event: dict = {
        "invocationId": "00000000-0000-0000-0000-000000000000",
        "sourceKinesisStreamArn": "arn:aws-us-gov:kinesis:us-gov-west-1:000000000000:stream/A",
        "deliveryStreamArn": "arn:aws-us-gov:firehose:us-gov-west-1:000000000000:deliverystream/A",
        "region": "us-gov-west-1",
        "records": [
            {
                "recordId": "00000000000000000000000000000000000000000000000000000000000000",
                "approximateArrivalTimestamp": 1697943843714,
                "data": "YnVnIHJlcG9ydA==",
                "kinesisRecordMetadata": {
                    "sequenceNumber": "00000000000000000000000000000000000000000000000000000000",
                    "subsequenceNumber": 0,
                    "partitionKey": "00000000000000000000000000000000",
                    "shardId": "shardId-000000000000",
                    "approximateArrivalTimestamp": 1697943843714
                }
            }
        ]
    }

    # Create a model from the event (this fails)
    model: KinesisFirehoseModel = KinesisFirehoseModel.model_validate(firehose_event)
    print(model)

    return None

if __name__ == "__main__":
    main()

Possible Solution

1. Overview

Update the Kinesis Firehose pydantic model and data class with the correct type hints.

2. Problematic Code

3. Proposal for the Pydantic Model

class KinesisFirehoseRecordMetadata(BaseModel):
    shardId: str
    partitionKey: str
    approximateArrivalTimestamp: PositiveInt
    sequenceNumber: str
    subsequenceNumber: int

4. Proposal for the Data Class Dictionary Wrapper

class KinesisFirehoseRecordMetadata(DictWrapper):
    @property
    def subsequence_number(self) -> int:
        """Kinesis stream sub-sequence number; present only when Kinesis Stream is source

        Note: this will only be present for Kinesis streams using record aggregation
        """
        return self._metadata["subsequenceNumber"]
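
For the data class side, the change is only to the return annotation, since a dictionary wrapper hands back whatever the event contains. The sketch below uses a hypothetical `MetadataWrapper` class as a stand-in (it is not the Powertools `DictWrapper`) to show that the value comes back as an `int` regardless of the annotation, which is why the annotation should say `int`.

```python
from typing import Any, Dict


class MetadataWrapper:
    """Hypothetical stand-in for a DictWrapper-style class (not the Powertools one)."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    @property
    def sequence_number(self) -> str:
        return self._data["sequenceNumber"]

    @property
    def subsequence_number(self) -> int:
        # No conversion happens here: the annotation only documents what the
        # event already contains, so it should match the payload (an int).
        return self._data["subsequenceNumber"]


wrapper = MetadataWrapper({"sequenceNumber": "0" * 56, "subsequenceNumber": 0})
print(type(wrapper.subsequence_number).__name__)  # int
```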

5. Examples From Other Amazon/AWS Packages

Steps to Reproduce

1. Overview

  1. Have an event from a Kinesis Firehose Delivery Stream.
  2. Validate the event with the KinesisFirehoseModel pydantic model.
  3. Encounter an error when the KinesisFirehoseRecordMetadata fails to validate.

2. Sample Event

{
  "invocationId": "00000000-0000-0000-0000-000000000000",
  "sourceKinesisStreamArn": "arn:aws-us-gov:kinesis:us-gov-west-1:000000000000:stream/A",
  "deliveryStreamArn": "arn:aws-us-gov:firehose:us-gov-west-1:000000000000:deliverystream/A",
  "region": "us-gov-west-1",
  "records": [
    {
      "recordId": "00000000000000000000000000000000000000000000000000000000000000",
      "approximateArrivalTimestamp": 1697943843714,
      "data": "YnVnIHJlcG9ydA==",
      "kinesisRecordMetadata": {
        "sequenceNumber": "00000000000000000000000000000000000000000000000000000000",
        "subsequenceNumber": 0,
        "partitionKey": "00000000000000000000000000000000",
        "shardId": "shardId-000000000000",
        "approximateArrivalTimestamp": 1697943843714
      }
    }
  ]
}
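
As a quick sanity check on the sample event, the following standard-library sketch parses a trimmed copy of it (only the fields inspected here are kept) and reports the Python type each metadata field decodes to. It also decodes the base64 `data` payload. Variable names such as `raw_event` and `field_types` are illustrative only.

```python
import base64
import json

# Trimmed copy of the sample event above (only the fields inspected here).
raw_event = """
{
  "records": [
    {
      "data": "YnVnIHJlcG9ydA==",
      "kinesisRecordMetadata": {
        "sequenceNumber": "00000000000000000000000000000000000000000000000000000000",
        "subsequenceNumber": 0
      }
    }
  ]
}
"""

event = json.loads(raw_event)
record = event["records"][0]
metadata = record["kinesisRecordMetadata"]

# Unquoted JSON integers decode to int; quoted values decode to str.
field_types = {name: type(value).__name__ for name, value in metadata.items()}
print(field_types)  # {'sequenceNumber': 'str', 'subsequenceNumber': 'int'}

# The record payload is base64-encoded.
payload = base64.b64decode(record["data"]).decode("utf-8")
print(payload)  # bug report
```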

Powertools for AWS Lambda (Python) version

2.26.0

AWS Lambda function runtime

3.11

Packaging format used

PyPI

Debugging logs

N/A
@ahitchin ahitchin added bug Something isn't working triage Pending triage from maintainers labels Oct 22, 2023

boring-cyborg bot commented Oct 22, 2023

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

@leandrodamascena
Contributor

Hi @ahitchin! Thanks for opening this issue; I can confirm it's a bug. Checking the links you sent, this field should actually be an integer instead of a str.

Would you like to submit a PR to fix this? We'd love to have your first contribution here. If so, please update the class, the test, and also the [event](https://github.com/aws-powertools/powertools-lambda-python/blob/develop/tests/events/kinesisFirehoseKinesisEvent.json). You can do the same for the data class test.

@leandrodamascena leandrodamascena added event_sources Event Source Data Class utility parser Parser (Pydantic) utility and removed triage Pending triage from maintainers labels Oct 22, 2023
@leandrodamascena leandrodamascena moved this from Triage to Pending customer in Powertools for AWS Lambda (Python) Oct 22, 2023
@leandrodamascena leandrodamascena self-assigned this Oct 22, 2023
@leandrodamascena leandrodamascena linked a pull request Oct 31, 2023 that will close this issue
@github-project-automation github-project-automation bot moved this from Pending customer to Coming soon in Powertools for AWS Lambda (Python) Oct 31, 2023
@github-actions
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@github-actions github-actions bot added the pending-release Fix or implementation already in dev waiting to be released label Oct 31, 2023
Contributor

This is now released in version 2.26.1!

@github-actions github-actions bot removed the pending-release Fix or implementation already in dev waiting to be released label Nov 10, 2023
@heitorlessa heitorlessa moved this from Coming soon to Shipped in Powertools for AWS Lambda (Python) Dec 6, 2023
Projects
Status: Shipped
2 participants