Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add 'strict_decode' option for protocols #227

Merged
merged 4 commits into from
Nov 28, 2023
Merged

Conversation

aisk
Copy link
Member

@aisk aisk commented Nov 14, 2023

The string field will be encoded into python string (formerly unicode in python2) via UTF-8 encoding, and if the field is not a valid UTF-8 string, current implementatiion will fallback to return the raw bytes.

As a result, there would be a pitfall for users: if someone don't validate the result type and just pass it to database or somewhere else, there would be a dirty data.

This change add a strict_decode flag when create protocol (or protocol factory) to progate the UnicodeDecodeError to users so the PRC call will be failed as expected.

The flag is not enabled by default for compatibility issue.

Protocols needs support:

  • compat
  • binary
  • cybin
  • aio-compat
  • aio-binary

Summary by CodeRabbit

  • New Features

    • Introduced a new strict decoding mode for string values across various protocols, enhancing error handling and data integrity.
  • Tests

    • Added tests to ensure strict decoding raises appropriate errors when encountering invalid data.
    • Implemented new asynchronous tests for binary and compact protocols.
  • Bug Fixes

    • Adjusted string decoding to correctly raise exceptions in strict mode when decoding errors occur.

@aisk aisk marked this pull request as draft November 14, 2023 09:49
@aisk aisk marked this pull request as ready for review November 14, 2023 10:02
@aisk aisk requested a review from ethe November 14, 2023 10:27
@aisk
Copy link
Member Author

aisk commented Nov 27, 2023

@coderabbitai review

Copy link

coderabbitai bot commented Nov 27, 2023

Walkthrough

The changes across various files introduce a strict_decode parameter to functions and classes related to binary and compact protocols within a Python library, likely for handling Thrift serialization. This parameter enables strict decoding mode, where decoding errors result in exceptions. New test functions are added to ensure that enabling strict_decode leads to the expected UnicodeDecodeError when decoding operations encounter errors.

Changes

File Path Change Summary
tests/test_aio_protocol_binary.py
tests/test_aio_protocol_compact.py
tests/test_protocol_binary.py
tests/test_protocol_compact.py
tests/test_protocol_cybinary.py
Added new test functions test_strict_decode to verify behavior when strict_decode is True.
thriftpy2/contrib/aio/protocol/binary.py
thriftpy2/contrib/aio/protocol/compact.py
thriftpy2/protocol/binary.py
thriftpy2/protocol/compact.py
thriftpy2/protocol/cybin/cybin.pyx
Introduced strict_decode parameter to various functions and classes to control strict decoding behavior.

🐇✨
In the realm of code where the data streams flow,
A rabbit hopped in, with a strict_decode in tow.
"No more silent fails," it said with a cheer,
"For errors shall speak, whenever they're near!"
🎉🐰


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 977c6ca and 8d340f7.
Files selected for processing (10)
  • tests/test_aio_protocol_binary.py (1 hunks)
  • tests/test_aio_protocol_compact.py (1 hunks)
  • tests/test_protocol_binary.py (2 hunks)
  • tests/test_protocol_compact.py (2 hunks)
  • tests/test_protocol_cybinary.py (1 hunks)
  • thriftpy2/contrib/aio/protocol/binary.py (7 hunks)
  • thriftpy2/contrib/aio/protocol/compact.py (2 hunks)
  • thriftpy2/protocol/binary.py (7 hunks)
  • thriftpy2/protocol/compact.py (3 hunks)
  • thriftpy2/protocol/cybin/cybin.pyx (10 hunks)
Additional comments: 38
tests/test_aio_protocol_binary.py (1)
  • 27-33: The test test_strict_decode is well-constructed to verify that the UnicodeDecodeError is raised when strict decoding is enabled and an invalid UTF-8 sequence is encountered. The use of AsyncBytesIO to wrap BytesIO is appropriate for simulating asynchronous I/O in the test environment.
tests/test_aio_protocol_compact.py (1)
  • 26-33: The test test_strict_decode is well-structured and correctly tests the new strict_decode feature by expecting a UnicodeDecodeError when an invalid UTF-8 sequence is encountered. The use of AsyncBytesIO and TAsyncCompactProtocol is appropriate for the asynchronous context of the test.
tests/test_protocol_binary.py (3)
  • 5-5: The addition of the pytest import is appropriate for the new test cases that use it.

  • 100-100: The test case test_unpack_binary correctly checks the behavior of read_val when handling binary data without decoding.

  • 103-108: The test case test_strict_decode is well-designed to verify that read_val raises a UnicodeDecodeError when strict_decode is True and an invalid UTF-8 sequence is encountered.

tests/test_protocol_compact.py (2)
  • 120-126: The test test_strict_decode correctly checks for UnicodeDecodeError when strict_decode is set to True and an invalid UTF-8 sequence is encountered.

  • 117-131: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [2-129]

No further issues found in the rest of the file.

thriftpy2/contrib/aio/protocol/binary.py (9)
  • 73-77: The addition of the strict_decode parameter to the read_val function aligns with the summary and the pull request's intent to provide strict decoding options.

  • 104-109: The implementation of the strict_decode logic in the read_val function for string decoding is consistent with the pull request's intent and the summary provided.

  • 126-131: The read_val function is correctly called with the strict_decode parameter when reading lists, ensuring that the strict decoding behavior is propagated.

  • 158-165: The read_val function is correctly called with the strict_decode parameter when reading maps, ensuring that the strict decoding behavior is propagated.

  • 169-170: The read_struct function is correctly called with the strict_decode parameter, ensuring that the strict decoding behavior is propagated when reading structs.

  • 245-252: The TAsyncBinaryProtocol class constructor has been updated to accept and store the strict_decode parameter, which is in line with the changes described in the summary.

  • 274-276: The read_struct method in the TAsyncBinaryProtocol class correctly passes the strict_decode parameter, ensuring that the strict decoding behavior is used when reading structs.

  • 282-288: The TAsyncBinaryProtocolFactory class constructor has been updated to accept and store the strict_decode parameter, which is in line with the changes described in the summary.

  • 290-297: The get_protocol method in the TAsyncBinaryProtocolFactory class correctly passes the strict_decode parameter when creating a new TAsyncBinaryProtocol instance.

thriftpy2/contrib/aio/protocol/compact.py (2)
  • 145-150: The implementation of the strict_decode feature in the _read_string method is correct and aligns with the summary provided. It ensures that a UnicodeDecodeError is raised when strict_decode is True and an invalid UTF-8 sequence is encountered.

  • 308-318: The addition of the strict_decode parameter to the TAsyncCompactProtocolFactory class and its propagation to the TAsyncCompactProtocol constructor is consistent with the changes described in the summary and pull request. This ensures that the new strict_decode feature is available when creating protocol instances through the factory.

thriftpy2/protocol/binary.py (6)
  • 218-219: The read_val function has been correctly updated to include the strict_decode parameter in its signature, aligning with the changes described in the summary.

  • 249-254: The implementation of the strict_decode logic within the read_val function is correct. It attempts to decode the payload as UTF-8 and raises a UnicodeDecodeError if strict_decode is True and a decoding error occurs.

  • 316-316: The read_struct function has been correctly updated to include the strict_decode parameter in its signature, aligning with the changes described in the summary.

  • 387-394: The TBinaryProtocol class's __init__ method has been correctly updated to include the strict_decode parameter, and the parameter is stored as an instance variable as intended.

  • 423-428: The TBinaryProtocolFactory class's __init__ method has been correctly updated to include the strict_decode parameter, and the parameter is stored as an instance variable as intended.

  • 299-319: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [218-313]

Verify that the strict_decode parameter is being passed correctly to all recursive calls of read_val and read_struct within the thriftpy2 library, and ensure that the default value of strict_decode is set to False in all places to maintain backward compatibility.

thriftpy2/protocol/compact.py (3)
  • 132-139: The addition of the strict_decode parameter to the TCompactProtocol constructor is consistent with the summary and is implemented correctly.

  • 246-251: The changes to the _read_string method to raise a UnicodeDecodeError when strict_decode is True are consistent with the summary and correctly implemented.

  • 590-597: The addition of the strict_decode parameter to the TCompactProtocolFactory constructor and the get_protocol method is consistent with the summary and is implemented correctly.

thriftpy2/protocol/cybin/cybin.pyx (11)
  • 173-177: The addition of the strict_decode parameter to the read_struct function with a default value of False is correctly implemented and maintains backward compatibility.

  • 256-258: The addition of the strict_decode parameter to the c_read_string function with a default value of False is correctly implemented and maintains backward compatibility.

  • 267-269: The addition of the strict_decode parameter to the c_read_val function with a default value of False is correctly implemented and maintains backward compatibility.

  • 440-442: The addition of the strict_decode parameter to the read_val function with a default value of False is correctly implemented and maintains backward compatibility.

  • 451-462: The addition of the strict_decode parameter to the TCyBinaryProtocol class with a default value of False is correctly implemented and maintains backward compatibility.

  • 524-529: The addition of the strict_decode parameter to the TCyBinaryProtocolFactory class with a default value of False is correctly implemented and maintains backward compatibility.

  • 259-264: The use of the strict_decode parameter within the c_read_string function correctly implements the intended behavior of raising a UnicodeDecodeError when strict_decode is True and a decoding error occurs.

  • 203-204: The strict_decode parameter is correctly passed to the c_read_val function within the read_struct function, ensuring that the strict decoding behavior is applied when reading struct fields.

  • 298-299: The strict_decode parameter is correctly passed to the c_read_string function within the c_read_val function when the type is T_STRING, ensuring that the strict decoding behavior is applied when reading string values.

  • 508-510: The strict_decode parameter is correctly used in the read_struct method of the TCyBinaryProtocol class, ensuring that the strict decoding behavior is applied when reading messages.

  • 531-534: The strict_decode parameter is correctly used in the get_protocol method of the TCyBinaryProtocolFactory class, ensuring that the strict decoding behavior is applied when creating new protocol instances.

Comment on lines +150 to +155
def test_strict_decode():
bs = TCyMemoryBuffer(b"\x00\x00\x00\x0c\x00" # there is a redundant '\x00'
b"\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c")
with pytest.raises(UnicodeDecodeError):
proto.read_val(bs, TType.STRING, decode_response=True,
strict_decode=True)
Copy link

@coderabbitai coderabbitai bot Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_strict_decode correctly checks for a UnicodeDecodeError when strict_decode is set to True. However, it might be beneficial to also verify the content of the error message to ensure that it matches the expected error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aisk Understood, we'll proceed without adding a check for the error message content.

🐰

@ethe ethe merged commit fc26372 into Thriftpy:master Nov 28, 2023
7 checks passed
@aisk aisk deleted the strict-decode branch November 28, 2023 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants