add 'strict_decode' option for protocols #227
@coderabbitai review
Walkthrough
The changes across various files introduce a `strict_decode` option for the protocols.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (10)
- tests/test_aio_protocol_binary.py (1 hunks)
- tests/test_aio_protocol_compact.py (1 hunks)
- tests/test_protocol_binary.py (2 hunks)
- tests/test_protocol_compact.py (2 hunks)
- tests/test_protocol_cybinary.py (1 hunks)
- thriftpy2/contrib/aio/protocol/binary.py (7 hunks)
- thriftpy2/contrib/aio/protocol/compact.py (2 hunks)
- thriftpy2/protocol/binary.py (7 hunks)
- thriftpy2/protocol/compact.py (3 hunks)
- thriftpy2/protocol/cybin/cybin.pyx (10 hunks)
Additional comments: 38
tests/test_aio_protocol_binary.py (1)
- 27-33: The test `test_strict_decode` is well-constructed to verify that the `UnicodeDecodeError` is raised when strict decoding is enabled and an invalid UTF-8 sequence is encountered. The use of `AsyncBytesIO` to wrap `BytesIO` is appropriate for simulating asynchronous I/O in the test environment.
tests/test_aio_protocol_compact.py (1)
- 26-33: The test `test_strict_decode` is well-structured and correctly tests the new `strict_decode` feature by expecting a `UnicodeDecodeError` when an invalid UTF-8 sequence is encountered. The use of `AsyncBytesIO` and `TAsyncCompactProtocol` is appropriate for the asynchronous context of the test.
tests/test_protocol_binary.py (3)
- 5-5: The addition of the `pytest` import is appropriate for the new test cases that use it.
- 100-100: The test case `test_unpack_binary` correctly checks the behavior of `read_val` when handling binary data without decoding.
- 103-108: The test case `test_strict_decode` is well-designed to verify that `read_val` raises a `UnicodeDecodeError` when `strict_decode` is `True` and an invalid UTF-8 sequence is encountered.
tests/test_protocol_compact.py (2)
- 120-126: The test `test_strict_decode` correctly checks for `UnicodeDecodeError` when `strict_decode` is set to `True` and an invalid UTF-8 sequence is encountered.
- 117-131: Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [2-129]. No further issues found in the rest of the file.
thriftpy2/contrib/aio/protocol/binary.py (9)
- 73-77: The addition of the `strict_decode` parameter to the `read_val` function aligns with the summary and the pull request's intent to provide strict decoding options.
- 104-109: The implementation of the `strict_decode` logic in the `read_val` function for string decoding is consistent with the pull request's intent and the summary provided.
- 126-131: The `read_val` function is correctly called with the `strict_decode` parameter when reading lists, ensuring that the strict decoding behavior is propagated.
- 158-165: The `read_val` function is correctly called with the `strict_decode` parameter when reading maps, ensuring that the strict decoding behavior is propagated.
- 169-170: The `read_struct` function is correctly called with the `strict_decode` parameter, ensuring that the strict decoding behavior is propagated when reading structs.
- 245-252: The `TAsyncBinaryProtocol` class constructor has been updated to accept and store the `strict_decode` parameter, which is in line with the changes described in the summary.
- 274-276: The `read_struct` method in the `TAsyncBinaryProtocol` class correctly passes the `strict_decode` parameter, ensuring that the strict decoding behavior is used when reading structs.
- 282-288: The `TAsyncBinaryProtocolFactory` class constructor has been updated to accept and store the `strict_decode` parameter, which is in line with the changes described in the summary.
- 290-297: The `get_protocol` method in the `TAsyncBinaryProtocolFactory` class correctly passes the `strict_decode` parameter when creating a new `TAsyncBinaryProtocol` instance.
thriftpy2/contrib/aio/protocol/compact.py (2)
- 145-150: The implementation of the `strict_decode` feature in the `_read_string` method is correct and aligns with the summary provided. It ensures that a `UnicodeDecodeError` is raised when `strict_decode` is `True` and an invalid UTF-8 sequence is encountered.
- 308-318: The addition of the `strict_decode` parameter to the `TAsyncCompactProtocolFactory` class and its propagation to the `TAsyncCompactProtocol` constructor is consistent with the changes described in the summary and pull request. This ensures that the new `strict_decode` feature is available when creating protocol instances through the factory.
thriftpy2/protocol/binary.py (6)
- 218-219: The `read_val` function has been correctly updated to include the `strict_decode` parameter in its signature, aligning with the changes described in the summary.
- 249-254: The implementation of the `strict_decode` logic within the `read_val` function is correct. It attempts to decode the payload as UTF-8 and raises a `UnicodeDecodeError` if `strict_decode` is `True` and a decoding error occurs.
- 316-316: The `read_struct` function has been correctly updated to include the `strict_decode` parameter in its signature, aligning with the changes described in the summary.
- 387-394: The `TBinaryProtocol` class's `__init__` method has been correctly updated to include the `strict_decode` parameter, and the parameter is stored as an instance variable as intended.
- 423-428: The `TBinaryProtocolFactory` class's `__init__` method has been correctly updated to include the `strict_decode` parameter, and the parameter is stored as an instance variable as intended.
- 299-319: Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [218-313]. Verify that the `strict_decode` parameter is being passed correctly to all recursive calls of `read_val` and `read_struct` within the `thriftpy2` library, and ensure that the default value of `strict_decode` is set to `False` in all places to maintain backward compatibility.
thriftpy2/protocol/compact.py (3)
- 132-139: The addition of the `strict_decode` parameter to the `TCompactProtocol` constructor is consistent with the summary and is implemented correctly.
- 246-251: The changes to the `_read_string` method to raise a `UnicodeDecodeError` when `strict_decode` is `True` are consistent with the summary and correctly implemented.
- 590-597: The addition of the `strict_decode` parameter to the `TCompactProtocolFactory` constructor and the `get_protocol` method is consistent with the summary and is implemented correctly.
thriftpy2/protocol/cybin/cybin.pyx (11)
- 173-177: The addition of the `strict_decode` parameter to the `read_struct` function with a default value of `False` is correctly implemented and maintains backward compatibility.
- 256-258: The addition of the `strict_decode` parameter to the `c_read_string` function with a default value of `False` is correctly implemented and maintains backward compatibility.
- 267-269: The addition of the `strict_decode` parameter to the `c_read_val` function with a default value of `False` is correctly implemented and maintains backward compatibility.
- 440-442: The addition of the `strict_decode` parameter to the `read_val` function with a default value of `False` is correctly implemented and maintains backward compatibility.
- 451-462: The addition of the `strict_decode` parameter to the `TCyBinaryProtocol` class with a default value of `False` is correctly implemented and maintains backward compatibility.
- 524-529: The addition of the `strict_decode` parameter to the `TCyBinaryProtocolFactory` class with a default value of `False` is correctly implemented and maintains backward compatibility.
- 259-264: The use of the `strict_decode` parameter within the `c_read_string` function correctly implements the intended behavior of raising a `UnicodeDecodeError` when `strict_decode` is `True` and a decoding error occurs.
- 203-204: The `strict_decode` parameter is correctly passed to the `c_read_val` function within the `read_struct` function, ensuring that the strict decoding behavior is applied when reading struct fields.
- 298-299: The `strict_decode` parameter is correctly passed to the `c_read_string` function within the `c_read_val` function when the type is `T_STRING`, ensuring that the strict decoding behavior is applied when reading string values.
- 508-510: The `strict_decode` parameter is correctly used in the `read_struct` method of the `TCyBinaryProtocol` class, ensuring that the strict decoding behavior is applied when reading messages.
- 531-534: The `strict_decode` parameter is correctly used in the `get_protocol` method of the `TCyBinaryProtocolFactory` class, ensuring that the strict decoding behavior is applied when creating new protocol instances.
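The propagation chain these comments walk through (factory, then protocol, then recursive reads of lists and maps) can be illustrated with a small standalone sketch. It does not use thriftpy2: `SketchProtocolFactory`, `SketchProtocol`, `read_value`, and `decode_text` are hypothetical names made up for illustration, and plain Python containers stand in for Thrift wire data.

```python
def decode_text(raw, strict_decode=False):
    """Decode bytes as UTF-8; on failure either re-raise or return the raw bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if strict_decode:
            raise
        return raw


def read_value(value, strict_decode=False):
    """Hypothetical recursive reader: every nested call receives the same flag."""
    if isinstance(value, bytes):
        return decode_text(value, strict_decode)
    if isinstance(value, list):
        return [read_value(item, strict_decode) for item in value]
    if isinstance(value, dict):
        return {
            read_value(key, strict_decode): read_value(item, strict_decode)
            for key, item in value.items()
        }
    return value


class SketchProtocol:
    def __init__(self, strict_decode=False):
        # Defaults to False so existing callers keep the lenient behavior.
        self.strict_decode = strict_decode

    def read(self, value):
        return read_value(value, self.strict_decode)


class SketchProtocolFactory:
    def __init__(self, strict_decode=False):
        self.strict_decode = strict_decode

    def get_protocol(self):
        # The factory hands its own setting to every protocol it creates.
        return SketchProtocol(strict_decode=self.strict_decode)
```

If any level dropped the flag, a bad string nested inside a list or map would silently come back as bytes even with strict decoding requested, which is why the review checks each call site.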
def test_strict_decode():
    bs = TCyMemoryBuffer(b"\x00\x00\x00\x0c\x00"  # there is a redundant '\x00'
                         b"\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c")
    with pytest.raises(UnicodeDecodeError):
        proto.read_val(bs, TType.STRING, decode_response=True,
                       strict_decode=True)
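Why this input fails to decode can be checked by hand with the standard library alone. The sketch below assumes the usual Thrift binary layout for a string (a 4-byte big-endian length followed by that many bytes); `split_string_field` is a hypothetical helper, not part of thriftpy2.

```python
import struct

# Same bytes as the test: length prefix 12, a redundant 0x00,
# then the 12-byte UTF-8 encoding of a four-character string.
data = (b"\x00\x00\x00\x0c\x00"
        b"\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c")


def split_string_field(buf):
    """Return (declared length, string body, unread leftover bytes)."""
    (length,) = struct.unpack("!i", buf[:4])
    body = buf[4:4 + length]
    leftover = buf[4 + length:]
    return length, body, leftover


length, body, leftover = split_string_field(data)

# The redundant 0x00 shifts the window by one byte, so the body holds
# only the first two bytes of the final three-byte character.
try:
    body.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
```

The declared length is 12, but the extra `\x00` takes one of those 12 slots, so the last character is cut off and its final byte is left unread in the buffer. The truncated sequence is what makes the body invalid UTF-8.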
The test `test_strict_decode` correctly checks for a `UnicodeDecodeError` when `strict_decode` is set to `True`. However, it might be beneficial to also verify the content of the error message to ensure that it matches the expected error.
It's not needed.
@aisk Understood, we'll proceed without adding a check for the error message content.
🐰
A string field is decoded into a Python string (formerly unicode in Python 2) as UTF-8, and if the field is not valid UTF-8, the current implementation falls back to returning the raw bytes.
As a result, there is a pitfall for users: if someone does not validate the result type and just passes it to a database or somewhere else, dirty data ends up stored.
This change adds a `strict_decode` flag, set when creating a protocol (or protocol factory), that propagates the `UnicodeDecodeError` to users so the RPC call fails as expected.
The flag is not enabled by default, for backward compatibility.
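A minimal sketch of the two behaviors, using only the standard library. `decode_string_field` and `format_record` are hypothetical names for illustration and are not thriftpy2 functions; the real logic lives in the protocol readers touched by this PR.

```python
def decode_string_field(payload, strict_decode=False):
    """Decode a string field as UTF-8.

    With strict_decode=False (the default), invalid UTF-8 falls back to
    returning the raw bytes. With strict_decode=True the error propagates.
    """
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        if strict_decode:
            raise
        return payload


def format_record(name):
    """Hypothetical caller that never checks whether it got str or bytes."""
    return f"name={name}"


good = decode_string_field(b"hello")   # a str
bad = decode_string_field(b"\xff")     # still bytes: the silent fallback
dirty = format_record(bad)             # the bytes repr leaks into the record
```

In the lenient case the caller ends up with the text `name=b'\xff'`, which is the kind of dirty data the description warns about. With `strict_decode=True` the same input raises `UnicodeDecodeError` before it can reach storage.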
Protocols that need support: