fix: Writing to bytes TextIO stream in StreamWriter causing AttributeError on buffer member #6034

Merged


sbchisholm
Contributor

Which issue(s) does this change fix?

#6033

Why is this change necessary?

Fixes the bug raised in #6033, where the object held in StreamWriter's stream member does not have a buffer attribute.

How does it address the issue?

Rather than using the buffer attribute to write bytes, it first decodes the bytes to a string and then writes the string with the stream's write method (which accepts strings).
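A minimal sketch of the approach described above. The simplified `StreamWriter` with `write_str`/`write_bytes` methods and the choice of UTF-8 are assumptions for illustration; this is not the actual SAM CLI class.

```python
import io
from typing import TextIO


class StreamWriter:
    """Simplified, hypothetical stand-in for SAM CLI's StreamWriter."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def write_str(self, output: str) -> None:
        self._stream.write(output)

    def write_bytes(self, output: bytes) -> None:
        # The original approach wrote through the stream's `buffer` attribute,
        # which io.StringIO does not have. Here the bytes are decoded first
        # (UTF-8 assumed) and written with write(), which every text stream has.
        self.write_str(output.decode("utf-8"))


sink = io.StringIO()
StreamWriter(sink).write_bytes(b"hello\n")
print(repr(sink.getvalue()))  # 'hello\n'
```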

What side effects does this change have?

Unsure.

Mandatory Checklist

PRs will only be reviewed after checklist is complete

  • Add input/output type hints to new functions/methods
  • Write design document if needed (Do I need to write a design document?)
  • Write/update unit tests
  • Write/update integration tests
  • Write/update functional tests if needed
  • make pr passes
  • make update-reproducible-reqs if dependencies were changed
  • Write documentation

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

 - The TextIO's binary buffer member is not guaranteed to exist
   according to the python docs.
   (https://docs.python.org/3.8/library/io.html#io.TextIOBase.buffer)
 - This change decodes the bytes to a string so that we can use
   the write method on the TextIO stream.
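The difference can be seen with the standard library alone: an `io.TextIOWrapper` sits on top of a binary buffer, while `io.StringIO` is an in-memory text stream with no `buffer` attribute. An illustrative check:

```python
import io

# A TextIOWrapper wraps a binary stream, so `.buffer` is available.
wrapped = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
print(hasattr(wrapped, "buffer"))  # True

# io.StringIO is a text stream without an underlying binary buffer.
in_memory = io.StringIO()
print(hasattr(in_memory, "buffer"))  # False

try:
    in_memory.buffer.write(b"data")
except AttributeError as exc:
    # This is the same class of error reported in #6033.
    print(f"AttributeError: {exc}")
```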
@sbchisholm sbchisholm requested a review from a team as a code owner October 4, 2023 15:12
@github-actions github-actions bot added pr/external stage/needs-triage labels Oct 4, 2023
@wolverian

wolverian commented Oct 6, 2023

Looking at the users of this function, for example:

    output_data = output_data.decode("utf-8").replace("\r", os.linesep).encode("utf-8")
    if isinstance(output_stream, StreamWriter):
        output_stream.write_bytes(output_data)

and

    self._stream_writer.write_bytes(cast(str, remote_invoke_response.response).encode())

would it make sense to remove this function and change the calls to use write_str instead?
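A sketch of what the first call site might look like with a str-only API. The `StreamWriter` class and the `forward_output` helper here are hypothetical; the idea is to decode once at the boundary and keep the text as `str`, rather than re-encoding it only for `write_bytes` to decode it again. The second call site would similarly pass the `str` response straight to `write_str` instead of calling `.encode()` on it.

```python
import io
import os
from typing import TextIO


class StreamWriter:
    """Hypothetical minimal writer exposing only a str-based API."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def write_str(self, output: str) -> None:
        self._stream.write(output)


def forward_output(output_data: bytes, writer: StreamWriter) -> None:
    # Decode once (UTF-8, as in the existing caller), normalise carriage
    # returns, and write the result as text with no re-encoding step.
    text = output_data.decode("utf-8").replace("\r", os.linesep)
    writer.write_str(text)


sink = io.StringIO()
forward_output(b"done\r", StreamWriter(sink))
```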

@sbchisholm
Contributor Author

If that's true, that would make sense, but it also appears to be used here:

    elif isinstance(response, bytes):
        stdout.write_bytes(response)
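For a caller like this one that passes genuine bytes, a decode-then-write implementation silently commits to an encoding. A small sketch of the failure mode, using a hypothetical helper that mirrors such a `write_bytes` and assuming UTF-8 as the decoding:

```python
import io
from typing import TextIO


def write_bytes_via_decode(stream: TextIO, data: bytes) -> None:
    # Hypothetical helper mirroring a decode-then-write write_bytes().
    stream.write(data.decode("utf-8"))


# 0xE9 is "é" in Latin-1, but on its own it is not valid UTF-8.
payload = "café".encode("latin-1")

sink = io.StringIO()
try:
    write_bytes_via_decode(sink, payload)
except UnicodeDecodeError as exc:
    print(f"bytes were interpreted as UTF-8 and rejected: {exc.reason}")
```

The caller handed over raw bytes but gets a decoding error it may not expect.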

@wolverian

wolverian commented Oct 6, 2023

Sure, but does that make sense? In that case, if this PR is merged, the bytes are decoded to a string anyway, it's just hidden from the caller and could surprise them if they assumed the bytes wouldn't be interpreted in any specific encoding.

Rather, in my opinion, either this abstraction should offer a way to write raw bytes with no interpretation (decoding) or it should only offer a way to write strings.

Hiding these bytes to string conversions will shoot the user in the foot eventually.
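One way to get an abstraction without hidden conversions is sketched below, under the assumption that raw bytes should only ever go to a real binary buffer. The class name is hypothetical: `write_bytes` writes to the stream's `buffer` when one exists and otherwise raises, leaving any decoding to the caller.

```python
import io
from typing import TextIO


class ExplicitStreamWriter:
    """Hypothetical writer that never converts between bytes and str itself."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def write_str(self, output: str) -> None:
        self._stream.write(output)

    def write_bytes(self, output: bytes) -> None:
        buffer = getattr(self._stream, "buffer", None)
        if buffer is None:
            # e.g. io.StringIO: there is nowhere to put raw bytes without decoding.
            raise TypeError(
                f"{type(self._stream).__name__} has no binary buffer; "
                "decode explicitly and call write_str() instead"
            )
        # Flush pending text first so text and raw bytes keep their order.
        self._stream.flush()
        buffer.write(output)
```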

@sbchisholm
Contributor Author

That's a great point, I'll update the PR.

@sriram-mv sriram-mv requested review from jfuss and mildaniel October 9, 2023 18:38
@sriram-mv
Contributor

I'm adding both @jfuss and @mildaniel as reviewers given their previous experience with bytes/str operations.

My take is that we are explicitly changing behavior from bytes to str, and there could be side effects. There was an explicit PR to decode, replace carriage returns, and re-encode. That behavior seems to have been removed at this point.
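One way to check whether the carriage-return normalisation survives a decode-before-write path is to compare what reaches the stream with the plain str-based normalisation. A sketch with hypothetical helper names, assuming UTF-8 throughout:

```python
import io
import os
from typing import TextIO


def normalise_via_bytes(output_data: bytes) -> bytes:
    # The existing caller pattern: decode, replace carriage returns, re-encode.
    return output_data.decode("utf-8").replace("\r", os.linesep).encode("utf-8")


def write_bytes_decoding(stream: TextIO, data: bytes) -> None:
    # Hypothetical decode-then-write path, in the spirit of this PR.
    stream.write(data.decode("utf-8"))


raw = b"line1\rline2\r"
sink = io.StringIO()
write_bytes_decoding(sink, normalise_via_bytes(raw))

# For valid UTF-8 input, the text that reaches the stream matches the
# direct str-based normalisation.
assert sink.getvalue() == raw.decode("utf-8").replace("\r", os.linesep)
```

This only covers valid UTF-8 input; it says nothing about call sites that relied on the bytes being written unmodified.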

@wolverian

wolverian commented Oct 11, 2023

It might be a good idea to bisect to find the commit that introduced this bug. That might help in figuring out the right thing to do here. FWIW, I think this PR looks logical, but I'm unfamiliar with this codebase and its history.

This is fairly high priority in my opinion, as I simply can't use the latest version of SAM CLI right now due to this bug.

@mildaniel
Contributor

I did some digging here, and it seems the issue is that the StreamWriter class expects the stream to be of type TextIO, but there are times when an io.StringIO is passed in: https://github.com/aws/aws-sam-cli/blob/develop/samcli/local/lambda_service/local_lambda_invoke_service.py#L165.

io.StringIO is not of type TextIO and does not have the buffer property that StreamWriter expects of a TextIO object. I'm not sure why this wasn't caught by type checking, but in any case it causes this exception whenever StreamWriter is used with an io.StringIO stream. Changing all the instantiators of the StreamWriter class to use TextIO would require fairly significant refactoring, since those instances also use methods and properties unique to io.StringIO.

Although this PR addresses the symptom rather than the root cause, I'm inclined to go with it, since I don't see any real negatives and it allows us to patch an important issue. Another solution is to update the type in the StreamWriter class to be a union of StreamWriter, io.TextIO and io.TextIOWrapper and handle each one in the implementation of the write method.

@sriram-mv what are your thoughts here?
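The union-type alternative mentioned above could look roughly like the sketch below. It assumes the union is meant to cover `io.StringIO` alongside buffer-backed text streams; since the standard library has no `io.TextIO`, `typing.TextIO` is used instead. The class is a hypothetical illustration, not the SAM CLI implementation.

```python
import io
from typing import TextIO, Union


class StreamWriter:
    """Hypothetical sketch: accept several stream types and branch on each."""

    def __init__(self, stream: Union[io.StringIO, io.TextIOWrapper, TextIO]) -> None:
        self._stream = stream

    def write_bytes(self, output: bytes) -> None:
        if isinstance(self._stream, io.StringIO):
            # In-memory text stream: no binary buffer, so decode (UTF-8 assumed).
            self._stream.write(output.decode("utf-8"))
        else:
            # Buffer-backed text streams (e.g. sys.stdout) receive raw bytes;
            # flush pending text first so output stays in order.
            self._stream.flush()
            self._stream.buffer.write(output)
```

The trade-off is that the behavior of `write_bytes` now depends on the stream type, which is exactly the kind of hidden difference raised earlier in the thread.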

@lucashuy lucashuy removed the stage/needs-triage label Oct 12, 2023
@moelasmar moelasmar added this pull request to the merge queue Nov 7, 2023
Merged via the queue into aws:develop with commit 70b34aa Nov 7, 2023
55 checks passed
@sbchisholm sbchisholm deleted the fix/decode-bytes-for-write-to-TextIO-stream branch November 7, 2023 13:17