Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should StreamingDecoder::read_exact prefer Read::read_exact's contract over its own? #69

Closed
workingjubilee opened this issue Sep 13, 2024 · 4 comments

Comments

@workingjubilee
Copy link
Contributor

You of course have already seen rust-lang/backtrace-rs#626 and I figured I'd capture some thoughts about the discussion there.

I won't try to persuade you that StreamingDecoder::read should behave differently in the general case! But it would seem more natural, to me, for the Read::read_exact case to have more of a "do what I mean" quality to it, even though this contradicts the documentation of StreamingDecoder that says:

StreamingDecoder expects the underlying stream to only contain a single frame, yet the specification states that a single archive may contain multiple frames.

To decode all the frames in a finite stream, the calling code needs to recreate the instance of the decoder and handle crate::frame::ReadFrameHeaderError::SkipFrame errors by skipping forward the length amount of bytes, see #57

Personally, I tend to dislike read_exact for the precise reason that it seems like a much more "magical" function to me than read, as it says "Reads the exact number of bytes required to fill buf. This function reads as many bytes as necessary to completely fill the specified buffer buf." and is very prone to leaving the buffer and reader in unspecified states, considering:

If this function encounters an “end of file” before completely filling the buffer, it returns an error of the kind ErrorKind::UnexpectedEof. The contents of buf are unspecified in this case.

If any other read error is encountered then this function immediately returns. The contents of buf are unspecified in this case.

If this function returns an error, it is unspecified how many bytes it has read, but it will never read more than would be necessary to completely fill the buffer.

Naturally of course there could be another Decoder type that implements such semantics, but would it really differ, if you had your way, on more than this one function?

@philipc
Copy link
Contributor

philipc commented Sep 14, 2024

I think read_to_end should do this too. Decoding only one frame for read is fine, but read_exact and read_to_end both have semantics that imply multiple potential calls to read.

The motivation given in #57 (comment) is that a TCP stream is potentially endless, but read_exact and read_to_end work as expected for a std::net::TcpStream, so I don't see why it should be any different here.

It is quite unergonomic to need to do https://github.com/gimli-rs/object/pull/730/files for what to me is a normal use case.

@philipc
Copy link
Contributor

philipc commented Sep 14, 2024

I had a look at implementing a version of StreamingDecoder that reads multiple frames, but it didn't feel right. (e.g. it ideally needs Seek for skipping frames, and I didn't find a clean way to determine EOF without trying to read another frame header).

I think what I really want is the equivalent of flate2's decompress and decompress_vec. Would you accept a PR adding those?

@KillingSpark
Copy link
Owner

The FrameDecoder has an interface similar to these functions.

It might also be better to just hide the whole complexity behind a convenience function like decode_all from the zstd crate.

If this documents the risks behind using this function, maybe offering a variant that doesn't return a vec but writes to a slice (or a generic Write) I'd be fine with that.

@KillingSpark
Copy link
Owner

Should be dealt with by #70 ? Please reopen if there is anything still missing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants