-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarification on sync behaviour #105
Comments
Hm, that doesn't seem entirely optimal, there ought to be a cleaner way to ensure this. I'm not entirely sure whether that specific behavior was intentional and the idea was that one would re-call with the same parameter if everything was not written, or if that's an oversight and it should store that it has been called with flush. Ideally we would want a cleaner/better api in general though. |
I guessed it might have been inherited behaviour from the original implementation, so in that case best thing to do is just to document it. But certainly if the API (or a new API) can be made cleaner that would be ideal. Also, if considering a new API, it needs to stay low-level, i.e. to have the ability to just move data between two buffers as data comes in. Other crates only provide Write as an interface, which is really inconvenient. |
Totally forgot to reply again.. Yeah it's supposed to stay low level, leaving writer/reader and stuff to flate2. I'm not sure what the best way to design a proper API is though, personally I've been busy with other projects so haven't looked into it much. |
Ok, after some digging it looks the current behaviour may be what zlib does for the corresponding deflate call (I presume C miniz would behave the same as it's supposed to be compatible with the zlib API, though it doesn't have much documentation and the code is not particularly easy to read.) That is, the caller needs to check that the output buffer is large enough to contain the flush. The flush modes are not very widely documented in general though so it's all a bit confusing. I do think we could add some functionality that stores whether a flush was output or not though as an addition though, maybe using a separate flush parameter or something, ideally we would want the API designed to prevent any erroneous ways of calling the functions that don't produce the expected output. |
I think the easiest solution is to work around it, i.e. always have a big-enough buffer. If the docs said "always use a large buffer when flushing sync points", then I would have avoided the trouble. (Also document how big exactly -- I'm guessing slightly larger than 32K, for the case of incompressible data.) Since I was streaming over a low-bandwidth medium, I was using smaller buffers than that. So easiest solution would be to document it. But if you want to improve the API, then yes I could have worked around the problem if there was some way to detect whether the sync sequence had been output or not, i.e. whether the flush+sync had completed. Then I could keep calling it with That does however push the responsibility onto the coder to remember to keep calling with |
I'm using miniz_oxide for streaming compression/decompression. I need reliable sync points, which mark the ends of records. I have it working now (and I've fuzzed my wrapper too, to be sure), but it took a few attempts. My observations are:
TDEFLFlush::Sync
then it does a flush and a sync, but only if it will all fit in the output buffer, otherwise the sync sequence will be missing (as expected)TDEFLFlush::Sync
, and there is still data to write, then the flush will continue, but the sync sequence won't be written, so the other end will not have complete data for the end of the record. So it appears that the flush is sticky, but the sync isn't.TDEFLFlush::Sync
, then if I've already written a sync sequence on the first call, it will write another one on this call. (But I don't want two sync sequences.)So my solution is to give it a big buffer to write to so that both the flush and the sync can be written in one go. I hope this takes care of things (it did in testing), but in the unlikely case that it didn't, if it filled that buffer then I write another one with
TDEFLFlush::Sync
. I'd get two syncs in that case, but it's better than my protocol locking up.Since this appears to work reliably in my fuzz-testing, I'm going with this. However it would be good to document this behaviour just in case someone else needs to do the same.
The text was updated successfully, but these errors were encountered: