Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cohttp EOF read #1035

Open
wants to merge 2 commits into
base: v5-backports
Choose a base branch
from

Conversation

gabrielmoise17
Copy link

This work has been done together by the following people:
@savvadia
@vect0r-vicall
@picojulien
@johnyob
@raphael-proust

Motivation

We have discovered that there is a resource leak for particular cases. Here are two possible scenarios:

  • If the callback never resolves, EOF is never detected, leading to a resource leak.
  • If the callback is very slow and resource-intensive, EOF is not detected until the callback resolves. Consequently, client resources are wasted even after the connection has been closed by the peer.

Solution

A new wait_eof_or_closed function is introduced, which repeatedly peeks (using MSG_PEEK) into the data stream for an EOF, in which case it closes the connection. MSG_PEEK used for this does not disturb reading from the stream by the handle_client. If the connection is closed locally before this happens, the function stops waiting for EOF.

If provided to the wait_eof_or_closed, the sleep_fn argument will be used to dictate the periodicity of the checks for EOF from the peer, yielding control periodically. If sleep_fn is not provided, the server keeps its legacy behaviour.

PR format

The format of the PR is as follows:

  • The first [WIP] commit adds a test_leak.ml test, which showcases the EOF leak; the [WIP] tag highlights that this test is not something that we want to be added necessarily, but we would be happy if you find it desirable or helpful and choose to incorporate it
  • The second commit fixes the leak showcased in the first commit

Testing

You can observe the initial leaking behaviour by cherry picking only the first commit:

$ g checkout v5-backports

$ g cherry-pick <first_commit>

$ make

$ dune exec ./_build/default/cohttp-lwt-unix/test/test_leak.exe
Server running on port 8080

Now, the server is listening, so you can call on a separate console curl -s 'localhost:8080/sleep' and observe that the server keeps printing messages of I slept. Now, when you close the curl request with CTRL+C, the server does not stop.

To solve this problem, add the second commit as well, and re-do this test, and after stopping the curl request, you will see this:

$ dune exec ./_build/default/cohttp-lwt-unix/test/test_leak.exe
Server running on port 8080           
Cohttp connection on 1
I slept 
I slept 
I slept 
I slept 
Cohttp connection closed

So, the leak is now fixed.

One can also monitor the resources used by the executable thanks to a tool such as lsof:

$ lsof -c test_leak.exe | grep localhost

@@ -76,4 +76,6 @@ module Make (Channel : Mirage_channel.S) = struct
| Read_exn e -> Lwt.return_error (Read_error e)
| Write_exn e -> Lwt.return_error (Write_error e)
| ex -> Lwt.fail ex)

let wait_eof_or_closed _conn _ic _sleep_fn = assert false
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we ask for the advice of the maintainers of the library, as we advocate that this function will never be called from this library, but we want to receive a second opinion.

@raphael-proust
Copy link
Contributor

Ping @rgrinberg and @mefyl who might be able to review.

@mefyl
Copy link
Contributor

mefyl commented Jul 15, 2024

Thank you for the detailed report and reproduction case!

TLDR: this is IMO more a design choice than a bug. But it's a good design question.

There is a preliminary question to all this: is it a good idea to kill the server request handler if the client disconnects? I'm not really sure it is, as one could have some heavy processing server side after sending the response headers, and abruptly killing this processing if the client disconnects before the end of the body is a very opinionated choice. In the test example, the "resource leak" mentioned exists because the handler never ever returns, so the handler is a user-created resource leak in itself.

Now I do reckon that your point has merit, and stopping processing of a request when the client disconnects is a desirable feature in some if not most cases - it's just that failing to do so is a sub-optimal while doing so inappropriately is a straight bug. With that in mind I think it would be better to provide the expert request handler with the possibility to monitor client disconnection, and maybe provide some simpler default behavior in the simple interface. Happy to discuss this.

@gabrielmoise17
Copy link
Author

Thank you for the detailed report and reproduction case!

TLDR: this is IMO more a design choice than a bug. But it's a good design question.

There is a preliminary question to all this: is it a good idea to kill the server request handler if the client disconnects? I'm not really sure it is, as one could have some heavy processing server side after sending the response headers, and abruptly killing this processing if the client disconnects before the end of the body is a very opinionated choice. In the test example, the "resource leak" mentioned exists because the handler never ever returns, so the handler is a user-created resource leak in itself.

Now I do reckon that your point has merit, and stopping processing of a request when the client disconnects is a desirable feature in some if not most cases - it's just that failing to do so is a sub-optimal while doing so inappropriately is a straight bug. With that in mind I think it would be better to provide the expert request handler with the possibility to monitor client disconnection, and maybe provide some simpler default behavior in the simple interface. Happy to discuss this.

Thank you for your response!
We agree that there is motivation behind not defaulting to closing the connection automatically when the client disconnects and EOF is received. We therefore propose to have one more optional callback to configure this behaviour.
In cohttp-lwt/src/server.ml we proposed to add sleep_fn optional callback to check for EOF periodically.
We can add another one, callback_on_eof which will be triggered if EOF is detected. The default behaviour (if sleep_fn is provided and thus cohttp is configured to look for EOF) is to cancel the promise.
So the whole solution will be the following:

  • if the client wants to wait for EOF while handling the request, sleep_fn is provided
    • if the client wants to interrupt request handling on EOF, callback_on_eof is not provided and cohttp uses Lwt.pick waiting for EOF
    • if the client provides callback_on_eof then cohttp waits for EOF in Lwt.choose which will not interrupt the promise execution. If EOF is received, the callback is called and then it's up to the client what to do with the main handler

The new callback should accept conn in the same way as callback to identify the connection:

  type t = {
    callback : conn -> Cohttp.Request.t -> Body.t -> response_action Lwt.t;
    conn_closed : conn -> unit;
    sleep_fn : (unit -> unit Lwt.t) option;
    callback_on_eof : conn -> unit;
  } 

Please let us know what you think, we would be happy to chat about it. We can also showcase this adaptation in a fixup commit, if that is clearer for you. Would you prefer to continue this discussion in mails/comments here, or would it be easier for you to have a call?

@saroupille
Copy link

@art-w Do you know how we can move forward on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants