Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eio.Net.getaddrinfo_stream fails to resolve host after 1019 successful calls on linux #776

Open
serpent7776 opened this issue Oct 28, 2024 · 5 comments

Comments

@serpent7776
Copy link

I'm writing an application using cohttp-eio that repeatedly connects to a remote host every 3s. The initial 1019 requests succeed, but all the following requests fail to resolve the host. I can simplify the code to the following:

let connect ~sw ~env addr =
  ignore @@ Eio.Net.connect ~sw env#net addr

let print addr =
  Format.printf "@[%a@]@." Eio.Net.Sockaddr.pp addr

let connect_and_print ~sw ~env addr =
  print addr;
  (match addr with | `Tcp (a,_) -> Eio.Net.Ipaddr.fold ~v4:(fun _ -> connect ~sw ~env addr) ~v6:(fun _ -> ()) a | _ -> ())

let getaddrinfo ~sw ~env uri =
  List.iter (connect_and_print ~sw ~env) @@
  Eio.Net.getaddrinfo_stream ~service:"http" env#net
    (Uri.host_with_default uri)

let google_uri = Uri.of_string "http://google.com"

let rec loop ~sw ~env clock count =
  Format.printf "%d:@." count;
  getaddrinfo ~sw ~env google_uri ;
  loop ~sw ~env clock @@ (count + 1)

let () =
  Eio_main.run @@ fun env ->
  Eio.Switch.run @@ fun sw ->
  let clock = env#clock in
  loop ~sw ~env clock 0

This initially given the correct output

0:
tcp:142.250.186.206:80
tcp:[2a00:1450:401b:80d::200e]:80
1:
tcp:142.250.186.206:80
tcp:[2a00:1450:401b:80d::200e]:80

But after 1019 requests Eio.Net.getaddrinfo_stream returns no addresses:

1018:
tcp:142.250.186.206:80
tcp:[2a00:1450:401b:80d::200e]:80
1019:
1020:
1021:
1022:
1023:
1024:
1025:

I get the same behaviour every single time. I tested this on three different machines: Manjaro, Ubuntu and FreeBSD. It only reproduces on linux systems, so it's likely an issue in eio_linux.
Also, it reproduces only when Eio.Net.connect is called.

Possibly related #351

@patricoferris
Copy link
Collaborator

I think the problem is due to number of open connections. For example:

let connect ~sw ~env addr =
  let c = Eio.Net.connect ~sw env#net addr in
  Eio.Flow.close c

fixes this problem. Note with the use of the single switch these connections are never being closed (that may or may not be desired). Although I'm not quite sure why that is the case on Linux. But just thought I would send this comment to hopefully unblock you.

@serpent7776
Copy link
Author

Thanks, that's very helpful.
I can confirm that closing the connections manually fixes the issue. I also did a roughly similar code in C and it also has the same behaviour when connections are not closed. After 1021 connections, getaddrinfo returns error: Servname not supported for ai_socktype, so this is linux behaviour.

But now I'm not sure, if this is eio issue, cohttp issue or am I supposed to close the connections manually?

@talex5
Copy link
Collaborator

talex5 commented Oct 29, 2024

If you're using cohttp-eio, it will close the connection for you as soon as the switch goes out of scope. Just make sure you're not sharing a single switch for everything (like your example does) - see https://github.com/ocaml-multicore/eio?tab=readme-ov-file#switches-1.

The behaviour of getaddrinfo ignoring the error and returning an empty list is questionable, but comes from OCaml's Unix.getaddrinfo. It might be better to have our own version that passes errors on.

@serpent7776
Copy link
Author

Oh, I am using a single switch everywhere :)

@SGrondin
Copy link
Collaborator

@serpent7776 Switches are cheap, so don't hesitate to create more than one.
Ideally, you should have one Switch per "lifecycle".
If your DNS calls never overlap, then create one Switch per call so each call gets "cleaned up" as soon as possible.
If you have 2 different resources (let's say a file and a DNS call) and they both become "cleanable" (or "closeable") simultaneously, then you can use the same Switch for both.
I think it's often a good idea to create one Switch per "cleanup point". Resources that don't share the same cleanup point should not share the same Switch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants