HTTP 1.1 connections not being reused #3350
Yes, though it's worth noting that CloudFront does speak HTTP/2. That's my recommendation and what other people are doing today with ostree backed by S3.
Not S3 specifically, but an object store, yes. It's worth noting that I think every other object store does speak HTTP/2; Azure Blob Storage does, for example. Today, when you go to a podman release and download a binary, that's a redirect to Azure Blob:
We can investigate this, but it's really worth emphasizing that I think HTTP/2 is worth pushing for... Can you try replicating what happens with just a plain curl invocation? Also of note: zstd:chunked, which was created to get a similar "dynamic per-object delta fetch" for containers, can also end up doing a lot of HTTP requests.
Well, it sounds like a bug to me; at least identifying the issue is worth it. I am not saying we should invest too much into fixing it.
Not sure how, though. It first follows all the redirects and adds them to the back of the queue; during this phase, because no data is being downloaded, just HTTP headers, the "speed" is reported as really bad, around 200 B/s in my case. What is worth noting is that
Then it switches to the download phase, where it downloads all the files from S3 at an acceptable throughput. Connections are also reused, though from time to time (every few seconds) a connection is closed, perhaps on the AWS side.
Conclusions so far:
For the record, this is how I stored the list of URLs:
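A minimal sketch of that step, assuming the object URLs are reconstructed from the Apache access log of an earlier pull (the hostname, log path, and repository prefix below are placeholders):

```sh
# Pull the requested object paths out of the access log and turn them into
# absolute URLs; hostname and paths are placeholders, not the real setup.
awk '{ print $7 }' /var/log/httpd/access_log \
  | grep '^/repo/objects/' \
  | sort -u \
  | sed 's|^|http://repo.example.com|' > urls.txt
wc -l urls.txt
```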
And this is how I downloaded the contents:
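Again just a sketch, assuming the urls.txt produced above; within a single invocation curl can keep transfers going over reused connections, and --parallel keeps several of them in flight:

```sh
# Feed the whole URL list to one curl invocation; files are written to the
# output directory under their basenames. urls.txt and the directory are
# assumptions carried over from the previous step.
mkdir -p /tmp/objects
xargs -a urls.txt curl --silent --show-error \
      --parallel --parallel-max 8 \
      --remote-name-all --output-dir /tmp/objects
```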
For the record, I conducted a test with redirects over HTTP/2 and did not experience any closed connections. The whole transaction ran at a very good speed over a single TCP HTTP/2 connection.
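A plain-curl way to double-check that, with placeholder URLs; with --http2 and --parallel, curl multiplexes the transfers over a single connection when the server supports it:

```sh
# Fetch a few objects over HTTP/2 in one invocation and grep the verbose
# output for connection messages (their exact wording varies between libcurl
# versions). The URLs are placeholders.
curl --http2 --verbose --silent --parallel --remote-name-all \
  https://cdn.example.com/repo/objects/aa/one.filez \
  https://cdn.example.com/repo/objects/bb/two.filez \
  https://cdn.example.com/repo/objects/cc/three.filez \
  2>&1 | grep -i -e 'connected to' -e 're-us' -e 'multiplex'
```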
It does. If you want to download a bunch of URLs with curl, the most convenient way is using

So I think we should be able to reproduce the behavior with the curl binary outside of ostree in this scenario by manually passing the objects to fetch, right? Let's not involve
Yeah, I knew I could use shell expansion like that, but it did not occur to me to use bigger files. So here is an example of downloading a few big files from the main Fedora download site (EPEL 9) with plain curl:
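Something along these lines; the package paths are illustrative placeholders rather than the exact files I used:

```sh
# Download a couple of large EPEL 9 packages with one curl invocation over
# HTTP/1.1 and grep the verbose output for connection reuse messages (exact
# wording differs between libcurl versions). Package names are placeholders.
curl --verbose --silent --location --remote-name-all \
  "https://dl.fedoraproject.org/pub/epel/9/Everything/x86_64/Packages/e/example-package-1.rpm" \
  "https://dl.fedoraproject.org/pub/epel/9/Everything/x86_64/Packages/e/example-package-2.rpm" \
  2>&1 | grep -i -e 'connected to' -e 're-us' -e 'closing'
```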
Now, when I try to use
It even reports "live=3" and "xfers=10":
I tried both
My conclusion so far:
So the hostname Pulp returns is always the same; it resolves to random addresses in my region:
I tried to create an entry in /etc/hosts.
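A related check that can be done purely on the curl side is to pin the hostname to a single address just for the test (hostname and IP below are placeholders):

```sh
# Pin the CDN hostname to one address for this invocation only, to rule out
# DNS round-robin as the reason connections are not being reused.
# Hostname and IP address are placeholders.
curl --verbose --silent --resolve cdn.example.com:443:203.0.113.10 \
     --remote-name-all \
     https://cdn.example.com/repo/objects/aa/one.filez \
     https://cdn.example.com/repo/objects/bb/two.filez \
  2>&1 | grep -i -e 'connected to' -e 're-us'
```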
So ... One thing here that seems quite possible is that ... It's also possible (but IMO unlikely) that there is some logic specifically in the ...
https://curl.se/libcurl/c/CURLMOPT_MAX_HOST_CONNECTIONS.html documents the behavior there and it's pretty clear what it does... I don't think it's the problem here.

One thing to remember here (I don't know if you hit this): if you're changing the ostree codebase, it's not enough to just update

One discovery here is that curl/curl@1be704e changes a bunch of the relevant logging strings; I was confused why I couldn't find any of the messages using GH code search. So if debugging the curl code specifically, it will be important to look at the specific libcurl git tag/commit.
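In practice that means checking which libcurl version is actually in use and grepping the matching source tag instead of current git master; roughly (the checkout path and tag are assumptions):

```sh
# Check the system curl/libcurl version, then search the matching source tag
# for the verbose-log strings seen during the pull; the exact message text
# changes between versions. Checkout path and tag name are assumptions.
curl --version | head -n 1
git -C ~/src/curl fetch --tags
git -C ~/src/curl checkout curl-8_5_0
git -C ~/src/curl grep -n "Re-using existing connection"
```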
Ostree does not appear to use CURLMOPT_MAX_HOST_CONNECTIONS, though; it just sets this:
Ha, good point, thanks for sharing. I will validate; what I did was just dumb:
So let me see:
Looks good to me. Anyway, I have noticed that the HTTP redirect is not causing this; it looks like HTTP 1.1 connections are not kept alive even against a plain and simple HTTP server. It looks like a broader bug to me, unless I am doing something terribly wrong. See for yourself, I have prepared two repos, one without any redirects:
And one with a redirect for each and every file:
Watching new TCP connections is perhaps most convenient with this command:
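For example, something like this (the interface and ports are assumptions), which prints a line for every outgoing SYN, i.e. every new connection the client opens:

```sh
# Show every outgoing TCP SYN (new connection attempt) to port 80 or 443;
# the "any" interface and the ports are assumptions about the test setup.
sudo tcpdump -ni any \
  'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0 and (dst port 80 or dst port 443)'
```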
Now, I know TCP takes two parties and it can be the server that is closing those connections, but this is a pretty standard Apache httpd from Fedora 41 and there is no rate limiting or anything like that configured. Options I tried to increase:
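For reference, the kind of knobs meant here are Apache's keep-alive settings; a sketch with illustrative values, not the exact ones tried:

```sh
# Drop in a temporary keep-alive override for httpd on Fedora; the file name
# and the values are illustrative only.
cat <<'EOF' | sudo tee /etc/httpd/conf.d/keepalive-test.conf
KeepAlive On
MaxKeepAliveRequests 10000
KeepAliveTimeout 30
EOF
sudo systemctl reload httpd
```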
I just wanted to chime in here as an interested party. I'm involved in the server-side hosting of ostree content. I hear you on the HTTP/2 desire and we are pursuing that. Separately, though, not having HTTP 1.1 connection reuse does seem like an issue when a pull initiates something like 44K TCP connections. Can someone confirm for me: is the current thinking that the issue is in libcurl?
My suggestion is basically:
While, yes, the Apache docs do indicate KeepAlive is on by default, maybe we patched it downstream, etc. FWIW, I also did some debugging of related issues in the past by comparing the libostree curl settings with https://github.com/rpm-software-management/librepo.
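A quick way to check the server side independently of ostree (the URL is a placeholder): request the same file twice in one curl invocation and see whether the second transfer reuses the first connection:

```sh
# Two requests in a single curl invocation: with working HTTP/1.1 keep-alive,
# the verbose output should show the second transfer reusing the connection
# rather than opening a new one. The URL is a placeholder.
curl --verbose --silent --output /dev/null --output /dev/null \
  http://repo.example.com/ostree/repo/config \
  http://repo.example.com/ostree/repo/config \
  2>&1 | grep -i -e 'connected to' -e 're-us' -e 'closing'
```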
Hello,
we host repositories behind an mTLS HTTP/1.1 gateway which redirects all requests to AWS S3 (HTTP 1.1 as well). There is no HTTP/2 support for either of the services at the moment. I was wondering whether those HTTP 1.1 connections are being reused, and I need some help understanding if that is the case:
So, for the first couple of connections (6), libcurl indicates a connection being reused, both to the gateway and to S3. But after that, the rest of the whole transaction looks like this:
And the connection number keeps increasing, which to me looks like libcurl is for some reason not reusing the connections. I thought for a moment that the hard limit of 8 connections was the limiting factor, but I increased it and also completely removed it (which defaults to no limit), and the behavior is the same.
Pulp does support deltas and we will likely end up using them; I just want to understand what is going on. From the docs, it looks like the "dumb protocol" design was created with AWS S3 in mind, but that service does not support HTTP/2 to this day.
Can you help me understand what is wrong? Maybe I am just misreading the logs, but the output of lsof confirms that new TCP connections keep being made. Also, pmrep network.tcp.activeopens shows new TCP connections being opened at a rate of about 15 per second on my system. When paired with mTLS, the performance is down to zero. Thanks for the help.

Switching to the HTTP/2 protocol completely mitigates the problem; the performance over HTTP/2 is very good. We currently have HTTP 1.1 only infrastructure to work with, which is why I filed this. I just wanted to let others know after I spent some time on this.
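For reference, a sketch of how that connection churn can be observed; the process name and the sampling interval are assumptions:

```sh
# Count established TCP connections held by the pulling process, and watch
# the system-wide rate of new outbound connections with PCP. The process
# name "ostree" and the 1-second interval are assumptions.
lsof -nP -iTCP -sTCP:ESTABLISHED -a -c ostree | wc -l
pmrep -t 1 network.tcp.activeopens
```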