Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete content retrieval on HTTP HEAD requests puts an unnecessary burden on tds #124

Open
cskarby opened this issue Mar 24, 2021 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@cskarby
Copy link
Contributor

cskarby commented Mar 24, 2021

According to https://tools.ietf.org/html/rfc7231#section-4.3.2 payload headers are optional for HTTP HEAD responses.

Content-Length is a payload header according to https://tools.ietf.org/html/rfc7231#section-3.3

In

HttpServletResponse httpServletResponse = (HttpServletResponse) response;
NoBodyResponseWrapper noBodyResponseWrapper = new NoBodyResponseWrapper(httpServletResponse);
chain.doFilter(new ForceGetRequestWrapper(httpServletRequest), noBodyResponseWrapper);
noBodyResponseWrapper.setContentLength();
a complete GET-request is processed to compute the Content-Length, and the HTTP body is discarded. This seems like a waste of resources, especially for large datasets, possibly spanning several files (e.g. via ncml aggregates). I think it is better to handle this explicitly by having a pair of functions: one to set the http headers (except for payload headers), and call this function from get functions, this way we can give swift responses back on HTTP HEAD requests (and save resources on the server side.)

@lesserwhirls lesserwhirls transferred this issue from Unidata/thredds Mar 24, 2021
@lesserwhirls
Copy link
Collaborator

lesserwhirls commented Mar 24, 2021

I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

@lesserwhirls lesserwhirls self-assigned this Mar 24, 2021
@lesserwhirls lesserwhirls added the enhancement New feature or request label Mar 24, 2021
@lesserwhirls
Copy link
Collaborator

@ethanrd - what do you think?

@cskarby
Copy link
Contributor Author

cskarby commented Mar 25, 2021

I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

I agree, but file size should probably come from the filesystem rather than opening the file and do a complete stream from disk to memory to count the bytes.

@lesserwhirls
Copy link
Collaborator

Indeed. I would grab file size from the File object for the HTTPServer service. The option to turn off Content-Length for the other services for HEAD requests is what I would target, since you can really only know those sizes by processing the request first.

@ethanrd
Copy link
Member

ethanrd commented Mar 29, 2021

Hi @cskarby - Looks like all TDS services except HTTPServer use HttpHeadFilter (according to applicationContext.xml). For HTTPServer, both GET and HEAD requests are handled by the same method because of Spring MVC defaults rather than the filter. HTTPServer does the right thing, getting size and last modified from the File object and, if it is a HEAD request, finishes without reading any bytes.

We could look at moving in a similar direction for the other TDS services (and make inclusion of Content-Length configurable). I don’t think there’s an across the board change to make this switch. It would take some work (though maybe not a lot) on each service to tease HEAD and GET apart.

Are you seeing performance issues with HEAD requests on particular TDS services?

@gaellafond
Copy link

I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

Keep in mind that THREDDS now supports cloud file hosting, such as CDMS3 and CDMRemote. Calculating the file size is not always as simple as asking the file system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants