Complete content retrieval on HTTP HEAD requests puts an unnecessary burden on tds #124

cskarby · 2021-03-24T14:27:51Z

According to https://tools.ietf.org/html/rfc7231#section-4.3.2 payload headers are optional for HTTP HEAD responses.

Content-Length is a payload header according to https://tools.ietf.org/html/rfc7231#section-3.3

In tds/tds/src/main/java/thredds/servlet/filter/HttpHeadFilter.java (lines 41 to 45 in 987a79b):

    HttpServletResponse httpServletResponse = (HttpServletResponse) response;
    NoBodyResponseWrapper noBodyResponseWrapper = new NoBodyResponseWrapper(httpServletResponse);
    chain.doFilter(new ForceGetRequestWrapper(httpServletRequest), noBodyResponseWrapper);
    noBodyResponseWrapper.setContentLength();

a complete GET request is processed to compute Content-Length, and the HTTP body is then discarded. This seems like a waste of resources, especially for large datasets that may span several files (e.g. via NcML aggregations). I think it would be better to handle HEAD explicitly with a pair of functions: one that sets the HTTP headers (except payload headers) and one that writes the body. GET handlers would call the header function before writing the body, while HEAD handlers would call only the header function. That way we can respond swiftly to HTTP HEAD requests and save resources on the server side.
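A minimal sketch of that split, written in plain Java without the servlet API so it stands alone. Every name here (`HeadGetSketch`, `writeHeaders`, `writeBody`, `handleGet`, `handleHead`) is hypothetical and not part of TDS; a `Map` stands in for the response headers and the header values are placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not TDS code: headers and body come from separate methods.
public class HeadGetSketch {

    // Sets the headers that can be known without generating the payload.
    // Payload headers such as Content-Length are deliberately left out.
    static void writeHeaders(Map<String, String> headers) {
        headers.put("Content-Type", "text/plain");                     // placeholder value
        headers.put("Last-Modified", "Wed, 24 Mar 2021 14:27:51 GMT"); // placeholder value
    }

    // Generates the payload; stands in for the expensive dataset read.
    static void writeBody(OutputStream out) throws IOException {
        out.write("dataset bytes".getBytes(StandardCharsets.UTF_8));
    }

    // GET: headers first, then the body.
    static Map<String, String> handleGet(OutputStream out) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        writeHeaders(headers);
        writeBody(out);
        return headers;
    }

    // HEAD: headers only; the body generator is never invoked.
    static Map<String, String> handleHead() {
        Map<String, String> headers = new LinkedHashMap<>();
        writeHeaders(headers);
        return headers;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        System.out.println("GET headers:  " + handleGet(body) + ", body bytes: " + body.size());
        System.out.println("HEAD headers: " + handleHead());
    }
}
```

In this sketch both paths return the same non-payload headers, which matches the RFC 7231 guidance cited above, while HEAD never touches the data.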

lesserwhirls · 2021-03-24T19:12:26Z

I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

lesserwhirls · 2021-03-24T19:14:11Z

@ethanrd - what do you think?

cskarby · 2021-03-25T13:08:16Z

> I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

I agree, but the file size should probably come from the filesystem rather than from opening the file and streaming it completely from disk into memory to count the bytes.

lesserwhirls · 2021-03-25T16:14:54Z

Indeed. I would grab file size from the File object for the HTTPServer service. The option to turn off Content-Length for the other services for HEAD requests is what I would target, since you can really only know those sizes by processing the request first.
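A rough sketch of that policy, assuming a file-backed service can be recognised by having a backing path. The class `HeadContentLengthSketch`, the method `headContentLength`, and the `includeForGenerated` flag are hypothetical names, not existing TDS configuration; `Files.size` is the standard `java.nio.file` call for reading the size from file metadata.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical sketch, not TDS code.
public class HeadContentLengthSketch {

    // Returns the Content-Length to send on a HEAD response, or empty to omit it.
    // backingFile is null for services whose output must be generated to be measured.
    // processRequest stands in for the expensive "run the full GET and count bytes" path.
    static OptionalLong headContentLength(Path backingFile,
                                          boolean includeForGenerated,
                                          LongSupplier processRequest) throws IOException {
        if (backingFile != null) {
            // Metadata lookup only; the file contents are never read.
            return OptionalLong.of(Files.size(backingFile));
        }
        if (includeForGenerated) {
            // Opt-in: pay the full processing cost to learn the size.
            return OptionalLong.of(processRequest.getAsLong());
        }
        // Default for generated responses: omit the payload header.
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("head-sketch", ".bin");
        try {
            Files.write(tmp, "12345".getBytes(StandardCharsets.US_ASCII));
            System.out.println("file-backed: " + headContentLength(tmp, false, () -> -1L));
            System.out.println("generated:   " + headContentLength(null, false, () -> -1L));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the flag off, the expensive supplier is never called for generated responses, so a HEAD request costs only header generation.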

ethanrd · 2021-03-29T05:58:36Z

Hi @cskarby - Looks like all TDS services except HTTPServer use HttpHeadFilter (according to applicationContext.xml). For HTTPServer, both GET and HEAD requests are handled by the same method because of Spring MVC defaults rather than the filter. HTTPServer does the right thing, getting size and last modified from the File object and, if it is a HEAD request, finishes without reading any bytes.

We could look at moving in a similar direction for the other TDS services (and make inclusion of Content-Length configurable). I don’t think there’s a single across-the-board change that would make this switch. It would take some work (though maybe not a lot) on each service to tease HEAD and GET apart.

Are you seeing performance issues with HEAD requests on particular TDS services?

gaellafond · 2023-07-03T03:52:15Z

> I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

Keep in mind that THREDDS now supports cloud file hosting, such as CDMS3 and CDMRemote. Calculating the file size is not always as simple as asking the file system.

lesserwhirls transferred this issue from Unidata/thredds Mar 24, 2021

lesserwhirls self-assigned this Mar 24, 2021

lesserwhirls added the enhancement (New feature or request) label Mar 24, 2021
