Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching isn't working #67

Open
7 tasks
curiousdannii opened this issue Dec 3, 2024 · 5 comments · May be fixed by #68
Open
7 tasks

Caching isn't working #67

curiousdannii opened this issue Dec 3, 2024 · 5 comments · May be fixed by #68
Labels
bug Something isn't working

Comments

@curiousdannii
Copy link
Member

curiousdannii commented Dec 3, 2024

Desired behaviour: in general, our caches (Cloudflare, nginx) should cache things briefly (1 day? 6 hours?), while the user's browser should cache things longer (1 week? month?). (We can set different cache times by using both max-age and s-maxage.) This reduces the time it takes for archive updates to reach the user (#64).

Specifics:

  • Should be cached in Cloudflare briefly and in the browser for a ~week
    • Index pages
    • Regular files on main domain
    • Unsafe files on subdomains
    • Error pages (except maybe for zip file not found errors? Should they be cached shorter?)
    • Redirects of regular files from subdomain to main domain (shouldn't be forever just in case we have to handle more Unity games that need to be all the same domain, etc)
  • Should be cached forever
    • Redirects to unsafe HTML files
    • Files with ?lastmod

We could consider doing more ?lastmod stuff, such as automatically adding it to the links in index pages. But it's not really needed, and it's a detail I think would be better to hide from the user. (Using it for HTML game redirects is more hidden.)

@curiousdannii curiousdannii added the bug Something isn't working label Dec 3, 2024
@dfabulich
Copy link
Contributor

I mentioned on Slack, I don't repro "caching not working." There, you clarified that:

Nothing is being cached either by nginx or the browser. If-Modified-Since also not working. Cloudflare cache is still working.

But that's not what I see.

When I go to https://2ckvcbhjia.unbox.ifarchive.org/2ckvcbhjia/index.html and view it in Chrome Dev Tools, unchecking “Disable cache”, the index.html file is set to max-age=0, as expected, then it makes requests for various CSS and JS files on the subdomain, e.g. https://2ckvcbhjia.unbox.ifarchive.org/2ckvcbhjia/interpreter/style.css and those are 302 redirects to https://unbox.ifarchive.org/2ckvcbhjia/interpreter/style.css?lastmod=1729135348000 which are max-age=604800.

When I refresh the page, I find that we’re requesting index.html with If-Modified-Since, and that nginx is responding with 200 OK with X-Cache: MISS. That does seem bad, because we would have preferred a 304 Not Modified response.

But the subresources are being cached by the browser.

image

As for your bulleted list, I don't think I agree with your list. Remembering that there are three kinds of caching (browser skips the request, browser revalidates stale content and gets 304 Not Modified, proxy caching), here's what I think we want:

  • max-age=0, return 304 Not Modified on refresh
    • Index pages
    • Regular files on main domain
  • max-age=0, return error code e.g. 404
    • Error pages
  • max-age=0 302 Found redirects
    • Redirects to unsafe HTML files (where the destination has a ?lastmod parameter)
    • Redirects of regular files from subdomain to main domain
  • max-age=604800
    • Files with ?lastmod

We could consider doing more ?lastmod stuff, such as automatically adding it to the links in index pages.

That's unwise, because then users will copy and paste URLs like index.html?lastmod=123 parameters into IFDB. If the HTML changes, unbox may begin serving index.html?lastmod=456 but IFDB will still be linking to index.html?lastmod=123 and so users won't get the new version until someone updates the link.

@curiousdannii
Copy link
Member Author

Right, it's the nginx cache that isn't working. The HTML files, the 302 redirects, and the resources of https://2k788xeots.unbox.ifarchive.org/2k788xeots/www/index.html should all be cached, but aren't right now. Many HTML files are small, but some, for example https://0cvfuriysy.unbox.ifarchive.org/0cvfuriysy/Glitch%20Perfect.html are over 100MB! They definitely need to be cached by nginx. The redirects are less important, but might as well be.

max-age=0 essentially means we're demanding If-Modified-Since checks for every refresh. I don't think we should do that. An "unbustable cache" to me means that if you Ctrl+F5 it won't get the new version, because the intermediate cache only has a stale version. It's very rare for Archive files to be updated, so I think a good approach is to tell browsers to cache for a week/month, but tell the intermediate caches to only cache for 6 hours up to a day. Or even just 1 hour, to just prevent the server being hammered by heavy traffic.

I didn't mention If-Modified-Since and 304 responses, but they need to work for everything as well, of course.

@curiousdannii
Copy link
Member Author

(Btw, I didn't know you could show that CF-Cache-Status header in dev tools, that's cool. I turned on X-Cache as well to see the nginx cache status.

I'd suggest you turn on the Domain column though, that would make clearer what's happening with https://2ckvcbhjia.unbox.ifarchive.org/2ckvcbhjia/index.html)

dfabulich added a commit to dfabulich/ifarchive-unbox that referenced this issue Dec 4, 2024
@dfabulich dfabulich linked a pull request Dec 4, 2024 that will close this issue
dfabulich added a commit to dfabulich/ifarchive-unbox that referenced this issue Dec 4, 2024
@dfabulich
Copy link
Contributor

max-age=0 essentially means we're demanding If-Modified-Since checks for every refresh. I don't think we should do that. An "unbustable cache" to me means that if you Ctrl+F5 it won't get the new version, because the intermediate cache only has a stale version. It's very rare for Archive files to be updated, so I think a good approach is to tell browsers to cache for a week/month, but tell the intermediate caches to only cache for 6 hours up to a day. Or even just 1 hour, to just prevent the server being hammered by heavy traffic.

Using max-age=0 for "unversioned" URLs (i.e. URLs without ?lastmod or something) is a widely held best practice, as long as you can generate a fast, cheap 304 Not Modified response to If-Modified-Since checks ("conditional requests").

For zip index pages and zipped HTML files, the unbox app server just has to read the details out of the zip cache, which is really fast, and a 302/304 response is ~300 bytes.

I'm absolutely certain the server won't be hammered, because the server isn't hammered right now, even though nginx caching is totally busted!

For what it's worth, we use max-age=0 on all HTML pages on IFDB and we're doing just fine. We don't even support conditional requests for our HTML; we just return 200 OK. (We do use ?lastmod or some equivalent on CSS, JS, and cover art, and we do support conditional requests for them, returning 304 Not Modified on revalidations.)

@curiousdannii
Copy link
Member Author

I guess it should be fine; especially as the file details are all held in RAM, so it's very quick to return a 304.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants