Handling redirects in published HTML pages #16
Comments
Thanks for opening the issue, sounds like a great idea to me!
Sure!
AFAIK yeah, but I don't think we've tested it out before. May be worth googling around and seeing if CloudFront is compatible with S3 website redirects.
The listing should reside in rust-lang/rust, not rust-central-station. Basically the listing will have to make its way to the published tarballs.
Up to you!
I'd be very careful emitting "301 moved permanently". There is no upper limit to how long browsers and search engines remember such a redirect, which effectively means such a redirect, once in place, can never ever be changed again. You cannot make it point somewhere else, and you cannot turn it back to a normal (code 200) website.
A 3xx response could be sent with Cache-Control: max-age=x headers that clients should respect. If these 301 redirects are only ever sent with a reasonable value in a header like this, it would remain possible to begin serving either old or new content from the same URL in the future.
After testing this out with an S3 bucket, it appears that setting the object metadata for a redirect does send a 301 response, but the response does not include any other headers, such as Cache-Control, that have been set on the same object. That's unfortunate.
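To make the caching concern concrete, here is a minimal Python sketch of the check one might run against a redirect response. The function names and the one-day default limit are my own assumptions for illustration, not anything S3 or the release tooling provides. It treats a permanent redirect as safe to revise later only if it carries a Cache-Control max-age at or below the limit.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    if not cache_control:
        return None
    # Match the max-age directive at the start or after a comma/space,
    # so that a directive such as s-maxage is not picked up by mistake.
    match = re.search(r'(?:^|[,\s])max-age\s*=\s*"?(\d+)', cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def redirect_is_bounded(status, headers, limit=86400):
    """True if the redirect should not be remembered longer than `limit` seconds.

    `limit` (one day here) is an arbitrary, hypothetical threshold.
    """
    if status not in (301, 308):
        # Temporary redirects are not cached by default without explicit headers.
        return True
    lowered = {name.lower(): value for name, value in headers.items()}
    age = max_age_seconds(lowered.get("cache-control"))
    return age is not None and age <= limit
```

With the S3 behavior described above (a 301 with no Cache-Control header), `redirect_is_bounded(301, {"Location": "/book/first-edition/enums.html"})` would return False, which is exactly the "301 trap" case.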
Hmm. Whatever 3xx code would be most appropriate, I originally suggested 301 as it is the only type of redirect that S3 can send directly. If we wanted to use a different type of redirect, it would have to come at a layer in front of S3.
The two motivations for adding redirects are search engine optimization and end user experience. As far as I understand, both temporary and permanent redirects should be favorably treated by most search engines today, and end users won't notice any difference unless they are stuck in a 301 trap. Semantically, the content has probably moved permanently, considering it has been in a new location for almost 3 years now.
After some further investigation, it appears that the entire doc.rust-lang.org site is actually proxied through some other machine running nginx in front of the S3 bucket, rather than CloudFront like www.rust-lang.org is. So any redirects would also need to pass properly through that proxy. Considering this along with the issue I noted above about S3's 301 redirects not including Cache-Control headers, it seems like getting redirects working would require many changes beyond just those in the [...]
The search engine optimization goal can probably be handled just as well by adding [...]
A |
Even if the plan outlined above won't work, we should still address the issue of handling redirects better.
Summary
I propose that we create a file (or files) listing redirects, then use the --website-redirect option of the AWS CLI tool within the promote-release tool to publish these as 301 Moved Permanently redirects to the appropriate locations.
Background
There are now a significant number of links on the web that point to what are essentially "This page has moved" pages on https://doc.rust-lang.org.
Multiple issues have been raised regarding these pages, including rust-lang/rust#42632. Refer to that issue for specifics on some of the ways these redirect pages are creating problems. In short, both search engines and users clicking links are finding themselves on pages that don't need to exist.
Proposal
For pages that amount to no more than a "go here instead" message, we should consider serving 301 Moved Permanently responses when there is a definitive target.
For example, https://doc.rust-lang.org/tutorial.html should just redirect to the link it highlights.
A large collection of pages from the first edition of the book, like https://doc.rust-lang.org/book/enums.html, should just redirect to the new URL for the first edition, like https://doc.rust-lang.org/book/first-edition/enums.html. (Those pages also highlight the existence of a second edition, which can alternatively be highlighted from the actual result pages.)
Anything previously under a /stable/ link that has now moved would be a good candidate as well.
I assume there are other opportunities for redirects that I have not listed here.
Plan
From what I can tell, the publish_docs function here is sending all these pages to https://doc.rust-lang.org. I assume there is a CDN layer in front of that, as well.
Several of the AWS CLI commands (like cp, here) have a --website-redirect option that can be used to attach metadata indicating that requests for an object should be served with a 301 Moved Permanently response.
In order to call this command, we will need to have a list of individual pages that need to be redirected, and the new location for each.
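As a rough sketch of that last step, the following Python turns a mapping of old paths to new locations into `aws s3 cp ... --website-redirect` invocations. The bucket name, the placeholder file, and the function name are hypothetical, and this is not how promote-release is actually written; it only shows the shape of the commands. The prefix check reflects the S3 requirement that a redirect location start with `/`, `http://`, or `https://`.

```python
def redirect_commands(redirects, bucket, placeholder="redirect.html"):
    """Build one `aws s3 cp` argument list per redirect.

    `redirects` maps an old path (e.g. "/book/enums.html") to its new location.
    `placeholder` is a hypothetical local file uploaded as the object body;
    `aws s3 cp` needs a source file even though S3 serves the redirect instead.
    """
    commands = []
    for old_path, new_location in sorted(redirects.items()):
        # S3 only accepts redirect locations with one of these prefixes.
        if not new_location.startswith(("/", "http://", "https://")):
            raise ValueError(f"unsupported redirect target: {new_location!r}")
        key = old_path.lstrip("/")
        commands.append([
            "aws", "s3", "cp", placeholder, f"s3://{bucket}/{key}",
            "--website-redirect", new_location,
        ])
    return commands


if __name__ == "__main__":
    example = {"/book/enums.html": "/book/first-edition/enums.html"}
    for argv in redirect_commands(example, "example-docs-bucket"):
        print(" ".join(argv))
```

Each list could be handed to `subprocess.run` as-is. Note that S3 only honors this metadata when the object is requested through the bucket's website endpoint, so whatever sits in front of the bucket would need to use that endpoint.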
Questions
Is using 301 redirects on these pages an appropriate solution?
Will a 301 redirect in S3 work properly with the CDN that is presumably in front of it?
Should this redirect listing reside in rust-lang/rust-central-station, or should there be one in each of the various projects that end up in rust-docs\share\doc\rust\html? (book, nomicon, reference, std, unstable-book)
What is a good format to use for a file listing redirects?
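On the file format question, one possible answer is a plain text file with one "OLD NEW" pair per line. This is purely a suggestion to anchor discussion; the format and the `parse_redirects` name are assumptions of mine, not an existing convention in any of these repositories. A minimal Python sketch of a parser for it:

```python
def parse_redirects(text):
    """Parse lines of the form 'OLD NEW' into a dict of old path -> new location.

    Blank lines and lines starting with '#' are ignored. A '#' inside a line
    is kept, since a target URL may legitimately contain a fragment.
    """
    redirects = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected 'OLD NEW', got {raw!r}")
        old, new = parts
        if old in redirects:
            raise ValueError(f"line {number}: duplicate source {old!r}")
        redirects[old] = new
    return redirects


if __name__ == "__main__":
    sample = """
    # first edition of the book
    /book/enums.html /book/first-edition/enums.html
    """
    print(parse_redirects(sample))
```

A flat format like this is easy to review in a diff and easy to keep per project if the listing ends up living alongside each set of docs.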
Ref: rust-lang/book#760