Remove querystring from generated canonical links #8261

Merged: 4 commits, Mar 21, 2024

Member

@wpears wpears commented Mar 21, 2024

This PR changes how canonical links are generated in base.html: it strips the querystring. Across all the pages I audited (perhaps not an exhaustive set), the querystring is simply used as a filter or as a way to track specific selections. The whole point of canonical links is to collate these views of the same page into one... canonical link... that Google can then reference and give more link equity to, instead of diffusing it across N URLs (where N is the number of pages in a filterable list or w/e).

Testing:

  • Pull
  • Load various pages
    • On pages without querystrings, the canonical link should be unchanged
    • On pages WITH querystrings, eg, this one, the canonical link should have the querystring stripped out.
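
For reviewers who want to reason about the expected values before loading pages, the behavior described in the testing steps can be sketched in plain Python. This is only an illustration using the standard library, not the template code in this PR; the function name `canonical_url` and the example.gov URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the querystring (and any fragment), then lower-case the result."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out query and fragment.
    stripped = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return stripped.lower()


# Hypothetical URLs, for illustration only.
without_query = canonical_url("https://www.example.gov/data-research/")
with_query = canonical_url(
    "https://www.example.gov/data-research/?page=2&topics=Mortgages"
)
```

In this sketch both calls produce the same string, which is the property the two testing bullets check: a page without a querystring is unchanged, and a page with one collapses to the same canonical URL.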

@wpears wpears requested a review from willbarton March 21, 2024 17:35
@@ -65,7 +65,7 @@
<base target="_blank">
{% endif %}

-<link rel="canonical" href="{{ request.build_absolute_uri() | lower }}">
+<link rel="canonical" href="{{ request.build_absolute_uri().split('?')[0] | lower }}">
Member

@willbarton willbarton Mar 21, 2024

Is there ever a distinction here between build_absolute_uri() and something like {{ request.scheme }}://{{ request.get_host }}{{ request.path }}? It looks like probably not from the build_absolute_uri source.

My preference would be to do less Python in templates rather than more, so either using the constituent properties of request or moving the string slicing into a template tag.

Member Author

@wpears wpears Mar 21, 2024

Done in 5bfe0a1. I hard-coded https because that will be true everywhere that canonical links matter.

Member Author

From my testing, I think the | lower is superfluous as well (there is a redirect or a lowercasing step upstream), but I didn't look too hard at it.

Member

I vaguely recall a need to explicitly lower-case some URLs for canonicalization purposes.

Member Author

Yeah, happy to leave it in without a strong reason to remove it.

Member

Could we use request.build_absolute_uri(request.path) to avoid the need to strip the querystring in such an ugly way? See docs.
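
A rough sketch of why this works, for anyone following along: Django documents that build_absolute_uri() falls back to request.get_full_path() (which includes the querystring) when no location is given, and otherwise builds the absolute URI for the location passed in. The `StubRequest` class below is a hypothetical stand-in written for illustration, not Django's implementation, and the example.gov values are made up.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class StubRequest:
    """Hypothetical stand-in for a Django HttpRequest; not Django code."""

    scheme: str
    host: str
    path: str
    query: str = ""

    def get_full_path(self) -> str:
        # Path plus querystring, when a querystring is present.
        return f"{self.path}?{self.query}" if self.query else self.path

    def build_absolute_uri(self, location=None) -> str:
        # With no location, fall back to the full path (querystring included).
        if location is None:
            location = self.get_full_path()
        return urljoin(f"{self.scheme}://{self.host}{self.path}", location)


# Made-up request values, for illustration only.
request = StubRequest("https", "www.example.gov", "/data-research/", "page=2")
default_uri = request.build_absolute_uri()
path_only_uri = request.build_absolute_uri(request.path)
```

Here `default_uri` still carries `?page=2`, while `path_only_uri` does not, so passing request.path avoids any string splitting in the template.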

@willbarton
Member

Is there any reason not to use the same value in the og:url a few lines down? IIRC it's supposed to be the canonical URL as well.

@wpears
Member Author

wpears commented Mar 21, 2024

Is there any reason not to use the same value in the og:url

Done in b94052. I also squeezed in Andy's proposed method, which is prettier than tilde concatenation.

Member

@willbarton willbarton left a comment

Beautiful

@wpears wpears enabled auto-merge March 21, 2024 19:23
@wpears wpears added this pull request to the merge queue Mar 21, 2024
Merged via the queue into main with commit f76e671 Mar 21, 2024
13 checks passed
@wpears wpears deleted the actual-canonical-links branch March 21, 2024 19:40