Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redirects (design doc): improving existing functionality #10825

Merged
merged 10 commits into from
Oct 31, 2023
338 changes: 338 additions & 0 deletions docs/dev/design/redirects.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,338 @@
Improving redirects
===================

Redirects are a core feature of Read the Docs,
they allow users to keep old URLs working when they rename or move a page.

The current implementation lacks some features and has some undefined/undocumented behaviors.

.. contents::
:local:
:depth: 3

Goals
-----

- Improve the user experience when creating redirects.
- Improve the current implementation without big breaking changes.

Non-goals
---------

- Replicate every feature of other services without
having a clear use case for them.
- Improve the performance of redirects.
This can be discussed in an issue or pull request.
stsewd marked this conversation as resolved.
Show resolved Hide resolved
Performance should be considered when implementing new improvements.
- Allow importing redirects.
We should push users to use our API instead.
- Allow specifying redirects in the RTD config file.
We have had several discussions around this,
but we haven't reached a consensus.

Current implementation
----------------------

We have five types of redirects:

Prefix redirect:
Allows to redirect all the URLs that start with a prefix to a new URL
using the default version and language of the project.
For example: a prefix redirect with the value ``/prefix/``
will redirect ``/prefix/foo/bar`` to ``/en/latest/foo/bar``.

They are basically the same as an exact redirect with a wildcard at the end.
They are a shortcut for a redirect like:

From:
``/prefix/$rest``
To:
``/en/latest/``

Or maybe we could use a prefix redirect to replace the exact redirect with a wildcard?
stsewd marked this conversation as resolved.
Show resolved Hide resolved

Page redirect:
Allows to redirect a single page to a new URL using the current version and language.
For example: a page redirect with the value ``/old/page.html``
will redirect ``/en/latest/old/page.html`` to ``/en/latest/new/page.html``.

Cross domain redirects are not allowed in page redirects.
They apply to all versions,
if you want it to apply only to a specific version you can use an exact redirect.

A whole directory can't be redirected with a page redirect,
an exact redirect with a wildcard at the end needs to be used instead.

A page redirect on a single version project is the same as an exact redirect.

Exact redirect:
Allows to redirect an exact URL to a new URL,
it allows a wildcard at the end to redirect.
For example: an exact redirect with the value ``/en/latest/page.html``
will redirect ``/en/latest/page.html`` to the new URL.

If an exact redirect with the value ``/en/latest/dir/$rest``
is created, it will redirect all paths that start with ``/en/latest/dir/``,
the rest of the path will be added to the new URL automatically.

- Cross domain redirects are allowed in exact redirects.
- They apply to all versions.
- A wildcard is allowed at the end of the URL.
- If a wildcard is used, the rest of the path will be added to the new URL automatically.

Sphinx HTMLDir to HTML:
Allows to redirect clean-URLs to HTML URLs.
Useful in case a project changed the style of their URLs.

They apply to all projects, not just Sphinx projects.

Sphinx HTML to HTMLDir:
Allows to redirect HTML URLs to clean-URLs.
Useful in case a project changed the style of their URLs.

They apply to all projects, not just Sphinx projects.

How other services implement redirects
--------------------------------------

- Gitbook implementation is very basic,
they only allow page redirects.

https://docs.gitbook.com/integrations/git-sync/content-configuration#redirects

- Cloudflare pages allow to capture placeholders and one wildcard (in any part of the URL).
They also allow you to set the status code of the redirect,
and redirects can be specific in a ``_redirects`` file.

https://developers.cloudflare.com/pages/platform/redirects/

They have a limit of 2100 redirects.
In case of multiple matches, the topmost redirect will be used.

- Netlify allows to capture placeholders and a wildcard (only allowed at the end).
They also allow you to set the status code of the redirect,
and redirects can be specific in a ``_redirects`` file.

- Forced redirects
- Match query arguments
- Match by country/language and cookies
- Per-domain and protocol redirects
- In case of multiple matches, the topmost redirect will be used.

https://docs.netlify.com/routing/redirects/

Improvements
------------

General improvements
~~~~~~~~~~~~~~~~~~~~

The following improvements will be applied to all types of redirects.

- Allow choosing the status code of the redirect.
We already have a field for this, but it's not exposed to users.
- Allow to explicitly define the order of redirects.
stsewd marked this conversation as resolved.
Show resolved Hide resolved
This will be similar to the automation rules feature,
where users can reorder the rules so the most specific ones are first.
We currently rely on the implicit order of the redirects (updated_at).
- Allow to disable redirects.
It's useful when testing redirects, or when debugging a problem.
Instead of having to re-create the redirect,
we can just disable it and re-enable it later.
- Allow to add a short description.
It's useful to document why the redirect was created.

Don't run redirects on domains from pull request previews
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We currently run redirects on domains from pull request previews,
this is a problem when moving a whole project to a new domain.
stsewd marked this conversation as resolved.
Show resolved Hide resolved

We don't the need to run redirects on external domains, they
should be treated as temporary domains.

Normalize paths with trailing slashes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently, if users want to redirect a path with a trailing slash and without it,
they need to create two separate redirects (``/page/`` and ``/page``).

We can simplify this by normalizing the path before matching it.

For example:

From:
``/page/``
To:
``/new/page``

The from path will be normalized to ``/page``,
and the filename to match will also be normalized before matching it.
stsewd marked this conversation as resolved.
Show resolved Hide resolved
This is similar to what Netlify does:
https://docs.netlify.com/routing/redirects/redirect-options/#trailing-slash.

Page and exact redirects without a wildcard at the end will be normalized,
all other redirects need to be matched as is.

This makes it impossible to match a path with a trailing slash.

Explicit ``$rest`` placeholder
stsewd marked this conversation as resolved.
Show resolved Hide resolved
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Explicitly place the ``$rest`` placeholder in the target URL,
instead of adding it automatically.

Some times users want to redirect to a different path,
we have been adding a query parameter in the target URL to
prevent the old path from being added in the final path.
For example ``/new/path/?_=``.

Instead of adding the path automatically,
users have to add the ``$rest`` placeholder in the target URL.
For example:

From:
``/old/path/$rest``
To:
``/new/path/$rest``

From:
``/old/path/$rest``
To:
``/new/path/?page=$rest&foo=bar``
humitos marked this conversation as resolved.
Show resolved Hide resolved

Improving page redirects
stsewd marked this conversation as resolved.
Show resolved Hide resolved
~~~~~~~~~~~~~~~~~~~~~~~~

- Allow to redirect to external domains.
This can be useful to apply a redirect of a well known path
in all versions to another domain.

For example, ``/security/`` to a their security policy page in another domain.

This new feature isn't strictly needed,
but it will be useful to simplify the explanation of the feature
(one less restriction to explain).

- Allow a wildcard at the end of the from path.
This will allow users to migrate a whole directory to a new path
without having to create an exact redirect for each version.

Similar to exact redirects, users need to add the ``$rest`` placeholder explicitly.
This means that that page redirects are the same as exact redirects,
with the only difference that they apply to all versions.

Improving Sphinx redirects
~~~~~~~~~~~~~~~~~~~~~~~~~~

These redirects are useful, but we should rename them to something more general,
since they apply to all types of projects, not just Sphinx projects.

Proposed names:

- HTML URL to clean URL redirect (``file.html`` to ``file/``)
- Clean URL to HTML URL redirect (``file/`` to ``file.html``)
humitos marked this conversation as resolved.
Show resolved Hide resolved

Other ideas to improve redirects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Run forced redirects before built-in redirects.
We currently run built-in redirects before forced redirects,
this is a problem when moving a whole project to a new domain.
For example, a forced redirect like ``/$rest``,
won't work for the root URL of the project,
since ``/`` will first redirect to ``/en/latest/``.

But shouldn't be a real problem, since users will still need to
handle the ``/en/latest/file/`` paths.

- Run redirects on the edge.
Cloudflare allow us to create redirects on the edge,
but they have some limitations around the number of
redirect rules that can be created.

And they will be useful for forced exact redirects only,
since we can't match a redirect based on the response of the origin server.
stsewd marked this conversation as resolved.
Show resolved Hide resolved

- Merge prefix redirects with exact redirects.
Prefix redirects are the same as exact redirects with a wildcard at the end.

- Merge all redirects into a single type.
This may simplify the implementation,
but it will make it harder to explain the feature to users.
And to replace some redirects we need to implement some new features.

- Placeholders.
I haven't seen users requesting this feature.
We can consider adding it in the future.
Maybe we can expose the current language and version as placeholders.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea, and perhaps removes the need for Page redirects, since it's the same as /$lang/$version/$path as an exact redirect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be also useful for custom URLconf in the future when we support changing the position of them, like:

From: /:lang/:version/:path
To: /:lang/prefix/:version/:path

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we will be allowing to change the positions of components, we already discussed why this introduces a lot of complexity and doesn't solve user requirements (they only need to add a prefix, which we already suport).


- Replace ``$rest`` with ``*`` in the from_url.
This will be more consistent with other services,
but it will require users to re-learn the feature.
stsewd marked this conversation as resolved.
Show resolved Hide resolved

- Per-protocol redirects.
We should push users to always use HTTPS.

- Allow a prefix wildcard.
We currently only allow a suffix wildcard,
adding support for a prefix wildcard should be easy.
But do users need this feature?

- Per-domain redirects.
The main problem that originated this request was that we were applying redirects on external domains,
if we stop doing that, there is no need for this feature.
We can also try to improve how our built-in redirects work
(specially our canonical domain redirect).

Allow matching query arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We can do this in two ways:

- At the DB level with some restrictions.
If done at the DB level,
we would need to have a different field
with just the path, and other with the query arguments normalized and sorted.

For example, if we have a redirect with the value ``/foo?blue=1&yellow=2&red=3``,
if would be normalized in the DB as ``/foo`` and ``blue=1&red=3&yellow=2``.
This implies that the URL to be matched must have the exact same query arguments,
it can't have more or less.
Comment on lines +357 to +360
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be solved with a JSONField on the db to save the query arguments. In that case, the order and number of the arguments won't matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But matching can't be done with one query even using a JSONField either, matching will need to happen on the Python side.

Copy link
Member

@humitos humitos Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand your reply.

I'm saying that with a JSONField that stores the query arguments you can perform the match of the query arguments at DB level. Once you get those, you can perform the path/URL matching with Python.

This will reduce the complexity of handling all the matching on Python.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the current URl is ?one&two&three, we don't know which of those arguments we need to match in the rules. A rule can be one, or be two, or one&three.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is solvable by "matching the query arguments from the rule into the DB field" -- no the other way around as you are saying. Example:

.filter(queryargs__haskeys=rule.queryargs.keys())

https://docs.djangoproject.com/en/4.2/ref/contrib/postgres/fields/#has-keys

or we can use contained_by if we want to match specific key, value pairs:

.filter(queryargs__contained_by=rule.queryargs)

https://docs.djangoproject.com/en/4.2/ref/contrib/postgres/fields/#std-fieldlookup-hstorefield.contained_by

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, didn't know about those queries, I have updated the doc to mention this approach.


I believe the implementation described here is the same being used by Netlify,
since they have that same restriction.

If the URL contains other parameters in addition to or instead of id, the request doesn't match that rule.

https://docs.netlify.com/routing/redirects/redirect-options/#query-parameters

- At the Python level.
If done at the DB level,
we would need to have a different field
with just the path, and other with query arguments.

The matching of the path would be done at the DB level,
and the matching of the query arguments would be done at the Python level.
Here we can be more flexible, allowing any query arguments in the matched URL.

We had some performance problems in the past,
but I believe it was mainly due to the use of regex instead of using string operations.
And matching the path is still done at the DB level.
We could limit the number of redirects that can be created with query arguments,
or the number of redirects in general.

We hava had only one user requesting this feature,
so this is not a priority.

Migration
---------

Most of the proposed improvements are backwards compatible,
and just need a data migration to normalize existing redirects.

For the exception of adding the ``$rest`` placeholder in the target URL explicitly,
that needs users to re-learn how this feature works, i.e, they may be expecting
to have the path added automatically in the target URL.

We can create a small blog post explaining the changes.