Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Update redirects #56511

Closed
wants to merge 5 commits into from

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Dec 14, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Ref: #55626 (comment)

Opening this up as a draft. I've generated the new paths based on what was in the redirects.csv, but I'd like to find some way to test this.

Script for testing redirects
from pathlib import Path
import os

# Download each of the versions of docs listed below as zip files, unzip into a
# directory with subdirectories 1.1, 1.2, etc.
base_old = Path(f"/home/richard/Downloads/docs/")
base_pr = Path("/home/richard/dev/pandas/doc/build/html")

# We'll make sure redirects don't exist for any methods that have been removed
removed_methods = [".pad.", ".mad.", ".backfill.", ".tshift.", ".sem."]

def strip_str(haystack, needle, before):
    idx = haystack.find(needle)
    assert idx >= 0
    if before:
        result = haystack[idx + len(needle):]
    else:
        result = haystack[:idx]
    return result

for version in ["1.0", "1.1", "1.2", "1.3", "1.4", "1.5", "2.0", "2.1", "2.2"]:
    base_old_api = base_old / version / "reference" / "api"
    base_pr_api = base_pr / "reference" / "api"

    print('Version:', version)
    for e in base_old_api.rglob("*.html"):
        relative_path = e.relative_to(base_old_api)
        
        if not str(relative_path).startswith("pandas.core"):
            # Only looking at pandas.core redirects
            continue
        
        target_exists = os.path.exists(base_pr_api / relative_path)
        
        if any(e in str(relative_path) for e in removed_methods):
            if target_exists:
                print(f'  {relative_path}: Removed method has redirect')
            continue
            
        if not target_exists:
            print(f'  {relative_path}: Does not exist')
            continue

        contents = open(base_pr_api / relative_path).read()
        
        if "The page has been moved to" not in contents:
            # There is no redirect
            continue

        # Get the redirect path
        contents = strip_str(contents, "0;URL=", before=True)
        path = Path(strip_str(contents, '"/>\n', before=False))
        
        if not os.path.exists(base_pr_api / path):
            print(f'  {relative_path}: Redirect does not exist')
            continue

        contents = open(base_pr_api / path).read()
        if "The page has been moved to" in contents:
            print(f'  {relative_path}: Redirect is another redirect')
            
    print()
    print('-' * 20)
    print()

@rhshadrach
Copy link
Member Author

@datapythonista - to test the redirects, I'm planning to write a crawler that gets the API urls from previous versions of pandas and tries them out on this PR. I just want to make sure something that tests the redirects doesn't exist so I'm not reinventing the wheel.

@rhshadrach
Copy link
Member Author

/preview

Copy link
Contributor

No preview found for PR #56511. Did the docs build complete?

@datapythonista
Copy link
Member

/preview

Copy link
Contributor

Website preview of this PR available at: https://pandas.pydata.org/preview/56511/

@datapythonista
Copy link
Member

@datapythonista - to test the redirects, I'm planning to write a crawler that gets the API urls from previous versions of pandas and tries them out on this PR. I just want to make sure something that tests the redirects doesn't exist so I'm not reinventing the wheel.

I don't think we implemented any automated testing for the redirects when we added them (this is the commit: 6a745d8). And I'm not aware of any test added later either.

Your idea sounds good. I guess we could also analyze the logs of our server, and see for which pages we return a 404 (probably not in an automated way, but to make sure we don't have anything broken as of today). I guess there will be a lot of noise from bots trying paths if we do that, but we can have a look if you want.

@rhshadrach
Copy link
Member Author

Thanks @datapythonista - I got a script for checking the redirects added to the OP. I'll post results here when this is ready.

@rhshadrach
Copy link
Member Author

I reworked the script in the OP to check the redirects to work just using the filesystem. It checks for

  • URLs in older versions that don't resolve in the PR docs
  • Redirects in the PR docs that go to URLs that don't exist
  • Redirects in the PR docs that redirect again
  • Removed methods that still have redirects

Right now I'm just limiting the output to pandas.core - the script in the OP doesn't flag any issues. There are other redirect issues that I plan to tackle in a separate PR.

@rhshadrach rhshadrach requested review from datapythonista and removed request for datapythonista December 17, 2023 14:44
@rhshadrach
Copy link
Member Author

@datapythonista - With #55632, I think it's likely we'll be reverting #55626 and this will be closed in that case. If that does happen, I plan on putting up a PR fixing the redirects separately.

@rhshadrach rhshadrach closed this Dec 20, 2023
@rhshadrach rhshadrach deleted the api_typing_redirects branch February 14, 2024 01:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants