Re-evaluate logic for deleting rows from ratings table #738

Open
audiodude opened this issue Jun 17, 2024 · 1 comment

Comments

@audiodude
Member

In #737, a WikiProject selection with multiple projects winds up with an article list that contains dozens of deleted articles.

This seems to happen because articles that have been deleted from English Wikipedia are never deleted from the ratings db table. The algorithm goes like this:

  1. Find all the articles in the "... by quality" category for the project
  2. Compare to all of the articles in the ratings table for that project
  3. For any articles that are in the db (ratings table) but not the category:
    4. Check if their quality/importance is already set to NotAClass. If so, skip
    5. Check, in 3 different ways, whether they have been moved.
    6. If so, set the move data for that log
    7. Regardless, set their quality or importance rating (or both) to NotAClass.
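The steps above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: `reconcile`, `find_move`, the dict layout, and the `"NotA-Class"` sentinel value are all assumptions made for the example, and the three move checks are collapsed into one hypothetical callable. The point it shows is that an article missing from the category is only re-rated, never removed, at this stage.

```python
NOT_A_CLASS = "NotA-Class"  # assumed sentinel value, for illustration only


def reconcile(category_articles, db_ratings, find_move, kind="quality"):
    """Mark articles that left the category as NotAClass for one rating kind.

    category_articles: set of article names found in the "... by quality" category
    db_ratings: dict mapping article name -> {"quality": ..., "importance": ...}
    find_move: hypothetical callable standing in for the three move checks;
               returns the move destination, or None if no move was found
    """
    move_log = {}
    for article, rating in db_ratings.items():
        if article in category_articles:
            continue  # still in the category, nothing to do (step 3)
        if rating[kind] == NOT_A_CLASS:
            continue  # already marked, skip (step 4)
        destination = find_move(article)  # step 5
        if destination is not None:
            move_log[article] = destination  # step 6
        rating[kind] = NOT_A_CLASS  # step 7: happens whether or not it moved
    return move_log
```

Note that nothing here deletes a row. Deletion depends entirely on the separate query described next.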

There is additional separate logic for deleting articles with this WHERE clause:

        WHERE r_project=%(r_project)s AND
              (r_quality IS NULL OR r_quality=%(not_a_class)s) AND
              (r_importance IS NULL OR r_importance=%(not_a_class)s)

So the bug is that articles in different namespaces like Category pages:

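+-----------+-------------+------------+----------------+----------------------+--------------+------------------------+---------+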
| r_project | r_namespace | r_article  | r_quality      | r_quality_timestamp  | r_importance | r_importance_timestamp | r_score |
+-----------+-------------+------------+----------------+----------------------+--------------+------------------------+---------+
| Theatre   |          14 | 1725_plays | Category-Class | 2011-07-07T04:40:07Z | NA-Class     | 2011-07-07T04:40:07Z   |       0 |
| Years     |          14 | 1725_plays | Category-Class | 2016-12-31T09:22:04Z | NA-Class     | 2016-12-31T09:22:04Z   |       0 |
+-----------+-------------+------------+----------------+----------------------+--------------+------------------------+---------+

End up never being deleted!

My guess is that we could change the WHERE clause to include an OR r_namespace > 0 clause.

@audiodude
Member Author

My evaluation above is incorrect, because of course NA-Class is not the same as NotA-Class. Of course!
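To make the difference concrete, here is a small sketch using an in-memory SQLite database. It is not the production setup: the table name `ratings`, the `DELETE FROM` wrapper around the WHERE clause, the `"NotA-Class"` sentinel value, the `Gone_article` row, and the helper `surviving_rows` are all assumptions for the example, and SQLite's `:name` placeholders stand in for the `%(name)s` style used in the real query.

```python
import sqlite3

NOT_A_CLASS = "NotA-Class"  # assumed sentinel value, for illustration only

SAMPLE_ROWS = [
    # (r_project, r_namespace, r_article, r_quality, r_importance)
    ("Theatre", 14, "1725_plays", "Category-Class", "NA-Class"),
    ("Theatre", 0, "Gone_article", NOT_A_CLASS, NOT_A_CLASS),  # made-up row
]


def surviving_rows(rows, project):
    """Run the deletion WHERE clause and return the rows left for the project."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings (r_project TEXT, r_namespace INTEGER, "
        "r_article TEXT, r_quality TEXT, r_importance TEXT)"
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute(
        """
        DELETE FROM ratings
        WHERE r_project = :r_project AND
              (r_quality IS NULL OR r_quality = :not_a_class) AND
              (r_importance IS NULL OR r_importance = :not_a_class)
        """,
        {"r_project": project, "not_a_class": NOT_A_CLASS},
    )
    remaining = conn.execute(
        "SELECT r_article, r_quality, r_importance FROM ratings "
        "WHERE r_project = ? ORDER BY r_article",
        (project,),
    ).fetchall()
    conn.close()
    return remaining
```

With these sample rows, `surviving_rows(SAMPLE_ROWS, "Theatre")` leaves only the `1725_plays` row: its `Category-Class` / `NA-Class` ratings are ordinary values, so neither condition matches the sentinel and the clause never selects it.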

So there's some other reason why 1725_plays, which is deleted, still appears in the ratings table.

The bot is running right now so I don't want to mess with debugging this just yet.
