Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete users that haven't logged in for too long #647

Merged
merged 57 commits into from
Aug 13, 2024
Merged

Conversation

Splines
Copy link
Member

@Splines Splines commented May 31, 2024

Fixes #410.

Description

  • The previous user cleaner logged in to the MaMpf IMAP server and looked for hashes in mails it had sent before to users to make sure they didn't bounce. For bounces, the respective users were deleted immediately.
  • This behavior was deactivated as "unwanted" in Disable the user cleaner #599. It's too rigorous. And maybe a mail server happens to be down that evening and whoop, that user is deleted. We don't want that.
  • Since Track last sign in date #553, we track the last sign in date of users. This puts us in a new position: we can now use the last sign in date to delete users that haven't signed in for more than 6 months. We send deletion warning mails 40days, 14days, 7days and 2days before the actual deletion happens. The deletion of the user itself is irreversible.
  • For the complete flow, please see User cleaner does not detect all bounces correctly #410 (comment) or rather the docstrings for the UserCleaner class. Please complain if they don't explain some concepts that are actually implemented!

Changes

  • Implement the flow in the UserCleaner class (with unit tests).
  • Previous code in the UserCleaner is entirely deleted.
  • Schema changes: Remove old ghost_hash for users. Add new deletion_date field for users.
  • Don't miss out on the new initializer config/initializers/after_sign_in.rb

No mail bounce detection anymore

What about mail bounce detection? Sadly enough, detecting mail bounces is a non-trivial problem. Parsing the body of mails is not robust enough as texts like "Mail could not be sent" are not standardized. There exists a good strategy called Variable envelope return path (VERP) which sounds very promising. However, we might ask ourselves if it's worth the effort to implement this in light of the benefits we get.

Yes, a mail server of a user might be down for some time. But how likely is it that a user hasn't logged in to MaMpf for at least 6 months, then their mail server is down when we send the warning deletion mail 40 days, 14 days, 7 days AND 2 days before actual deletion? Considering how unlikely this event is, I think we can invest our time better instead of implementing a bounce detection algorithm that has to log in to our own IMAP server to retrieve all emails. It's also a lot less pain to maintain the UserCleaner without such a brittle component.

TODO

  • Check what happens to comments that a generic user posted. We should have a "Deleted User" anonymous user that is assigned to those comments (or at least have that written in the comment title) -> comments are currently deleted, I will tackle this in a separate PR
  • Check what happens to media that a generic user posted in a seminar talk. We probably want to completely delete it. -> while the user is deleted, the seminar still exists (with an empty user) and the media is preserved. We may want to change this behavior in a separate PR.
  • Add maximum limit of deletions per run.

Reviewers

The best place to start the review is probably the specs file for the UserCleaner.

Caution

The code presented here might have very destructive consequences if something goes wrong. In the worst case scenario, it will delete active users that we didn't intend to delete. Unit tests should cover these cases but please be extremely vigilant with this PR and take extra steps and time to make sure the unit tests cover all edge cases and actually reflect semantically-wise what we want to happen.

Tip

If tests don't work locally, you might have to run the migrations in the test docker container first.
That is: cd ./docker/test, docker compose up -d, docker compose exec mampf bash, rake db:migrate.

Tip

In the Files changed tab on GitHub, you can mark files as done to get a better overview of what you've already reviewed. Every single line of code here should be looked through.

Before merge

  • Plan when this code will come to the main branch, so that everyone (including sysadmins) are on standby in case something goes wrong.
  • Check that our "import backups" strategy still works flawlessly and can be fully executed within 1 hour.
  • Set MAX_DELETIONS_PER_RUN env variable in production/dev/experimental
  • Set PRODUCTION_NAME=${PROJECTNAME} for docker compose up -d in redeploy scripts for production/dev/experimental

@Splines Splines self-assigned this May 31, 2024
@Splines Splines changed the title Add new use cleaner Add new user cleaner May 31, 2024
Copy link

codecov bot commented May 31, 2024

Codecov Report

Attention: Patch coverage is 91.83673% with 4 lines in your changes missing coverage. Please review.

Please upload report for BASE (dev@013cca2). Learn more about missing BASE report.

Files Patch % Lines
app/models/user_cleaner.rb 87.50% 4 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##             dev     #647   +/-   ##
======================================
  Coverage       ?   53.80%           
======================================
  Files          ?      157           
  Lines          ?     6644           
  Branches       ?        0           
======================================
  Hits           ?     3575           
  Misses         ?     3069           
  Partials       ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This is to ensure that local changes in the `app` folder are reflected
in the Docker container such that newly written tests can be run through
without having to rebuild the whole Docker image.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants