Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic spam detection #719

Merged
merged 9 commits into from
May 22, 2024
Merged

basic spam detection #719

merged 9 commits into from
May 22, 2024

Conversation

sgfost
Copy link
Contributor

@sgfost sgfost commented May 11, 2024

uses two methods:

  • timing form load->submission time
  • using a honeypot field hidden from real users with css

logs actions to a SpamModeration model with a generic fk to content models. actions such as deactivating the user or allowing the post can then be taken via a WIP admin view

TODO:

  • test coverage
  • usable admin view with
    • confirming/rejecting spam assessment
    • filtering
    • option to deactivate user on confirm (or default?)
    • email reminder when X number of spam objects pile up, or after Y amt of time (?)
  • add more context to SpamContent detailing how/why we think its spam, flexible enough that it can also be used for detecting with NLP (think ahead to reconciling Feat: Spam Detection Feature #693)

resolves comses/planning#235

uses two methods:
- timing form load->submission time
- using a honeypot field hidden from real users with css

logs to a SpamContent model with a generic fk to content models. action
such as deactivating the user or allowing the post can then be taken via a WIP admin view
django/curator/views.py Fixed Show fixed Hide fixed
django/curator/views.py Fixed Show fixed Hide fixed
allows for:
- filtering by review status and content type
- manual confirmation of spam (with option to deactivate submitter)
- manual denial of spam

also now using wagtail-modeladmin instead of deprecated
wagtail.contrib.modeladmin (resolves comses/planning#170)
@sgfost
Copy link
Contributor Author

sgfost commented May 16, 2024

spam content, and deleted events/jobs for that matter, still show up in the site-wide search. Need to find a good way to filter those from the general search but struggling to figure out how it currently does this for non-live codebases..

@alee
Copy link
Member

alee commented May 16, 2024

spam content, and deleted events/jobs for that matter, still show up in the site-wide search. Need to find a good way to filter those from the general search but struggling to figure out how it currently does this for non-live codebases..

good catch, we should have a

@classmethod
get_indexed_objects(cls)

defined for Events, MemberProfiles, Jobs

@sgfost
Copy link
Contributor Author

sgfost commented May 17, 2024

ah that's it, thanks

The bulk of this is done, just need to put up a few tests. I think it ought to work as a more flexible base for using the work in #693 as another detection method, but lmk if you have any thoughts or concerns when you get around to it

sgfost added 2 commits May 17, 2024 21:06
standard public() query = not spam and not deleted

- make deleted (more like archived) events/jobs viewable to those with
  access with an alert
- refactor some test [Model]Factory classes to inherit common
  functionality + add method for returning data to make a create/update
  request
@sgfost sgfost requested a review from alee May 20, 2024 21:38
@alee alee changed the title catching spam basic spam detection May 20, 2024
alee and others added 3 commits May 20, 2024 15:22
BREAKING CHANGES(schema): existing schema will break if this branch has
already been deployed, manually run

```
./manage.py migrate core 0020
```

before running git pull

- rename SpamContent -> SpamModeration and SpamModeration.Status to SPAM
  | NOT_SPAM | UNREVIEWED for clarity
- add ModeratedContent mixin class for access to SpamModeration things
- squash / regenerate migrations

TODO: consider doing the same for QuerySets that should respect a
spam_moderation field
set load time with useForm.setValuesWithLoadTime instead of useFormTimer
just for setting a timestamp
- correct alert_if_spam macro reference
- move comment to the right place
Copy link
Member

@alee alee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @sgfost ! I think I fixed a few minor issues with the codebase release edit form still including an outdated reference to the spam alert macro, will wait and see if things still look OK on staging before merging

@@ -21,6 +21,8 @@
{% endblock ogp_tags %}

{% block content %}
{{ alert_if_spam(is_marked_spam) }}
{{ alert_if_deleted(is_deleted, "event") }}
Copy link
Member

@alee alee May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when if ever will alert_if_deleted be executed? the old behavior was to generate a 404 when trying to retrieve something that's been marked as deleted, guessing that's no longer the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, its now treated exactly the same as things marked spam, hidden from list view but accessible if you have a direct link to it. There wasn't much reasoning behind it besides simplifying queries

Copy link
Member

@alee alee May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we consider attaching SpamModeration to MemberProfile as well?

@@ -180,3 +184,112 @@ def list(self, request, *args, **kwargs):

serializer = self.get_serializer(queryset, many=True)
return Response(serializer.data)


class SpamCatcherSerializerMixin(serializers.Serializer):
Copy link
Member

@alee alee May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this use class SpamCatcherSerializerMixin(metaclass=serializers.SerializerMetaclass) instead?

see

https://stackoverflow.com/questions/28747487/mixin-common-fields-between-serializers-in-django-rest-framework

and

encode/django-rest-framework#4482 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, yeah. the mixin shouldn't be a serializer. I'll check this out

@@ -310,7 +311,7 @@ def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)


class EventSerializer(serializers.ModelSerializer):
class EventSerializer(serializers.ModelSerializer, SpamCatcherSerializerMixin):
Copy link
Member

@alee alee May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be working fine but for future-proofing should the SpamCatcherSerializerMixin be specified first so that the Python MRO invokes its stuff first before serializers.ModelSerializer?

Same for all the other Serializers, some of which have ModelSerializer before other mixins...

def get_queryset(self):
# exclude spam from list view
if self.action == "list":
return self.queryset.public()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am I understanding this correctly to allow possible spam events to be accessible in the detail view? (same for Jobs?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mostly to give an explanation of what happened should a real post get mistakenly flagged. Though it should probably only be accessible by admins and the submitter

from django.views.decorators.http import require_POST
from wagtail.contrib.modeladmin.helpers import AdminURLHelper
from wagtail_modeladmin.helpers import AdminURLHelper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should eventually migrate to Snippets or alternative https://github.com/comses/planning/issues/248

@sgfost
Copy link
Contributor Author

sgfost commented May 22, 2024

Thx, I'll make a new issue with these needed changes as to not block merging if its in a 'good enough' state

@alee alee merged commit 6fcc3a5 into comses:main May 22, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants