basic spam detection #719

sgfost · 2024-05-11T00:26:29Z

uses two methods:

- timing form load->submission time
- using a honeypot field hidden from real users with css

logs actions to a SpamModeration model with a generic fk to content models. actions such as deactivating the user or allowing the post can then be taken via a WIP admin view
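A minimal sketch of how the two checks could fit together, assuming a hypothetical `check_submission` helper and a made-up 5 second threshold (neither name nor value comes from this PR). The submit time is taken on the server, in line with the later "compute submit_time on the server side" commit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical threshold: anything submitted faster than this looks automated.
MIN_SECONDS_TO_SUBMIT = 5


@dataclass
class SpamCheckResult:
    is_spam: bool
    reasons: List[str] = field(default_factory=list)


def check_submission(
    honeypot_value: str,
    loaded_at: datetime,
    submitted_at: Optional[datetime] = None,
    min_seconds: int = MIN_SECONDS_TO_SUBMIT,
) -> SpamCheckResult:
    """Flag a form submission using the honeypot and timing heuristics."""
    # default to the server clock so the client cannot supply its own submit time
    if submitted_at is None:
        submitted_at = datetime.now(timezone.utc)

    reasons = []
    # real users never see the honeypot field, so any value suggests a bot
    if honeypot_value.strip():
        reasons.append("honeypot field was filled in")
    # a form filled out faster than a human plausibly could is suspicious
    if submitted_at - loaded_at < timedelta(seconds=min_seconds):
        reasons.append("form submitted too quickly")

    return SpamCheckResult(is_spam=bool(reasons), reasons=reasons)
```

Returning the reasons alongside the verdict is one way to keep the "how/why we think it's spam" context that the TODO below asks for.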

TODO:

- test coverage
- usable admin view with
  - confirming/rejecting spam assessment
  - filtering
  - option to deactivate user on confirm (or default?)
  - email reminder when X number of spam objects pile up, or after Y amt of time (?)
- add more context to SpamContent detailing how/why we think it's spam, flexible enough that it can also be used for detecting with NLP (think ahead to reconciling Feat: Spam Detection Feature #693)

resolves comses/planning#235

uses two methods:

- timing form load->submission time
- using a honeypot field hidden from real users with css

logs to a SpamContent model with a generic fk to content models. actions such as deactivating the user or allowing the post can then be taken via a WIP admin view

django/curator/views.py

allows for:

- filtering by review status and content type
- manual confirmation of spam (with option to deactivate submitter)
- manual denial of spam

also now using wagtail-modeladmin instead of deprecated wagtail.contrib.modeladmin (resolves comses/planning#170)

sgfost · 2024-05-16T04:11:47Z

spam content, and deleted events/jobs for that matter, still show up in the site-wide search. Need to find a good way to filter those from the general search but struggling to figure out how it currently does this for non-live codebases..

alee · 2024-05-16T21:22:25Z

> spam content, and deleted events/jobs for that matter, still show up in the site-wide search. Need to find a good way to filter those from the general search but struggling to figure out how it currently does this for non-live codebases..

good catch, we should have a

@classmethod
def get_indexed_objects(cls): ...

defined for Events, MemberProfiles, Jobs
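A rough sketch of the idea behind that hook, written with plain Python stand-ins so it runs without Django or Wagtail. The `Event` dataclass, `all_rows`, and `EVENT_TABLE` are hypothetical; in the real models the method would presumably return a filtered queryset rather than a list.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Plain-Python stand-in for the Event model (hypothetical, not the real class)."""

    title: str
    is_marked_spam: bool = False
    is_deleted: bool = False

    @classmethod
    def all_rows(cls):
        # stand-in for cls.objects.all(); a real model would return a queryset
        return list(EVENT_TABLE)

    @classmethod
    def get_indexed_objects(cls):
        # only rows that are neither spam nor deleted are handed to the indexer
        return [e for e in cls.all_rows() if not (e.is_marked_spam or e.is_deleted)]


# stand-in for the database table
EVENT_TABLE = [
    Event("workshop"),
    Event("buy cheap pills", is_marked_spam=True),
    Event("cancelled meetup", is_deleted=True),
]

indexed_titles = [e.title for e in Event.get_indexed_objects()]
```

Because the filter lives in one classmethod, anything that is kept out of the index never shows up in site-wide search results, regardless of which view runs the search.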

sgfost · 2024-05-17T17:04:03Z

ah that's it, thanks

The bulk of this is done, just need to put up a few tests. I think it ought to work as a more flexible base for using the work in #693 as another detection method, but lmk if you have any thoughts or concerns when you get around to it

BREAKING CHANGES(schema): existing schema will break if this branch has already been deployed; manually run

```
./manage.py migrate core 0020
```

before running git pull

- rename SpamContent -> SpamModeration and SpamModeration.Status to SPAM | NOT_SPAM | UNREVIEWED for clarity
- add ModeratedContent mixin class for access to SpamModeration things
- squash / regenerate migrations

TODO: consider doing the same for QuerySets that should respect a spam_moderation field

alee

LGTM thanks @sgfost ! I think I fixed a few minor issues with the codebase release edit form still including an outdated reference to the spam alert macro, will wait and see if things still look OK on staging before merging

alee · 2024-05-20T22:29:58Z

django/core/jinja2/core/events/retrieve.jinja

@@ -21,6 +21,8 @@
 {% endblock ogp_tags %}

 {% block content %}
+    {{ alert_if_spam(is_marked_spam) }}
+    {{ alert_if_deleted(is_deleted, "event") }}


when, if ever, will alert_if_deleted be executed? the old behavior was to generate a 404 when trying to retrieve something that's been marked as deleted; guessing that's no longer the case?

Yeah, it's now treated exactly the same as things marked spam: hidden from list view but accessible if you have a direct link to it. There wasn't much reasoning behind it besides simplifying queries

alee · 2024-05-21T23:15:51Z

django/core/models.py

should we consider attaching SpamModeration to MemberProfile as well?

alee · 2024-05-21T23:22:17Z

django/core/mixins.py

@@ -180,3 +184,112 @@ def list(self, request, *args, **kwargs):

        serializer = self.get_serializer(queryset, many=True)
        return Response(serializer.data)
+
+
+class SpamCatcherSerializerMixin(serializers.Serializer):


should this use class SpamCatcherSerializerMixin(metaclass=serializers.SerializerMetaclass) instead?

see

https://stackoverflow.com/questions/28747487/mixin-common-fields-between-serializers-in-django-rest-framework

and

encode/django-rest-framework#4482 (comment)

Probably, yeah. the mixin shouldn't be a serializer. I'll check this out

alee · 2024-05-21T23:23:04Z

django/core/serializers.py

@@ -310,7 +311,7 @@ def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


-class EventSerializer(serializers.ModelSerializer):
+class EventSerializer(serializers.ModelSerializer, SpamCatcherSerializerMixin):


This appears to be working fine but for future-proofing should the SpamCatcherSerializerMixin be specified first so that the Python MRO invokes its stuff first before serializers.ModelSerializer?

Same for all the other Serializers, some of which have ModelSerializer before other mixins...
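To illustrate why the order of base classes can matter, here is a small sketch using plain Python classes rather than DRF. `BaseSerializer` and `SpamCatcherMixin` are hypothetical stand-ins, and the assumption that the base `validate` does not call `super()` is made purely for the demonstration; whether the real serializers behave this way depends on which methods the mixin overrides.

```python
class BaseSerializer:
    """Stand-in for a framework base class that does not call super()."""

    def validate(self, attrs):
        attrs.setdefault("trace", []).append("BaseSerializer.validate")
        return attrs


class SpamCatcherMixin:
    """Stand-in mixin that does its work and then defers to the next class."""

    def validate(self, attrs):
        attrs.setdefault("trace", []).append("SpamCatcherMixin.validate")
        return super().validate(attrs)


class MixinFirst(SpamCatcherMixin, BaseSerializer):
    pass


class MixinLast(BaseSerializer, SpamCatcherMixin):
    pass


# mixin listed first: its validate runs, then hands off to the base class
first_trace = MixinFirst().validate({})["trace"]

# mixin listed last: the base validate wins and the mixin is never reached
last_trace = MixinLast().validate({})["trace"]
```

With the mixin listed first, both methods run in order. With it listed last, the mixin's `validate` is silently skipped in this sketch, which is the kind of surprise that putting mixins first is meant to avoid.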

alee · 2024-05-22T00:12:10Z

django/core/views.py

+    def get_queryset(self):
+        # exclude spam from list view
+        if self.action == "list":
+            return self.queryset.public()


am I understanding correctly that this allows possible spam events to be accessible in the detail view? (same for Jobs?)

Yes, mostly to give an explanation of what happened should a real post get mistakenly flagged. Though it should probably only be accessible by admins and the submitter
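A sketch of the stricter rule suggested in that reply, using plain Python stand-ins instead of the real viewset. `Post`, `list_view`, and `can_retrieve` are hypothetical names, and the admin/submitter check is the proposal from the comment, not behavior confirmed to be in this PR.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Stand-in for an Event or Job (hypothetical, not the real model)."""

    id: int
    submitter: str
    is_marked_spam: bool = False


def list_view(posts):
    # the list view never shows content that is marked as spam
    return [p for p in posts if not p.is_marked_spam]


def can_retrieve(post, username, is_admin=False):
    # anything not marked as spam stays reachable by direct link
    if not post.is_marked_spam:
        return True
    # flagged content is only visible to admins and the person who submitted it,
    # so a mistakenly flagged submitter can still see what happened
    return is_admin or username == post.submitter


posts = [Post(1, "alice"), Post(2, "bob", is_marked_spam=True)]
visible_ids = [p.id for p in list_view(posts)]
```

Here `can_retrieve(posts[1], "bob")` is allowed because bob submitted the flagged post, while another regular user would be refused.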

alee · 2024-05-22T00:18:15Z

django/curator/views.py

 from django.views.decorators.http import require_POST
-from wagtail.contrib.modeladmin.helpers import AdminURLHelper
+from wagtail_modeladmin.helpers import AdminURLHelper


should eventually migrate to Snippets or alternative https://github.com/comses/planning/issues/248

sgfost · 2024-05-22T02:18:41Z

Thx, I'll make a new issue with these needed changes so as not to block merging if it's in a 'good enough' state

sgfost added enhancement curator labels May 11, 2024

feat: add detection context to spam content

4a273c7

github-advanced-security bot found potential problems May 15, 2024

sgfost force-pushed the spam-catcher branch from 588cf60 to e8ddaa0 Compare May 15, 2024 23:58

sgfost force-pushed the spam-catcher branch from e8ddaa0 to 3ce321c Compare May 16, 2024 00:07

sgfost added 2 commits May 17, 2024 21:06

fix: do not include any spam/deleted content in search

78b171e

standard public() query = not spam and not deleted - make deleted (more like archived) events/jobs viewable to those with access with an alert

test: add coverage for spam detection

0172885

- refactor some test [Model]Factory classes to inherit common functionality + add method for returning data to make a create/update request

sgfost requested a review from alee May 20, 2024 21:38

alee changed the title ~~catching spam~~ basic spam detection May 20, 2024

alee and others added 3 commits May 20, 2024 15:22

fix: compute submit_time on the server side

3428d80

Co-authored-by: Scott Foster <[email protected]>

refactor: clean up unused submitTime generation on client

a2ce823

set load time with useForm.setValuesWithLoadTime instead of useFormTimer just for setting a timestamp

alee force-pushed the spam-catcher branch from 3a390b1 to a2ce823 Compare May 21, 2024 03:32

fix: prevent release editing if marked as spam

8d4fe55

- correct alert_if_spam macro reference - move comment to the right place

alee approved these changes May 22, 2024

alee merged commit 6fcc3a5 into comses:main May 22, 2024
6 checks passed
