Add automatic checking for profanity #45

Dunedan · 2024-09-26T09:47:48Z

This adds functionality to automatically check for profanity in text messages written in any of the XMPP MUC rooms monitored by the moderation bot.

The terms considered profanity are configured in the database and are language-specific. They have to be stored in their lemmatized form. English terms are always checked; if a supported language other than English is detected, the terms configured for that language are checked as well. Supported languages for now are English, French, German, Portuguese, Russian, Spanish and Turkish.

Within a sliding window of three months, a user receives a warning for each of the first two uses of profanity. Starting from the third time, the user gets muted. The first mute lasts five minutes; each further use of profanity increases the duration exponentially, up to a maximum of one week.
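The escalation policy can be illustrated with a minimal sketch. It is not the code from this PR: the function name `penalty`, the 90-day approximation of "three months" and the doubling factor are assumptions (the description only says the duration grows exponentially).

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=90)        # assumption: "three months" taken as 90 days
WARNINGS_BEFORE_MUTE = 2           # the first two incidents only trigger warnings
FIRST_MUTE = timedelta(minutes=5)
MAX_MUTE = timedelta(weeks=1)
GROWTH_FACTOR = 2                  # assumption: the PR does not state the base


def penalty(incident_times, now):
    """Return None for a warning, or the mute duration for the newest incident.

    `incident_times` holds the timestamps of all incidents of one user,
    including the incident that is being handled right now.
    """
    # Only incidents inside the sliding window count towards the escalation.
    recent = [t for t in incident_times if now - t <= WINDOW]
    earlier_mutes = len(recent) - WARNINGS_BEFORE_MUTE - 1
    if earlier_mutes < 0:
        return None
    duration = FIRST_MUTE
    # Grow step by step and cap at every step, so the value never overflows.
    for _ in range(earlier_mutes):
        duration = min(duration * GROWTH_FACTOR, MAX_MUTE)
    return duration
```

Under these assumptions the third incident within the window yields a five-minute mute, the fourth a ten-minute mute, and incidents older than the window no longer count.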

To enable this functionality, the `--enable-profanity-monitoring` command line option has to be provided.

Existing databases require a migration for this change. The following SQL commands can be used for that:

DROP TABLE profanity_whitelist;

CREATE TABLE profanity_terms (
  term VARCHAR(255) NOT NULL,
  language VARCHAR(2) NOT NULL,
  PRIMARY KEY (term, language)
);

INSERT INTO
  profanity_terms (term, language)
SELECT
  word AS term,
  'en'
FROM
  profanity_blacklist;

DROP TABLE profanity_blacklist;

ALTER TABLE profanity_incidents
RENAME TO profanity_incidents_old;

CREATE TABLE profanity_incidents (
  id INTEGER NOT NULL,
  timestamp DATETIME NOT NULL,
  player VARCHAR(255) NOT NULL,
  room VARCHAR(255) NOT NULL,
  offending_content TEXT NOT NULL,
  detected_languages JSON NOT NULL,
  matched_terms JSON NOT NULL,
  PRIMARY KEY (id)
);

INSERT INTO
  profanity_incidents
SELECT
  id,
  timestamp,
  player,
  '[email protected]',
  offending_content,
  '[]',
  '[]'
FROM
  profanity_incidents_old
WHERE
  deleted != '1';

DROP TABLE profanity_incidents_old;

rendello · 2024-09-27T18:49:01Z

Two false positives I found in testing:

fr J'étais en retard avec ma cavalerie

es Eso puede retardar los romanos

rendello · 2024-09-27T19:33:29Z

If you name your player an insult you can get the moderation bot to kick the ratings bot. This was fun to test 😆

rossenburgg · 2024-09-27T19:49:16Z

> If you name your player an insult you can get the moderation bot to kick the ratings bot. This was fun to test 😆

Perhaps we could simply exclude the JID associated with the other bot from filtering? @Dunedan

Dunedan · 2024-09-30T11:01:22Z

Thanks for reporting these issues.

> Two false positives I found in testing:
>
> fr J'étais en retard avec ma cavalerie
>
> es Eso puede retardar los romanos

While it looks as if both false positives had the same cause, they actually had two different ones.

For the French sentence, the cause was that the bot always checked the English profanity terms in addition to the ones for the detected language. I changed that, so it no longer checks the English terms if it detects at least one other language with 100% certainty. That won't eliminate all such false positives, but it should produce far fewer of them.
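The changed selection rule can be sketched roughly as follows. This is an illustration, not the bot's code: the function name `languages_to_check`, the confidence mapping and the ISO 639-1 codes are assumptions, and how the detector reports its certainty is not shown in this thread.

```python
SUPPORTED = {"en", "fr", "de", "pl", "pt", "ru", "es", "tr"}


def languages_to_check(confidences, supported=SUPPORTED):
    """Pick the term lists to check for one message.

    `confidences` maps language codes to the detector's confidence
    in the range 0.0 to 1.0 (hypothetical input format).
    """
    # A supported non-English language detected with full certainty
    # replaces the English fallback.
    certain = {
        lang
        for lang, conf in confidences.items()
        if lang in supported and lang != "en" and conf >= 1.0
    }
    if certain:
        return certain
    # Otherwise check every detected supported language plus English.
    detected = {lang for lang in confidences if lang in supported}
    return detected | {"en"}
```

With this rule a message detected as French with full certainty is only checked against French terms, while an uncertain detection still falls back to checking English terms as well.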

The false positive for the Spanish sentence was caused by a bug in the detection of profanity in phrases, which caused partial words to be matched.
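The difference between substring matching and whole-word matching can be shown with a small sketch. It only illustrates the class of bug, not the actual fix: `contains_term` is a hypothetical helper, and the matched term is assumed to be the one embedded in "retardar" in the reported sentence.

```python
import re


def contains_term(text, term):
    """Return True if `term` (a word or phrase) occurs in `text` as whole words."""
    # The lookarounds reject matches that continue into a neighbouring word.
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


sentence = "Eso puede retardar los romanos"
naive_hit = "retard" in sentence.lower()        # substring search matches inside "retardar"
whole_word_hit = contains_term(sentence, "retard")
```

Here `naive_hit` is true while `whole_word_hit` is false, because the term is only part of a longer word in the sentence.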

> If you name your player an insult you can get the moderation bot to kick the ratings bot.

I had already thought about this case when implementing the functionality. The intention was not to punish users for writing other users' names, even if these names contain profanity. However, there was a bug in the implementation: it only compared the usernames with the lemmatized forms of the words written. That meant the moderation bot would detect EcheLOn writing "fuck" and fail to find a player with the same name.
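A rough sketch of the intended behaviour: nicknames are compared with the words as written, not with their lemmas. The function `offending_terms`, its parameters and the case-insensitive comparison are assumptions for illustration, and the names in the example are placeholders.

```python
def offending_terms(words, lemmas, terms, nicknames):
    """Return matched terms, skipping words that are a present user's nickname.

    `words` are the words as written and `lemmas` their lemmatized forms,
    aligned by index. `terms` holds the configured profanity terms.
    """
    # Assumption: nicknames are compared case-insensitively.
    nick_set = {nick.casefold() for nick in nicknames}
    return [
        lemma
        for word, lemma in zip(words, lemmas)
        # Compare the nickname with the written word: the lemma may differ
        # from the nickname, which was the source of the bug described above.
        if lemma in terms and word.casefold() not in nick_set
    ]


# Hypothetical data: a player called "Badwords" is mentioned by name.
hits = offending_terms(
    words=["hello", "Badwords"],
    lemmas=["hello", "badword"],
    terms={"badword"},
    nicknames=["Badwords", "alice"],
)
```

In this example `hits` is empty, because the only matching word is the nickname of a present player. Had the nickname been compared with the lemma "badword" instead, the mention would have counted as profanity.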

All of these issues should be fixed now, but I'd appreciate further testing.

rendello · 2024-10-01T18:02:20Z

Glad to help. Norse_Herold and I had been talking about profanity monitoring, so I had a few test cases in mind.

This adds functionality to automatically check for profanity in text messages written in any of the XMPP MUC rooms monitored by the moderation bot.

The terms considered profanity are configured in the database and are language-specific. They have to be stored in their lemmatized form. If a supported language is detected with an accuracy of 100%, only terms for that language are checked; otherwise English terms are checked as well. Supported languages for now are English, French, German, Polish, Portuguese, Russian, Spanish and Turkish.

Within a sliding window of three months, a user receives a warning for each of the first two uses of profanity. Starting from the third time, the user gets muted. The first mute lasts five minutes; each further use of profanity increases the duration exponentially, up to a maximum of one week.

To enable this functionality, the `--enable-profanity-monitoring` command line option has to be provided.

rossenburgg reviewed Sep 26, 2024

Dunedan mentioned this pull request Oct 1, 2024

Overzealous SECCOMP filters killing bots 0ad/lobby-infrastructure#32

Closed

Dunedan force-pushed the profanity-monitoring branch from 79f52f5 to 1ed99e3 on October 18, 2024 12:30

Dunedan merged commit a5d77d1 into 0ad:master Oct 18, 2024
3 checks passed

Dunedan deleted the profanity-monitoring branch October 18, 2024 12:35
