Add automatic checking for profanity #45

Merged: @Dunedan merged 1 commit into 0ad:master from the profanity-monitoring branch on Oct 18, 2024

Dunedan
Collaborator

@Dunedan Dunedan commented Sep 26, 2024

This adds functionality to automatically check for profanity in text messages written in any of the XMPP MUC rooms monitored by the moderation bot.

The terms considered profanity can be configured using the database and are language specific. They have to be stored in their lemmatized form. English terms are always checked. In addition, if a supported language other than English is detected, the terms configured for that language are checked as well. Supported languages for now are English, French, German, Portuguese, Russian, Spanish and Turkish.

The first two times a user uses profanity within a sliding window of three months, they'll receive a warning. Starting from the third time, the user will get muted. At first users will be muted for five minutes, with the duration increasing exponentially, up to one week, for each continued use of profanity afterwards.
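As a rough illustration of the escalation described above, here is a minimal sketch. It is not the bot's actual code: the function name `punishment`, the constants, and especially the doubling factor are assumptions, since the description only says the duration grows "exponentially".

```python
from datetime import timedelta
from typing import Optional

WARNINGS_BEFORE_MUTE = 2          # the first two incidents only earn a warning
FIRST_MUTE = timedelta(minutes=5)
MAX_MUTE = timedelta(weeks=1)
GROWTH_FACTOR = 2                 # assumption: the exact growth factor is not stated


def punishment(previous_incidents: int) -> Optional[timedelta]:
    """Return None for a warning, otherwise the mute duration.

    previous_incidents is the number of incidents by the same user that
    already fall inside the three-month sliding window.
    """
    if previous_incidents < WARNINGS_BEFORE_MUTE:
        return None
    duration = FIRST_MUTE
    for _ in range(previous_incidents - WARNINGS_BEFORE_MUTE):
        if duration >= MAX_MUTE:
            break  # already capped, no need to keep multiplying
        duration = min(duration * GROWTH_FACTOR, MAX_MUTE)
    return duration
```

With these assumed values, the third incident results in a five-minute mute, the fourth in ten minutes, and so on until the one-week cap is reached.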

To enable this functionality the `--enable-profanity-monitoring` command line option has to be provided.

This change requires a database migration for existing databases.
The following SQL commands can be used for that:

DROP TABLE profanity_whitelist;

CREATE TABLE profanity_terms (
  term VARCHAR(255) NOT NULL,
  language VARCHAR(2) NOT NULL,
  PRIMARY KEY (term, language)
);

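-- Existing blacklist entries are migrated as English terms.
-- The language column holds a two-letter code (VARCHAR(2)), not a JSON list.
INSERT INTO
  profanity_terms (term, language)
SELECT
  word AS term,
  'en' AS language
FROM
  profanity_blacklist;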

DROP TABLE profanity_blacklist;

ALTER TABLE profanity_incidents
RENAME TO profanity_incidents_old;

CREATE TABLE profanity_incidents (
  id INTEGER NOT NULL,
  timestamp DATETIME NOT NULL,
  player VARCHAR(255) NOT NULL,
  room VARCHAR(255) NOT NULL,
  offending_content TEXT NOT NULL,
  detected_languages JSON NOT NULL,
  matched_terms JSON NOT NULL,
  PRIMARY KEY (id)
);

INSERT INTO
  profanity_incidents
SELECT
  id,
  timestamp,
  player,
  '[email protected]',
  offending_content,
  '[]',
  '[]'
FROM
  profanity_incidents_old
WHERE
  deleted != '1';

DROP TABLE profanity_incidents_old;

@rendello

Two false positives I found in testing:

fr J'étais en retard avec ma cavalerie

es Eso puede retardar los romanos

@rendello

If you name your player an insult you can get the moderation bot to kick the ratings bot. This was fun to test 😆
[Screenshot, 2024-09-27]

@rossenburgg
Contributor

If you name your player an insult you can get the moderation bot to kick the ratings bot. This was fun to test 😆

Perhaps we could simply exclude the specific JID associated with the other bot from filtering? @Dunedan

@Dunedan
Collaborator Author

Dunedan commented Sep 30, 2024

Thanks for reporting these issues.

Two false positives I found in testing:

fr J'étais en retard avec ma cavalerie

es Eso puede retardar los romanos

While it looks like these two false positives might share the same cause, they actually had two different causes.

For the French sentence, the cause was that the bot always checked the English profanity terms in addition to the ones for the detected language. I've changed that now, so it no longer checks the English terms if it detects at least one other language with 100% certainty. That won't fix all such false positives, but should produce far fewer of them.
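The revised rule could be sketched roughly like this. This is only an illustration under assumptions: the function name `languages_to_check`, the shape of the detection result (a mapping from two-letter language codes to a confidence between 0 and 1), and the language codes themselves are hypothetical and not taken from the bot's code.

```python
# Two-letter codes for the supported languages listed in the description
# (assumption: the codes match what the language detection returns).
SUPPORTED = {"en", "fr", "de", "pt", "ru", "es", "tr"}


def languages_to_check(detected: dict) -> set:
    """Pick the languages whose profanity terms should be checked.

    detected maps language codes to a detection confidence in [0, 1].
    """
    supported = {lang: conf for lang, conf in detected.items() if lang in SUPPORTED}
    # Non-English languages detected with full certainty.
    certain = {lang for lang, conf in supported.items() if lang != "en" and conf >= 1.0}
    if certain:
        # Skip the English terms entirely in this case.
        return certain
    # Otherwise check every detected supported language plus English.
    return set(supported) | {"en"}
```

For example, a message detected as French with a confidence of 1.0 would only be checked against French terms, while an ambiguous detection would still include the English terms.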

The false positive for the Spanish sentence was caused by a bug in the detection of profanity in phrases, which caused partial words to get matched.
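One common way to avoid partial-word matches is to require that a term or phrase is not directly surrounded by other word characters. The sketch below shows that technique on raw text. It is not the bot's actual fix: the function name `contains_term` is hypothetical, the real bot works on lemmatized text, and the term used in the comments is an assumption about what was matched.

```python
import re


def contains_term(text: str, term: str) -> bool:
    """Return True only if term occurs in text as whole word(s)."""
    # The lookarounds reject matches that are part of a longer word,
    # e.g. the assumed term "retard" inside the Spanish verb "retardar".
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Because the lookarounds only constrain the start and the end of the term, multi-word phrases are still matched when they appear in full.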

If you name your player an insult you can get the moderation bot to kick the ratings bot.

I had already thought about this case when implementing the functionality; the intention was not to punish users for writing other users' names, even if those names contain profanity. However, a bug in the implementation meant usernames were only compared against the lemmatized forms of the written words. As a result, the moderation bot would detect EcheLOn writing "fuck" and not find a player with that name.
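The idea behind the fix can be sketched as comparing the words as they were written, before lemmatization, against the known nicknames. This is a hypothetical illustration only: the function name `words_to_check` and the case-insensitive comparison are assumptions, not the bot's actual implementation.

```python
def words_to_check(words, nicknames):
    """Drop words that are nicknames, so writing a user's name is not punished.

    words are the tokens exactly as written, not their lemmatized forms.
    """
    # Assumption: nicknames are compared case-insensitively.
    names = {nick.casefold() for nick in nicknames}
    return [word for word in words if word.casefold() not in names]
```

Only the remaining words would then be lemmatized and compared against the configured profanity terms.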

All of these issues should be fixed now, but I'd appreciate further testing.

@rendello

rendello commented Oct 1, 2024

Glad to help. Me and Norse_Herold had been talking around with profanity monitoring so I had a few test cases in mind.

This adds functionality to automatically check for profanity in text
messages written in any of the XMPP MUC rooms monitored by the
moderation bot.

The terms being considered profanity can be configured using the
database and are language specific. They have to be stored in
their lemmatized form. If a supported language gets detected with an
accuracy of 100%, only terms for that language will be checked;
otherwise English terms will be checked as well. Supported languages for
now are English, French, German, Polish, Portuguese, Russian, Spanish
and Turkish.

The first two times a user uses profanity within a sliding window of
three months, they'll receive a warning. Starting from the third time,
the user will get muted. At first users will be muted for five minutes,
with an exponentially increasing duration up to one week for each
continued use of profanity afterwards.

To enable this functionality the `--enable-profanity-monitoring`
command line option has to be provided.
@Dunedan Dunedan force-pushed the profanity-monitoring branch from 79f52f5 to 1ed99e3 Compare October 18, 2024 12:30
@Dunedan Dunedan merged commit a5d77d1 into 0ad:master Oct 18, 2024
3 checks passed
@Dunedan Dunedan deleted the profanity-monitoring branch October 18, 2024 12:35

3 participants