Add automatic checking for profanity #45
Two false positives I found in testing:
Perhaps we could simply exclude the specific JID associated with the other bot from filtering? @Dunedan
Thanks for reporting these issues.
While it looks like these two false positives might have had the same cause, there were actually two different reasons. The French sentence was flagged because the bot always checked the English profanity terms as well, in addition to the ones for the detected language. I've changed that, so it no longer checks the English terms if it detects at least one other language with 100% certainty. That won't fix all such false positives, but it should produce far fewer of them. The Spanish sentence was caused by a bug in the detection of profanity in phrases, which caused partial words to be matched.
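The difference between substring matching and whole-token phrase matching can be illustrated with a minimal sketch. It assumes messages have already been split into lowercase tokens; `naive_contains` and `contains_phrase` are hypothetical names, not functions from the bot, and the example sentence is made up rather than taken from the reported false positive.

```python
from typing import Sequence


def naive_contains(text: str, phrase: str) -> bool:
    """Buggy variant: plain substring search also matches partial words."""
    return phrase in text


def contains_phrase(tokens: Sequence[str], phrase: Sequence[str]) -> bool:
    """Return True only if `phrase` occurs as a contiguous run of whole tokens."""
    n = len(phrase)
    if n == 0:
        return False
    # Compare token slices, so a term can never match inside a longer word.
    return any(
        list(tokens[i:i + n]) == list(phrase)
        for i in range(len(tokens) - n + 1)
    )
```

For example, `naive_contains("this class is great", "ass")` is true, while `contains_phrase(["this", "class", "is", "great"], ["ass"])` is false because "ass" never appears as a token of its own.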
I had already thought about this case when implementing the functionality, and the intention was not to punish users for writing other users' names, even if these names contain profanity. However, there was a bug in the implementation: it only checked the usernames against the lemmatized forms of the words written. That meant the moderation bot would detect EcheLOn writing "fuck" without finding a player with that name. All of these issues should be fixed now, but I'd appreciate further testing.
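A minimal sketch of the name exemption follows, assuming lemmatization has already happened so each word arrives together with its lemma. `find_profanity` and its parameters are hypothetical, and "darn" is only a placeholder for a configured term. The point it shows is that nicknames are compared against the words as written, not against their lemmas.

```python
from typing import Sequence, Set


def find_profanity(
    words: Sequence[str],
    lemmas: Sequence[str],
    profane_lemmas: Set[str],
    nicknames: Set[str],
) -> Set[str]:
    """Return the profane lemmas in a message, skipping other users' names.

    `words` and `lemmas` are parallel: lemmas[i] is the lemma of words[i].
    `profane_lemmas` is assumed to be stored in lowercase.
    """
    known_names = {name.lower() for name in nicknames}
    hits: Set[str] = set()
    for word, lemma in zip(words, lemmas):
        # Compare the word *as written* against nicknames. Comparing only the
        # lemma misses names whose lemma differs from the name itself.
        if word.lower() in known_names:
            continue
        if lemma.lower() in profane_lemmas:
            hits.add(lemma.lower())
    return hits
```

With a player named "Darned" in the room, writing that name is ignored, while the same word from a room without such a player is still reported.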
Glad to help. Norse_Herold and I had been talking about profanity monitoring, so I had a few test cases in mind.
This adds functionality to automatically check for profanity in text messages written in any of the XMPP MUC rooms monitored by the moderation bot.

The terms considered profanity can be configured in the database and are language specific. They have to be stored in their lemmatized form. If a supported language is detected with 100% certainty, only the terms for that language are checked; otherwise, English terms are checked as well. Supported languages for now are English, French, German, Polish, Portuguese, Russian, Spanish and Turkish.

The first two times a user uses profanity within a sliding window of three months, they'll receive a warning. Starting from the third time, the user will be muted. At first, users are muted for five minutes, with the duration increasing exponentially for each further use of profanity, up to a maximum of one week.

To enable this functionality, the `--enable-profanity-monitoring` command line option has to be provided.
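The language selection rule can be sketched as a small function. This is an illustration, not the bot's code: it assumes the language detector returns a mapping from ISO 639-1 codes to probabilities between 0 and 1, and `languages_to_check` and `SUPPORTED_LANGUAGES` are hypothetical names.

```python
from typing import Mapping, Set

# ISO 639-1 codes for the listed languages (using these codes is an assumption).
SUPPORTED_LANGUAGES = {"en", "fr", "de", "pl", "pt", "ru", "es", "tr"}


def languages_to_check(detections: Mapping[str, float]) -> Set[str]:
    """Pick which languages' profanity terms to check for one message.

    `detections` maps language codes to detection probabilities in [0, 1].
    """
    supported = {
        lang: prob for lang, prob in detections.items() if lang in SUPPORTED_LANGUAGES
    }
    certain = {lang for lang, prob in supported.items() if prob >= 1.0}
    if certain:
        # A supported language was detected with 100% certainty: check only it.
        return certain
    # Otherwise check every detected supported language plus English.
    return set(supported) | {"en"}
```

A message detected as French with certainty yields `{"fr"}`, so English terms are skipped, while an uncertain Spanish detection yields `{"es", "en"}`.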
The branch was updated from 79f52f5 to 1ed99e3.
This adds functionality to automatically check for profanity in text messages written in any of the XMPP MUC rooms monitored by the moderation bot.
The terms considered profanity can be configured in the database and are language specific. They have to be stored in their lemmatized form. English terms are always checked; in addition, if a supported language other than English is detected, the terms configured for that language are checked as well. Supported languages for now are English, French, German, Portuguese, Russian, Spanish and Turkish.
The first two times a user uses profanity within a sliding window of three months, they'll receive a warning. Starting from the third time, the user will be muted. At first, users are muted for five minutes, with the duration increasing exponentially for each further use of profanity, up to a maximum of one week.
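The escalation policy can be expressed as a function of a user's earlier incidents. This is a sketch under stated assumptions: "three months" is approximated as 90 days, the exponent grows with the number of incidents inside that window, and the duration doubles per further incident, since the description only says "exponentially". `mute_duration` and the constants are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

WINDOW = timedelta(days=90)         # assumption: "three months" ~ 90 days
WARNINGS_BEFORE_MUTE = 2            # first two incidents only trigger a warning
BASE_MUTE = timedelta(minutes=5)    # first mute lasts five minutes
MAX_MUTE = timedelta(weeks=1)       # mutes never exceed one week


def mute_duration(previous: Iterable[datetime], now: datetime) -> Optional[timedelta]:
    """Return None for a warning, otherwise how long to mute the user.

    `previous` holds the timestamps of the user's earlier profanity incidents.
    """
    recent = sum(1 for t in previous if now - WINDOW <= t <= now)
    offense = recent + 1  # the current incident counts too
    if offense <= WARNINGS_BEFORE_MUTE:
        return None
    duration = BASE_MUTE
    # Assumption: double the duration for each incident after the first mute,
    # stopping early at the cap so the value never grows without bound.
    for _ in range(offense - WARNINGS_BEFORE_MUTE - 1):
        duration *= 2
        if duration >= MAX_MUTE:
            return MAX_MUTE
    return duration
```

Under these assumptions the third incident in the window mutes for 5 minutes, the fourth for 10, and from the fourteenth on the one-week cap applies. Incidents older than the window no longer count towards the total.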
To enable this functionality, the `--enable-profanity-monitoring` command line option has to be provided.

This change requires a database migration for existing databases. The following SQL commands can be used for that: