
Content Moderation #56

Open
Keyrxng opened this issue Nov 2, 2024 · 2 comments
Comments

@Keyrxng
Member

Keyrxng commented Nov 2, 2024

I've suggested a version of this before but I think it should be given another thought and re-evaluated.

Look at this:
[image: screenshot of the offending comment]

I luckily caught the message after only a minute or so, but had it not been caught immediately (not ana in particular, but any contributor within any partner org) and the contributor fell for the phishing attempt or whatever it may be, it would be a bad look for us when the solution is pretty simple.


Build a content moderation plugin that can be configured for different sorts of content moderation, focusing on the highest priority few for V1.

The obvious option is to use a very cheap LLM and moderate every comment that comes through. Comments could go through various pre-processing steps to reduce LLM usage. There are LLMs built specifically for this, but ol' trusty GPT would do just as well, I'd imagine.

I'm sure we could use a non-LLM approach and NLP it, but I'm unsure of the implementation details.
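To sketch what the pre-processing could look like: a cheap, deterministic gate that decides whether a comment needs to hit the LLM at all. Everything here is hypothetical (the interface, the trust signals, and the URL heuristic are assumptions, not the plugin's actual design):

```typescript
// Hypothetical context the plugin would assemble from the bot's database
// and the webhook payload; field names are illustrative only.
interface CommentContext {
  authorHasCommentedBefore: boolean; // prior activity in the org
  authorIsDbUser: boolean; // registered in the partner's user DB
  body: string;
}

// Cheap pre-filter: trusted authors posting link-free comments skip the
// LLM entirely; everything else gets forwarded for moderation.
function needsLlmModeration(ctx: CommentContext): boolean {
  const containsUrl = /https?:\/\/\S+/i.test(ctx.body);
  const isTrusted = ctx.authorHasCommentedBefore && ctx.authorIsDbUser;
  return containsUrl || !isTrusted;
}
```

Signals like `hasCommented || isDbUser` slot straight into `isTrusted`; the point is just that most comments from known contributors never incur an LLM call.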

@0x4007
Member

0x4007 commented Nov 2, 2024

It's a bit vague how exactly this is supposed to moderate that out. What does the prompt look like?

@Keyrxng
Member Author

Keyrxng commented Nov 2, 2024

> It's a bit vague how exactly this is supposed to moderate that out. What's the prompt look like?

  1. There are specific models built for it, but GPT would still kill it.
  2. The pre-processing would need to be iterated on, but things like `hasCommented || isDbUser`, that kind of thing.
  3. The prompt we'd need to iterate on too, but something to the effect of:

You are tasked with detecting fraudulent and/or malicious interactions via GitHub comments.
Primarily you are to assert that no comment ever contains:

  • a phishing attempt
  • ...

${orgName} supports the following:

  • a ticketing system, only ever through the exact-match url: ${config.ticketingUrl}
  • ${...config.onlySupportsStrings}

${orgName} will never:

  • ${...config.strings || config.defaults} ask for a user's password, or reach out via anything other than official urls

${...config.officialUrls}


We can make the prompt built practically entirely from the strings they pass in via the config, or they can build the whole prompt themselves if they want. We can iterate on a solid default prompt with config variables at first.
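A minimal sketch of that config-driven assembly, with an escape hatch for partners who want to supply the whole prompt. The config shape and field names are assumptions for illustration, not a committed schema:

```typescript
// Hypothetical partner config; the real plugin would define its own schema.
interface ModerationConfig {
  orgName: string;
  ticketingUrl: string;
  officialUrls: string[];
  supports: string[]; // extra "${orgName} supports" bullets
  neverDoes: string[]; // extra "${orgName} will never" bullets
  promptOverride?: string; // partners may replace the prompt wholesale
}

// Assemble the moderation prompt from config strings, falling back to a
// sensible default skeleton when no override is provided.
function buildModerationPrompt(config: ModerationConfig): string {
  if (config.promptOverride) return config.promptOverride;
  return [
    "You are tasked with detecting fraudulent and/or malicious interactions via GitHub comments.",
    "Primarily you are to assert that no comment ever contains:",
    "- a phishing attempt",
    "",
    `${config.orgName} supports the following:`,
    `- a ticketing system, only ever through the exact-match url: ${config.ticketingUrl}`,
    ...config.supports.map((s) => `- ${s}`),
    "",
    `${config.orgName} will never:`,
    "- ask for a user's password, or reach out via anything other than official urls",
    ...config.neverDoes.map((s) => `- ${s}`),
    "",
    "Official urls:",
    ...config.officialUrls.map((u) => `- ${u}`),
  ].join("\n");
}
```

The override-or-assemble split keeps the common case (a few config strings) cheap while still letting a partner own the full prompt text.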
