I've suggested a version of this before but I think it should be given another thought and re-evaluated.
Look at this:
I luckily caught the message after only a minute or so, but if it had not been caught immediately (not ana in particular, but any contributor within any partner org) and the contributor had fallen for a phishing attempt or the like, it would be a very bad look for us when the solution is pretty simple.
Build a content moderation plugin that can be configured for different sorts of content moderation, focusing on the highest priority few for V1.
The obvious option is to use a very cheap LLM and moderate every comment that comes through. Comments could go through various pre-processing steps to reduce LLM usage. There are specific LLMs built for this, but ol' trusty GPT would do just as well, I'd imagine.
I'm sure we could also take a non-LLM approach and handle it with NLP, but I'm unsure of the implementation details.
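For the non-LLM route, a first pass could be as simple as flagging comments that link anywhere outside the org's official URLs. A minimal sketch, assuming a hypothetical `officialUrls` allowlist (not an existing config field):

```typescript
// Sketch: flag comments containing links outside the partner org's
// official URL allowlist. Purely illustrative; `officialUrls` is a
// hypothetical config field, not an existing one.
const officialUrls = ["https://github.com", "https://example-org.com"];

function hasSuspiciousLink(comment: string): boolean {
  // Pull every http(s) URL out of the comment body.
  const urls = comment.match(/https?:\/\/[^\s)>'"]+/g) ?? [];
  // Flag if any URL does not start with an allowlisted prefix.
  return urls.some(
    (url) => !officialUrls.some((official) => url.startsWith(official))
  );
}
```

A heuristic like this wouldn't catch everything an LLM would, but it costs nothing per comment and could run before (or instead of) the model call.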
It's a bit vague how exactly this is supposed to moderate that out. What does the prompt look like?
There are specific models built for it but GPT would kill it still.
The pre-processing would need to be iterated on, but things like `hasCommented || isDbUser`, that kind of thing.
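That gating could look something like the following; `hasCommented` and `isDbUser` are hypothetical flags standing in for whatever lookups the plugin actually has:

```typescript
// Sketch of a pre-filter: skip the LLM call entirely for commenters we
// already trust, to cut usage. Field names are assumptions.
interface CommentContext {
  hasCommented: boolean; // author has prior comments in the repo
  isDbUser: boolean; // author is a registered user in our database
}

function needsLlmModeration(ctx: CommentContext): boolean {
  // Known contributors skip moderation; everyone else goes to the LLM.
  return !(ctx.hasCommented || ctx.isDbUser);
}
```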
The prompt we'd need to iterate on too, but something to the effect of:
```
You are tasked with detecting fraudulent and/or malicious interactions via GitHub comments.

Primarily you are to assert that no comment ever contains:

- a phishing attempt
- ...

${orgName} supports the following:

- ticketing, only ever through the exact-match URL: ${config.ticketingUrl}
- ${...config.onlySupportsStrings}

${orgName} will never:

- ${...config.strings || config.defaults} (e.g. ask for a user's password, or reach out via anything other than official URLs)
- ${...config.officialUrls}
```
We can make the prompt practically built off of the strings they pass in via the config, or they can build the whole prompt themselves if they want. We can iterate toward a solid default prompt with config variables at first.
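Assembling the prompt from config strings could be sketched like this; the field names (`ticketingUrl`, `neverStrings`, etc.) mirror the draft above but are assumptions, not a settled schema:

```typescript
// Sketch: build the moderation prompt from per-org config strings,
// falling back to defaults. All field names are hypothetical.
interface ModerationConfig {
  orgName: string;
  ticketingUrl: string;
  officialUrls: string[];
  neverStrings?: string[]; // org-supplied "will never" lines
}

const defaultNeverStrings = [
  "ask for a user's password",
  "reach out via anything other than official URLs",
];

function buildPrompt(config: ModerationConfig): string {
  const never = config.neverStrings ?? defaultNeverStrings;
  return [
    "You are tasked with detecting fraudulent and/or malicious interactions via GitHub comments.",
    `${config.orgName} supports ticketing only through the exact-match URL: ${config.ticketingUrl}`,
    `${config.orgName} will never:`,
    ...never.map((s) => `- ${s}`),
    "Official URLs:",
    ...config.officialUrls.map((u) => `- ${u}`),
  ].join("\n");
}
```

Orgs that want full control could bypass the builder and supply a complete prompt string instead.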