FEAT: Supporting the new tongue tied Gandalf levels #356

donebydan · 2024-09-03T18:19:46Z

Description

This PR adds support for the new Gandalf levels (namely Tongue Tied) released on 29th August 2024. These levels require a custom scorer, since they are not password-finding challenges.

Co-authored with @s-zanella

Tests and Documentation

Tests added for target and scorer in line with previous Gandalf modules.
A notebook with an ad-hoc orchestrator for the Tongue Tied levels, which works for all 5 levels. It passes levels 1 to 3 within 1-30 queries (only the level 1 prompt is provided).


rlundeen2 · 2024-09-05T16:18:12Z

doc/code/targets/gandalf_tongue_tied.ipynb

+   "source": [
+    "## Orchestrator\n",
+    "\n",
+    "We will build our own simple orchestrator that asks the Red Teaming model to produce a prompt and refine it at every turn. At each turn, the orchestrator passes the model the conversation so far, together with feedback from the scorer, and the model uses chain-of-thought reasoning to refine the prompt. This is a streamlined sequential version of the PAIR orchestrator in [pair_orchestrator.py](../../../pyrit/orchestrator/pair_orchestrator.py), though the idea is older and can be traced back to other attempts at solving Gandalf challenges like [LLMFuzzAgent](https://github.com/corca-ai/LLMFuzzAgent) and [Gandalf vs. Gandalf](https://github.com/microsoft/gandalf_vs_gandalf)."
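The sequential loop described in this notebook cell can be sketched in plain Python. This is a minimal, hypothetical sketch, not the notebook's actual code. `Turn` and `refine_loop` are illustrative names. The `attacker`, `target`, and `scorer` callables stand in for the red teaming model, the Gandalf target, and the scorer. No PyRIT APIs are assumed. The default of 30 turns only mirrors the 1-30 query range reported in the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Turn:
    """One round of the hypothetical loop: what was sent, what came back, and the verdict."""
    prompt: str
    response: str
    success: bool
    feedback: str


def refine_loop(
    attacker: Callable[[List[Turn]], str],
    target: Callable[[str], str],
    scorer: Callable[[str], Tuple[bool, str]],
    max_turns: int = 30,
) -> Tuple[Optional[str], List[Turn]]:
    """Hypothetical sequential, PAIR-style refinement: generate, send, score, feed back.

    Returns the first successful prompt (or None) and the full history.
    """
    history: List[Turn] = []
    for _ in range(max_turns):
        # The attacker (e.g. a red teaming LLM using chain-of-thought) sees
        # every earlier prompt, response, and piece of scorer feedback.
        prompt = attacker(history)
        response = target(prompt)
        # The scorer returns a verdict plus textual feedback for the attacker.
        success, feedback = scorer(response)
        history.append(Turn(prompt, response, success, feedback))
        if success:
            return prompt, history
    return None, history
```

Each call to `attacker` receives the full history, including the scorer's feedback. Any chain-of-thought refinement therefore happens inside that callable. The loop itself only sequences the calls and stops at the first success.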


Why build this instead of using the actual PAIR orchestrator (or TAP, or another one)?

A silly reason is that the PAIR orchestrator had a typo and was misbehaving (this PR also fixes it). Another is that I wanted to start with a streamlined orchestrator that I understood. It wasn't entirely clear to me how to parse and reformat responses to give more direct feedback to the attacker (write an ad-hoc scorer that provides that feedback and modify the PAIR orchestrator to use it?).

Ultimately, I think we do want to use (and maybe generalize) the PAIR orchestrator to do this, and switch to a float_scale scorer that counts the number of successful LLM responses (in level 5, the same prompt needs to trick 3 different LLMs).
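Here is a rough illustration of the float_scale idea. It assumes that each LLM's response in a level has already been judged as a true/false success. The score is then the fraction of LLMs that were tricked. `fraction_tricked` and `level_passed` are hypothetical helpers, not PyRIT's scorer interface.

```python
from typing import Sequence


def fraction_tricked(verdicts: Sequence[bool]) -> float:
    """Hypothetical float_scale score in [0.0, 1.0]: share of LLMs tricked by one prompt."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    # True counts as 1 and False as 0, so sum() is the number of successes.
    return sum(verdicts) / len(verdicts)


def level_passed(verdicts: Sequence[bool]) -> bool:
    """A level such as level 5 is only passed when every LLM is tricked."""
    return fraction_tricked(verdicts) == 1.0
```

For example, tricking two of the three LLMs in level 5 scores about 0.67 rather than a flat failure. This gives the attacker a denser signal, while the level itself is only passed at 1.0.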

This is the only thing I'm hesitant about before merging. Because we run these notebooks as tests, I would rather not ship this "new orchestrator" as a dependency in the release. Can you update it to use PAIR (even if imperfectly)? You could also use TAP, which is more robust and works identically if configured correctly.

I'll give it a try. A risk is that the glue code might end up being longer and harder to maintain than an ad-hoc orchestrator.

Glue code is still useful to us, because we want these orchestrators to be as generic as possible; so if there are pain points we want to make it easier if we can :)


rlundeen2

Waiting on the orchestrator updates; ping the team when it's ready for a re-review.

jonesdaniel added 4 commits September 3, 2024 16:40

Adding support for first two levels of Gandalf Tongue Tied

a241984

Merge branch 'main' into tongue-tied/two-levels

e3deeb2

adding tests for tongue tied scorer and removing duplicate target

0e1f325

removing stray breakpoint

ad84fac

romanlutz reviewed Sep 3, 2024

pyrit/prompt_target/gandalf_target.py Outdated

elgertam and others added 3 commits September 5, 2024 11:59

FEAT Add SQL Entra Auth for Azure SQL Server (Azure#330)

f236ba8

[MAINT] Fix typos in OllamaChatTarget (Azure#357)

3d61482

Reuse original Gandalf target and use built-in scorer for Tongue Tied Gandalf levels

b4121e7

s-zanella reviewed Sep 5, 2024

pyrit/prompt_target/gandalf_target.py

s-zanella reviewed Sep 5, 2024

pyrit/score/gandalf_scorer.py Outdated

s-zanella and others added 6 commits September 5, 2024 13:09

Fix typo

8b575b2

Use correct target for the attacker. Use method `_create_normalizer_request` to create normalizer requests

ed68c14

Remove unused threshold parameter in TrueFalseInverterScore

89ac0af

Tongue Tied Gandalf notebook (WIP): ad-hoc orchestrator, built-in scorer and target, works for all 5 levels. Passes levels 1 to 3 within a 1-30 queries (only level 1 prompt provided).

10b5e30

resolve conflict from origin

7518705

Merge branch 'main' into tongue-tied/two-levels

87c1eef

donebydan changed the title ~~FEAT: Supporting (the first two) tongue tied Gandalf levels~~ FEAT: Supporting the new tongue tied Gandalf levels Sep 5, 2024

donebydan requested a review from romanlutz September 5, 2024 14:05

s-zanella added 2 commits September 5, 2024 15:44

Prompt that solves level 2

7bbe9ed

gandalf_tongue_tied_scorer -> GandalfTongueTiedScorer

9b01377

rlundeen2 reviewed Sep 5, 2024

doc/code/targets/gandalf_tongue_tied.ipynb

rlundeen2 reviewed Sep 5, 2024

pyrit/score/true_false_inverter_scorer.py

rlundeen2 reviewed Sep 5, 2024

pyrit/score/gandalf_scorer.py Outdated

s-zanella and others added 3 commits September 5, 2024 18:24

Line endings

ba9a36a

Add pct file for notebook. Prompt that solves level 3

87fcb31

Merge branch 'main' into tongue-tied/two-levels

54267c1

donebydan requested a review from rlundeen2 September 7, 2024 11:03

s-zanella and others added 2 commits September 11, 2024 06:08

Merge branch 'main' into tongue-tied/two-levels

1611538

Simplify answer parsing; fix mypy type errors; trim whitespace

bc7fa49

rlundeen2 requested changes Sep 19, 2024


rlundeen2 Sep 5, 2024

s-zanella Sep 5, 2024

rlundeen2 Sep 9, 2024

s-zanella Sep 10, 2024

rlundeen2 Sep 10, 2024
