forked from K-JailbreakBench/k-jailbreakbench.github.io
behaviors.html
<!DOCTYPE html>
<head>
<meta charset="UTF-8" />
<!-- <meta name="viewport" content="width=device-width, initial-scale=1" />-->
<meta name="viewport" content="width=1024" />
<title>JailbreakBench: LLM robustness benchmark</title>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
<script src="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js?features=es6"></script>
<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/foundation/6.4.3/css/foundation.min.css" />
<link rel="stylesheet" href="https://cdn.rawgit.com/jpswalsh/academicons/master/css/academicons.min.css" />
<script src="https://kit.fontawesome.com/b939870cfb.js" crossorigin="anonymous"></script>
<link rel="stylesheet" href="https://cdn.datatables.net/1.10.24/css/dataTables.foundation.min.css">
<script type="text/javascript" src="https://cdn.datatables.net/1.10.24/js/jquery.dataTables.min.js"></script>
<script type="text/javascript" src="https://cdn.datatables.net/1.10.24/js/dataTables.foundation.min.js"></script>
<link rel="stylesheet" href="./css/main.css" />
</head>
<body>
<nav class="navbar navbar-expand-md">
<div class="container">
<a class="navbar-brand" href="./index.html"
>JailbreakBench</a>
<button
class="navbar-toggler navbar-light"
type="button"
data-toggle="collapse"
data-target="#main-navigation"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="main-navigation">
<ul class="navbar-nav">
<li class="nav-item">
<a class="nav-link" href="/index#leaderboard">Leaderboards</a>
</li>
<li>
<a class="nav-link" target="_blank" href="https://arxiv.org/abs/2404.01318">Paper</a>
</li>
<li>
<a class="nav-link" target="_blank" href="https://github.com/JailbreakBench/jailbreakbench/blob/main/CONTRIBUTING.md">Contribute</a>
</li>
<li>
<a class="nav-link text-nowrap" target="_blank" href="https://github.com/JailbreakBench/jailbreakbench">Library</a>
</li>
<li>
<a class="nav-link text-nowrap" href="/behaviors">Behaviors</a>
</li>
</ul>
</div>
</div>
</nav>
<!-- <hr class="toprule" /> -->
<header>
<div class="header-block container">
<div class="logo"><img src="./images/jbb_logo_white.png" alt="logo" /></div>
<div class="title">JailbreakBench</div>
<!-- <div class="description">
A standardized benchmark for adversarial robustness
</div> -->
</div>
</header>
<!-- <hr class="toprule" /> -->
<div class="container">
<section id="introduction" style="height: 100vh;">
<p>
This page lists the behaviors that we evaluate on the JailbreakBench leaderboard.
For each goal, we provide the category, the source, and a short behavior description.
The behaviors include original examples as well as examples sourced from
<a href="https://github.com/llm-attacks/llm-attacks/blob/main/data/advbench/harmful_behaviors.csv" target="_blank">AdvBench</a>,
the <a href="https://trojandetection.ai/" target="_blank">Trojan Detection Challenge 2023 Red Teaming Track</a>/<a href="https://www.harmbench.org/" target="_blank">HarmBench</a>,
and ideas sourced from <a href="https://arxiv.org/abs/2311.03348" target="_blank">Shah et al. (2023)</a>.
Behaviors are divided into ten broad categories corresponding to <a href="https://openai.com/policies/usage-policies" target="_blank">OpenAI's usage policies</a>.
</p>
<p style="text-align: center;">⚠️ <b>DISCLAIMER:</b> some of the behaviors may be offensive to some readers. <b>Scroll down to see the table</b> ⬇️</p>
</section>
<section class="container" id="div_table_behaviors_heading">
<div class="heading">
<p class="leaderboard_title">
Behaviors
</p>
</div>
<div id="div_table_behaviors"></div>
</section>
<div class="vspace50"></div>
<section id="citation">
<div class="heading">
<p>Citation</p>
</div>
If you use the JBB-Behaviors dataset, we ask that you consider citing the following works:
<div class="vspace10"></div>
<ul>
<li><a href="https://github.com/llm-attacks/llm-attacks?tab=readme-ov-file#citation" target="_blank">AdvBench</a></li>
<li><a href="https://trojandetection.ai/" target="_blank">The Trojan Detection Challenge 2023 Red Teaming Track</a>/<a href="https://github.com/centerforaisafety/HarmBench#-acknowledgements-and-citation-" target="_blank">HarmBench</a></li>
<li><a href="https://arxiv.org/abs/2311.03348" target="_blank">Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation</a> by Shah et al.</li>
</ul>
<div class="vspace10"></div>
Moreover, consider citing our whitepaper if you use the dataset, reference our leaderboard, or use our evaluation library:
<div class="vspace10"></div>
<pre><code>@misc{chao2024jailbreakbench,
title={JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models},
author={Patrick Chao and Edoardo Debenedetti and Alexander Robey and Maksym Andriushchenko and Francesco Croce and Vikash Sehwag and Edgar Dobriban and Nicolas Flammarion and George J. Pappas and Florian Tramèr and Hamed Hassani and Eric Wong},
year={2024},
eprint={2404.01318},
archivePrefix={arXiv},
primaryClass={cs.CR}
}</code></pre>
</section>
<hr class="bottomrule" />
<footer>
<small>© 2024, JailbreakBench</small>
<!-- <a href="https://icons8.com/icon/100413/access">Icons from Icons8</a> -->
</footer>
<script>
// When the user scrolls the page, execute myFunction
window.onscroll = function () {
myFunction();
};
// Get the navbar (null if the page has no element with id "navbar")
var navbar = document.getElementById("navbar");
// Get the offset position of the navbar, defaulting to 0 when it is missing
var sticky = navbar ? navbar.offsetTop : 0;
// Add the sticky class to the navbar when you reach its scroll position. Remove "sticky" when you leave the scroll position
function myFunction() {
// Nothing to do when the navbar element was not found
if (!navbar) {
return;
}
if (window.pageYOffset >= sticky) {
navbar.classList.add("sticky");
} else {
navbar.classList.remove("sticky");
}
}
</script>
<script>
$("#div_table_behaviors").load("./tables/table_behaviors.html");
</script>
</div>
</body>