Why are the experimental results different? #84

zl-comment · 2024-09-18T02:05:20Z

bertattack

original prompt: Evaluate the sentiment of the given text and classify it as 'positive' or 'negative':
original score: 0.4934426229508197
attacked prompt: Evaluate the sеntiment of the given text and classification it as 'positive' or 'negative':
attacked score: 0.4852459016393443
PDR: 0.01661129568106314

original prompt: Given the context of this text, indicate if the emotion conveyed is 'positive' or 'negative':
original score: 0.5
attacked prompt: Given the context of this text, indicatе if the emoton conveyed is 'positive' or 'negative':
attacked score: 0.4737704918032787
PDR: 0.05245901639344264

original prompt: Analyze the tone of this statement and respond with either 'positive' or 'negative':
original score: 0.5180327868852459
attacked prompt: Anaⅼyze the tone of this statement and respond with nor 'positive' or 'negative':
attacked score: 0.460655737704918
PDR: 0.1107594936708861

original prompt: In the role of a sentiment analysis tool, respond with 'positive' or 'negative' to classify this statement:
original score: 0.5229508196721312
attacked prompt: In the role of a sentiment analyse instrumentation, respond with 'positive' or 'negative' to classify this statemen𝚝:
attacked score: 0.4639344262295082
PDR: 0.11285266457680258

original prompt: Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:
original score: 0.5180327868852459
attacked prompt: Functionіng as a sentiment identіfication tool, assess if the following expression is 'positive' or 'negative'. Classify:
attacked score: 0.4639344262295082
PDR: 0.10443037974683543

original prompt: Serving as a sentiment evaluation model, determine if the given statement is 'positive' or 'negative'. Classify:
original score: 0.5278688524590164
attacked prompt: Serve as a sentimeոt evaluation modelling, determine if the given statement is 'positive' or 'negative'. Classify:
attacked score: 0.4721311475409836
PDR: 0.10559006211180119
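
For reference, the PDR values listed above are consistent with the relative drop `1 - attacked_score / original_score`. The snippet below is a minimal sketch of that arithmetic; the function name `pdr` is a hypothetical helper for illustration and is not taken from the PromptBench code.

```python
def pdr(original_score: float, attacked_score: float) -> float:
    """Relative performance drop: 1 - attacked / original.

    Hypothetical helper for illustration; not a PromptBench function.
    """
    if original_score == 0:
        raise ValueError("original_score must be non-zero")
    return 1.0 - attacked_score / original_score


# Scores copied from the first result above.
print(pdr(0.4934426229508197, 0.4852459016393443))
```

Because PDR is relative to the original score, it can look similar across runs even when the absolute accuracies differ widely (about 0.5 versus about 0.95).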

With the prompts from the paper, the original (unattacked) accuracy is only about 50%, whereas the source code gives about 90%.
Specifically, with the prompts from the sample code (https://github.com/microsoft/promptbench/blob/main/examples/prompt_attack.ipynb), the original accuracy is above 90%:

bertattack

'original prompt': "Evaluate the sentiment of the given text and classify it as 'positive' or 'negative':\nQuestion: {content}\nAnswer:"
'original score': 0.9459016393442623
'attacked prompt': "Evaluate the sincerity of the received text and rate it as 'positive' or 'negative':\ncuriosity: {content}\nAnswer:"
'attacked score': 0.9147540983606557
'PDR': 0.03292894280762571(0.9147540983606557~0.9557377049180328)

'original prompt': "Given the context of this text, indicate if the emotion conveyed is 'positive' or 'negative':\nQuestion: {content}\nAnswer:"
'original score': 0.9393442622950819
'attacked prompt': "i the context of this text, indicate if the emotion conveyed is 'positive' or 'negative':\nQuestion: {content}\nfield:"
'attacked score': 0.9229508196721311
'PDR': 0.017452006980802813(0.9229508196721311~)

'original prompt': "Given the context of this text, indicate if the emotion conveyed is 'positive' or 'negative':\nQuestion: {content}\nAnswer:"
'original score': 0.9442622950819672
'attacked prompt': "Given the clarity of this section, specify if the emotion depicted is 'positive' or 'negative':\ncuriosity: {content}\nAnswer:"
'attacked score': 0.8885245901639345
'PDR': 0.05902777777777775

'original prompt': "Analyze the tone of this statement and respond with either 'positive' or 'negative':\nQuestion: {content}\nAnswer:"
'original score': 0.9475409836065574
'attacked prompt': "Analyze the syntax of this phrase and address with meaning 'positive' or 'negative':\nriddle: {content}\nAnswer:"
'attacked score': 0.6081967213114754
'PDR': 0.35813148788927335

'original prompt': "In the role of a sentiment analysis tool, respond with 'positive' or 'negative' to classify this statement:\nQuestion: {content}\nAnswer:"
'original score': 0.9540983606557377
'attacked prompt': "in the role of a sentiment analysis game, respond with 'positive' or 'negative' to classify this function:\ncuriosity: {content}\nquestions:"
'attacked score': 0.8918032786885246
'PDR': 0.06529209621993128

'original prompt': "Functioning as a sentiment identification tool, assess if the following expression is 'positive' or 'negative'. Classify:\nQuestion: {content}\nAnswer:"
'original score': 0.9475409836065574
'attacked prompt': "Functioning as a emotion correlation tool, study if the following expression is 'positive' or 'negative'. find:\ncuriosity: {content}\nAnswer:"
'attacked score': 0.8918032786885246
'PDR': 0.05882352941176468

'original prompt': "Serving as a sentiment evaluation model, determine if the given statement is 'positive' or 'negative'. Classify:\nQuestion: {content}\nAnswer:"
'original score': 0.9491803278688524
'attacked prompt': "which as a sentiment evaluation models, determine if the given words is 'positive' or 'negative'. model:\ncorrectly: {content}\nasking:"
'attacked score': 0.8885245901639345
'PDR': 0.06390328151986174
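
One visible difference between the two listings: the prompts in the second one end with `\nQuestion: {content}\nAnswer:`, while the prompts in the first contain no `{content}` placeholder. The sketch below only illustrates what plain Python `str.format` does with each form. It assumes the prompt is filled via `str.format`, which may not match how the evaluation actually builds its input, and `render` is a hypothetical helper, not a PromptBench function.

```python
PAPER_PROMPT = (
    "Evaluate the sentiment of the given text and classify it as "
    "'positive' or 'negative':"
)
# Same instruction with the template suffix seen in the second listing.
NOTEBOOK_PROMPT = PAPER_PROMPT + "\nQuestion: {content}\nAnswer:"


def render(prompt: str, content: str) -> str:
    """Fill the prompt with plain str.format (an assumption, see above)."""
    # str.format silently ignores keyword arguments that the template
    # does not reference, so a prompt without {content} comes back unchanged.
    return prompt.format(content=content)


sample = "a charming and often affecting journey"
print(repr(render(PAPER_PROMPT, sample)))
print(repr(render(NOTEBOOK_PROMPT, sample)))
```

If the input is built this way, the first form would never show the sentence to the model, and accuracy near 0.5 is what chance gives on a balanced two-class task. This is only a hypothesis to check against the actual run.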

Immortalise · 2024-09-20T15:53:47Z

Hi, could you please indicate which model you are using for the attack? The difference may arise from the use of a different model.

zl-comment · 2024-09-22T07:52:09Z

model='google/flan-t5-large'

Immortalise · 2024-09-29T23:25:16Z

Could you please check and compare the results here? On this website, the results for T5 on the SST-2 dataset are around 95%.
