SYSTEM_PROMPT = """\
You are an expert in Python software engineering and code review. Your responsibility is to review patches generated by language models to fix issues, and to provide feedback on the quality of their code.
"""
USER_PROMPT_TEMPLATE = """\
You are given the text of an issue from a GitHub repository (wrapped with <issue_description></issue_description>), along with an overview of the whole repository (wrapped with <repomap></repomap>).
You are also given a candidate patch (wrapped with <patch></patch>) that tries to resolve the target issue.
The repomap overview lists the paths of some important files in the repository; after each path come the signatures of some important classes and functions.
Please help me evaluate this candidate patch. Give me an integer score (ranging from 0 to 10) indicating the correctness of the given patch, where a higher score means better quality.
<issue_description>
{issue_text}
</issue_description>
<repomap>
{repo_map}
</repomap>
<patch>
{model_patch}
</patch>
Please first explain what the patch is doing and why it may or may not solve the issue, inside <explanation></explanation> tags.
If the patch seems invalid to you, or there is something wrong with it, give a score of -1.
Then give me your score based on your explanation, wrapped in <score></score> tags.
"""
USER_PROMPT_TEMPLATE_BEFORE_AFTER = """\
You are given the text of an issue from a GitHub repository (wrapped with <issue_description></issue_description>), along with an overview of the whole repository (wrapped with <repomap></repomap>).
You are also given the changes made by a candidate patch that tries to resolve the target issue.
For your convenience, you are given the hunks of the original code and the code after applying the patch.
The code before the patch is wrapped with <before_patch></before_patch> and the code after the patch is wrapped with <after_patch></after_patch>.
Note that the file names in before_patch start with 'a/' and the file names in after_patch start with 'b/'.
Also, to give you a fuller view of the patch, the hunks are shown with a context of 20 lines before and after the patched lines.
The repomap overview lists the paths of some important files in the repository; after each path come the signatures of some important classes and functions.
Please help me evaluate this candidate patch. Give me an integer score (ranging from 0 to 10) indicating the correctness of the given patch.
<issue_description>
{issue_text}
</issue_description>
<repomap>
{repo_map}
</repomap>
<before_patch>
{before_patch}
</before_patch>
<after_patch>
{after_patch}
</after_patch>
Please first explain what the patch is doing and why it may or may not solve the issue, inside <explanation></explanation> tags; make sure your explanation addresses whether the patch is fixing the correct function.
If the patch seems invalid to you, or there is something wrong with it, give a score of -1.
Then give me your score based on your explanation, wrapped in <score></score> tags.
"""
PAIRWISE_COMPARISON_WITH_IDENTIFIED_SPANS_TEMPLATE = """\
I want you to compare two LLM-generated candidate patches that try to resolve an issue in a codebase.
To assist you in this task, you are provided with the following information:
- You are given the text of an issue from a GitHub repository (wrapped with <issue_description></issue_description>).
- You are also given some identified code spans that are relevant to the issue.
Each code span is wrapped with <code_span file_path=FILE_PATH span_id=SPAN_ID></code_span> tags, where FILE_PATH is the path to the file containing the code span, and SPAN_ID is the unique identifier for the code span.
Each code span also comes with the line numbers for you to better understand the context.
- You are given two candidate patches that try to resolve the target issue.
The first candidate patch is wrapped with <patch1></patch1> tags and the second candidate patch is wrapped with <patch2></patch2> tags.
Within each patch, you are given the hunks of original code and the code after applying the patch.
The code before the patch is wrapped with <before_patch1></before_patch1> and <before_patch2></before_patch2> tags, and the code after the patch is wrapped with <after_patch1></after_patch1> and <after_patch2></after_patch2> tags.
Note that the file names in before_patch start with 'a/' and the file names in after_patch start with 'b/'.
- At least one of the two patches is correct.
Here's what you want to do:
1. Understand the issue. Explain in your own words what the issue is about. Output your explanation in <issue_exp></issue_exp> tags.
2. Understand the identified code spans. First provide a list of the span ids. Then explain how each identified code span is relevant to the issue. Output your explanation in <code_span_exp></code_span_exp> tags.
3. Understand the candidate patches. First curate a list of modified hunks. For each modified hunk, explain what it's doing. Output your explanations in the <patch_exp_1></patch_exp_1> and <patch_exp_2></patch_exp_2> fields.
4. Check if the patches introduce any new issues, especially whether they contradict any of the identified code spans. Output your explanations in the <new_issues_exp1></new_issues_exp1> and <new_issues_exp2></new_issues_exp2> fields.
5. Check if the patches can fix the issue. Compare each generated patch against the common mistakes made by LLMs and see if it falls into any of those categories. Output your explanation in the <fix_issue_exp></fix_issue_exp> field.
6. Point out the differences between the two patches and how these may affect their correctness. Explicitly state how big the difference is. Refer back to your <patch_exp> explanations when pointing out the differences. Output your explanation in the <diff></diff> field.
7. Explain your choice of the better patch based on your analysis in the previous steps. Make sure to first repeat the issue description in your own words when explaining. Output your explanation in the <better_patch_exp></better_patch_exp> field.
8. Finally, give me your choice of the better patch. Wrap your choice in <better_patch></better_patch> tags. Your choice should be 0, 1, or 2, where 0 means you cannot pick a better one, 1 means the first patch is better, and 2 means the second patch is better.
Here are your inputs:
<issue_description>
{issue_text}
</issue_description>
{code_spans}
<patch1>
<before_patch1>
{before_patch1}
</before_patch1>
<after_patch1>
{after_patch1}
</after_patch1>
</patch1>
<patch2>
<before_patch2>
{before_patch2}
</before_patch2>
<after_patch2>
{after_patch2}
</after_patch2>
</patch2>
Again, make sure your output ends with <better_patch></better_patch> tags containing only 0, 1, or 2, indicating your choice of the better patch.
For example, if you think the first patch is better, the final part of output should look like this:
<better_patch>1</better_patch>
It should not contain any other information or characters.
Do not use ``` or ### or anything else to wrap your verdict.
"""
SINGLE_SCORING_WITH_IDENTIFIED_SPANS_TEMPLATE = """\
I want you to evaluate an LLM-generated candidate patch that tries to resolve an issue in a codebase.
To assist you in this task, you are provided with the following information:
- You are given the text of an issue from a GitHub repository (wrapped with <issue_description></issue_description>).
- You are also given some identified code spans that are relevant to the issue.
Each code span is wrapped with <code_span file_path=FILE_PATH span_id=SPAN_ID></code_span> tags, where FILE_PATH is the path to the file containing the code span, and SPAN_ID is the unique identifier for the code span.
Each code span also comes with the line numbers for you to better understand the context.
- You are given the candidate patch that tries to resolve the target issue.
For your convenience, you are given the hunks of original code and the code after applying the patch.
The code before the patch is wrapped with <before_patch></before_patch> and the code after the patch is wrapped with <after_patch></after_patch>.
Note that the file names in before_patch start with 'a/' and the file names in after_patch start with 'b/'.
Here's what you want to do:
1. Understand the issue. Explain in your own words what the issue is about. Output your explanation in <issue_exp></issue_exp> tags.
2. Understand the identified code spans. First provide a list of the span ids. Then explain how each identified code span is relevant to the issue. Output your explanation in <code_span_exp></code_span_exp> tags.
3. Understand the candidate patch. First curate a list of modified hunks. For each modified hunk, explain what it's doing. Output your explanation in the <patch_exp></patch_exp> field.
4. Check whether the patch is fixing the correct function. Output your explanation in the <correct_location_exp></correct_location_exp> field.
5. Check if the patch introduces any new issues, especially whether it contradicts any of the identified code spans. Output your explanation in the <new_issues_exp></new_issues_exp> field.
6. Check if the patch can fix the issue. Compare the generated patch against the common mistakes made by LLMs and see if it falls into any of those categories. Be ruthless in pointing out any potential mistakes. Output your explanation in the <fix_issue_exp></fix_issue_exp> field.
7. Finally, give me your score. Wrap your score in <score></score> tags. Make sure to include in these tags only an integer, nothing else.
Here's the scoring rubric:
Your score should be an integer between 0 and 10, where higher scores indicate better quality.
You should give a score of -1 if you think the patch is invalid or there is something wrong with it.
For every contradiction between the identified code spans and the patch, you should deduct 1 point from the score.
If you think the patch is not fixing the correct function, you should give a 0.
If you think the patch is introducing new issues, you should deduct 2 points from the score.
Your scoring should only be about the correctness of the patch, not about its quality or style.
<issue_description>
{issue_text}
</issue_description>
<before_patch>
{before_patch}
</before_patch>
<after_patch>
{after_patch}
</after_patch>
{code_spans}
Again, make sure your output ends with <score></score> tags containing only an integer.
For example, if your score is 8, the final part of output should look like this:
<score>8</score>
It should not contain any other information or characters.
Do not use ``` or ### or anything else to wrap your score.
"""
EXPLANATION_PROMPT = """\
You are given the text of an issue from a GitHub repository (wrapped with <issue_description></issue_description>), along with an overview of the whole repository (wrapped with <repomap></repomap>).
The repomap overview lists the paths of some important files in the repository; after each path come the signatures of some important classes and functions.
You are also given a candidate patch (wrapped with <patch></patch>) that tries to resolve the target issue.
For your convenience, you are also given the hunks of code before and after the patch. The code before the patch is wrapped with <before_patch></before_patch> and the code after the patch is wrapped with <after_patch></after_patch>.
Please help me explain what this candidate patch does, and point out some of the key differences between the code before and after the patch.
Please also point out any potential problems, if any. Note that the patch could be written by a rookie coder and contain many mistakes.
Your explanation, differences, and potential problems should be based solely on the correctness of the patch, not on its quality or style.
Make sure the patch solves and only solves the issue, and does not introduce any new issues.
Make sure the patch is not redundant and does not contain any unnecessary changes.
In conclusion, your response should be a function call to the `explain` function with the following arguments:
{{
"explanation": "Your explanation for the patch",
"differences": "Your explanation for the difference between the code before and after the patch",
"problems": "Any potential problems with the patch, empty if none",
}}
<issue_description>
{issue_text}
</issue_description>
<repomap>
{repo_map}
</repomap>
<patch>
{model_patch}
</patch>
<before_patch>
{before_patch}
</before_patch>
<after_patch>
{after_patch}
</after_patch>
"""
PAIRWISE_USER_PROMPT = """\
You are given the text of an issue from a GitHub repository (wrapped with <issue_description></issue_description>), along with an overview of the whole repository (wrapped with <repomap></repomap>).
The repomap overview lists the paths of some important files in the repository; after each path come the signatures of some important classes and functions.
Please help me determine which one of the two following patches (wrapped in <patch1></patch1> and <patch2></patch2> tags) is able to resolve the target issue.
Your focus should be solely on the correctness of the patches, not on their quality or style.
Make sure the patch solves and only solves the issue, and does not introduce any new issues.
Make sure the patch is not redundant and does not contain any unnecessary changes.
For your convenience, you are also given the hunks of code before and after the patches.
The code before the first patch is wrapped with <before_patch1></before_patch1> and the code after it is wrapped with <after_patch1></after_patch1>.
The code before the second patch is wrapped with <before_patch2></before_patch2> and the code after it is wrapped with <after_patch2></after_patch2>.
Here's what you want to do:
1. Read the issue description and the repomap overview.
2. Explain what you think the first patch is doing. Output your explanation in the `exp1` field.
3. Explain what you think the second patch is doing. Output your explanation in the `exp2` field.
4. Explain what you think their difference is. Output your explanation in the `diff` field.
5. Finally, give me your verdict on which patch is better. Wrap your scores for the two patches in `score1` and `score2`. Each score must be either 1 or -1, where 1 means good and -1 means bad.
In conclusion, you should call the `compare` function with the following arguments:
{{
"exp1": "Your explanation for the first patch",
"exp2": "Your explanation for the second patch",
"diff": "Your explanation for the difference between the two patches",
"score1": 1 or -1,
"score2": 1 or -1,
}}
ALWAYS respond with values for all parameters in this tool. If you do not have an opinion on a particular parameter, please provide an empty string.
Some tips to help you evaluate the patches:
If they are very different, their scores should be different.
If they are similar, their scores should be the same.
If both are good, give them both a score of 1. If both are bad (or invalid or empty), give them both a score of -1.
<issue_description>
{issue_text}
</issue_description>
<repomap>
{repo_map}
</repomap>
<patch1>
{patch1}
</patch1>
<before_patch1>
{before_patch1}
</before_patch1>
<after_patch1>
{after_patch1}
</after_patch1>
<patch2>
{patch2}
</patch2>
<before_patch2>
{before_patch2}
</before_patch2>
<after_patch2>
{after_patch2}
</after_patch2>
"""
PAIRWISE_USER_PROMPT_GIVEN_EXP = """\
You are given the text of an issue from a GitHub repository (wrapped with <issue_description></issue_description>), along with an overview of the whole repository (wrapped with <repomap></repomap>).
The repomap overview lists the paths of some important files in the repository; after each path come the signatures of some important classes and functions.
Please help me determine which one of the two following patches (wrapped in <patch1></patch1> and <patch2></patch2> tags) is able to resolve the target issue.
Along with the patches, you are given some auxiliary information about the patches:
- The explanations for the two patches are given in <exp1></exp1> and <exp2></exp2> tags.
- The potential problems with the two patches are given in <problems1></problems1> and <problems2></problems2> tags.
Your focus should be solely on the correctness of the patches, not on their quality or style.
Make sure the patch solves and only solves the issue, and does not introduce any new issues.
Make sure the patch is not redundant and does not contain any unnecessary changes.
For your convenience, you are also given the hunks of code before and after the patches.
The code before the first patch is wrapped with <before_patch1></before_patch1> and the code after it is wrapped with <after_patch1></after_patch1>.
The code before the second patch is wrapped with <before_patch2></before_patch2> and the code after it is wrapped with <after_patch2></after_patch2>.
Here's what you want to do:
1. Read the issue description and the repomap overview.
2. Read the explanations and potential problems for the patches. Reason about the correctness of the patches and how they compare to each other; output your comparison in <compare></compare> tags.
3. Finally, give me your verdict on which patch is better. Wrap your scores for the two patches in `score1` and `score2`. Each score must be either 1 or -1, where 1 means good and -1 means bad.
In conclusion, you should call the `compare` function with the following arguments:
{{
"compare": "Your comparison of the two patches",
"score1": 1 or -1,
"score2": 1 or -1,
}}
ALWAYS respond with values for all parameters in this tool. If you do not have an opinion on a particular parameter, please provide an empty string.
Some tips to help you evaluate the patches:
If they are very different, their scores should be different.
If they are similar, their scores should be the same.
If both are good, give them both a score of 1. If both are bad (or invalid or empty), give them both a score of -1.
<issue_description>
{issue_text}
</issue_description>
<repomap>
{repo_map}
</repomap>
<patch1>
{patch1}
</patch1>
<before_patch1>
{before_patch1}
</before_patch1>
<after_patch1>
{after_patch1}
</after_patch1>
<exp1>
{exp1}
</exp1>
<problems1>
{problems1}
</problems1>
<patch2>
{patch2}
</patch2>
<before_patch2>
{before_patch2}
</before_patch2>
<after_patch2>
{after_patch2}
</after_patch2>
<exp2>
{exp2}
</exp2>
<problems2>
{problems2}
</problems2>
"""