Skip to content

Commit

Permalink
Chain of thought prompting (#9)
Browse files Browse the repository at this point in the history
Co-authored-by: Edwin Fernando <[email protected]>
Co-authored-by: AlexShefY <[email protected]>
  • Loading branch information
3 people authored Oct 2, 2024
1 parent 573dc5c commit f266fb3
Show file tree
Hide file tree
Showing 65 changed files with 1,260 additions and 120 deletions.
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,9 @@ run.sh
**/.pytest_cache
run_nagini.py
/tmp
results
results/*
!results/tries_HumanEval-Dafny_1.json
!results/tries_HumanEval-RustBench_1.json
/log_tries/
/log_tries/*
.direnv
Expand Down
7 changes: 1 addition & 6 deletions gui/src/helpers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -59,11 +59,6 @@ fn add_common_arguments<'a>(
token: &str,
settings: &Settings,
) -> &'a mut Command {
let bench_type = if settings.incremental_run {
String::from("validating")
} else {
settings.bench_type.to_string()
};
if settings.do_filter {
cmd.args(["--filter-by-ext", &settings.filter_by_ext]);
}
Expand All @@ -72,7 +67,7 @@ fn add_common_arguments<'a>(
.args(["--insert-conditions-mode", "llm-single-step"])
.args(["--llm-profile", settings.llm_profile.as_grazie()])
.args(["--grazie-token", token])
.args(["--bench-type", &bench_type])
.args(["--bench-type", &settings.bench_type.to_string()])
.args(["--tries", &make_tries(&settings.tries)])
.args(["--retries", &make_retries(&settings.retries)])
.args(["--verifier-timeout", &make_timeout(&settings.timeout)])
Expand Down
32 changes: 28 additions & 4 deletions gui/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -72,21 +72,23 @@ enum FileMode {
Directory,
}

#[derive(Debug, Clone, PartialEq, Default, Serialize, Deserialize)]
#[derive(Debug, Clone, Copy, PartialEq, Default, Serialize, Deserialize)]
enum BenchMode {
#[default]
Invariants,
Generic,
Generate,
Validating,
StepByStep,
}

impl BenchMode {
fn llm_generated_path(&self, path: &str) -> PathBuf {
let name = match self {
BenchMode::Invariants | BenchMode::Generic | BenchMode::Validating => {
basename(path).to_string()
}
BenchMode::Invariants
| BenchMode::Generic
| BenchMode::Validating
| BenchMode::StepByStep => basename(path).to_string(),
BenchMode::Generate => {
let base = basename(path);
base.chars()
Expand All @@ -97,6 +99,27 @@ impl BenchMode {
};
APP_DIRS.cache_dir().join("llm-generated").join(name)
}

fn name(&self) -> &str {
match self {
BenchMode::Invariants => "Invariants",
BenchMode::Generic => "Generic",
BenchMode::Generate => "Generate",
BenchMode::Validating => "Validating",
BenchMode::StepByStep => "Step by step",
}
}

fn all() -> &'static [BenchMode] {
const MODES: &[BenchMode] = &[
BenchMode::Invariants,
BenchMode::Generic,
BenchMode::Generate,
BenchMode::Validating,
BenchMode::StepByStep,
];
MODES
}
}

impl Display for BenchMode {
Expand All @@ -106,6 +129,7 @@ impl Display for BenchMode {
BenchMode::Generic => write!(f, "generic"),
BenchMode::Generate => write!(f, "generate"),
BenchMode::Validating => write!(f, "validating"),
BenchMode::StepByStep => write!(f, "step-by-step"),
}
}
}
Expand Down
31 changes: 10 additions & 21 deletions gui/src/ui/settings.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,27 +37,16 @@ impl AppState {
self.settings.file_mode = FileMode::Directory;
}
ui.add_space(2.0);
if !self.settings.incremental_run {
ui.horizontal(|ui| {
ui.label("Bench mode: ");
ui.radio_value(
&mut self.settings.bench_type,
BenchMode::Invariants,
"Invariants",
);
ui.radio_value(&mut self.settings.bench_type, BenchMode::Generic, "Generic");
ui.radio_value(
&mut self.settings.bench_type,
BenchMode::Generate,
"Generate",
);
ui.radio_value(
&mut self.settings.bench_type,
BenchMode::Validating,
"Validating",
);
});
}
ui.horizontal(|ui| {
ui.label("Bench mode:");
egui::ComboBox::from_id_source("bench-mode-select")
.selected_text(format!("{}", self.settings.bench_type.name()))
.show_ui(ui, |ui| {
for mode in BenchMode::all() {
ui.selectable_value(&mut self.settings.bench_type, *mode, mode.name());
}
});
});

ui.separator();
ui.horizontal(|ui| {
Expand Down
4 changes: 4 additions & 0 deletions prompts/humaneval-dafny-cot/ask_for_fixed.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
The following errors occurred during verification:
{error}

Please fix the error by adding, removing or modifying the invariants and/or assertions and return the fixed program.
4 changes: 4 additions & 0 deletions prompts/humaneval-dafny-cot/ask_for_fixed_had_errors.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
There are still some errors:
{error}

Could you please fix them?
83 changes: 83 additions & 0 deletions prompts/humaneval-dafny-cot/rewrite/examples/001/answer.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
During the previous steps, we've found that the `sum_loop` function has a loop that goes over the elements of the array `numbers` and adds them to the `s` variable, which is equal to the `sum(numbers[..i])` for all `i`, such that `0 <= i <= |numbers|`. This means that during the execution of the loop the following invariant holds:
1. `0 <= i <= |numbers|`
2. `s == sum(numbers[..i])`

Let's put them in:
```dafny
method sum_loop(numbers: seq<int>) returns (s: int)
ensures s == sum(numbers)
{
s := 0;
var i := 0;
while (i < |numbers|)
invariant 0 <= i <= |numbers|
invariant s == sum(numbers[..i])
{
s := s + numbers[i];

i := i + 1;
}
}
```

We've also worked out how to prove the second invariant. For that we need to prove to the verifier that `sum(numbers[..i + 1]) == sum(numbers[..i]) + numbers[i]`. For that we need to use the `sum_prop` function, as follows:
```dafny
assert numbers[..i + 1][..i] == numbers[..i]; // assert that taking a slice of a slice is the same as taking a slice of the original array
sum_prop(numbers[..i + 1]); // use a lemma to prove the invariant
assert sum(numbers[..i + 1]) == sum(numbers[..i + 1][..i]) + numbers[i];
```
Let's add these lines to the code:
```dafny
method sum_loop(numbers: seq<int>) returns (s: int)
ensures s == sum(numbers)
{
s := 0;
var i := 0;
while (i < |numbers|)
invariant 0 <= i <= |numbers|
invariant s == sum(numbers[..i])
{
s := s + numbers[i];

assert numbers[..i + 1][..i] == numbers[..i]; // assert that taking a slice of a slice is the same as taking a slice of the original array
sum_prop(numbers[..i + 1]); // use a lemma to prove the invariant
assert sum(numbers[..i + 1]) == sum(numbers[..i + 1][..i]) + numbers[i];

i := i + 1;
}
}
```

Other than that, we didn't find any other invariants or patterns that hold during the execution of the program. Therefore, the resulting program is (notice the lack of backticks):
<answer>
function sum(s: seq<int>) : int {
if |s| == 0 then 0 else s[0] + sum(s[1..])
}

lemma sum_prop(s: seq<int>)
requires |s| > 0
ensures sum(s) == sum(s[..|s| - 1]) + s[ |s| - 1 ]
{
if (|s| > 1) {
assert (s[1..][..|s[1..]| - 1]) == s[1..|s| - 1];
}
}
method sum_loop(numbers: seq<int>) returns (s: int)
ensures s == sum(numbers)
{
s := 0;
var i := 0;
while (i < |numbers|)
invariant 0 <= i <= |numbers|
invariant s == sum(numbers[..i])
{
s := s + numbers[i];

assert numbers[..i + 1][..i] == numbers[..i]; // assert that taking a slice of a slice is the same as taking a slice of the original array
sum_prop(numbers[..i + 1]); // use a lemma to prove the invariant
assert sum(numbers[..i + 1]) == sum(numbers[..i + 1][..i]) + numbers[i];

i := i + 1;
}
}
</answer>
28 changes: 28 additions & 0 deletions prompts/humaneval-dafny-cot/rewrite/examples/001/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
It's time to rewrite the program to contain the invariants and assertions that we've found. Work on the following program and go step by step through the reasoning process. After that, output the rewritten program in the `<answer>` tag without backticks.

```dafny
function sum(s: seq<int>) : int {
if |s| == 0 then 0 else s[0] + sum(s[1..])
}

lemma sum_prop(s: seq<int>)
requires |s| > 0
ensures sum(s) == sum(s[..|s| - 1]) + s[ |s| - 1 ]
{
if (|s| > 1) {
assert (s[1..][..|s[1..]| - 1]) == s[1..|s| - 1];
}
}
method sum_loop(numbers: seq<int>) returns (s: int)
ensures s == sum(numbers)
{
s := 0;
var i := 0;
while (i < |numbers|)
{
s := s + numbers[i];

i := i + 1;
}
}
```
3 changes: 3 additions & 0 deletions prompts/humaneval-dafny-cot/rewrite/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
It's time to rewrite the program to contain the invariants and assertions that we've found. Work on the following program and go step by step through the reasoning process. After that, output the rewritten program in the `<answer>` tag without backticks.

{program}
10 changes: 10 additions & 0 deletions prompts/humaneval-dafny-cot/steps/001/examples/001/answer.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
Let's go function by function. The first function, `sum` does not have a specification, as can be seen by the absence of either a `requires` or `ensures` clause. This means that this function is probably a specification itself. Let's consider, what it can be step by step. We can see that the aruguments to this function are a sequence of integers. We can see that the function returns an integer. We can conclude that the function does some accumulation of integers from the sequence. Looking at the body, we can see that the function takes the first element of the sequence and adds it to the accumulated result of evaluating the `sum` function on the rest of the sequence. This means the function is a specification for the sum of the elements of the sequence.

Next, we have the `sum_prop` lemma. Considering it's specification, we can see that it requires a non-empty sequence of integers, as seen by the `requires |s| > 0` clause. Looking at the ensures clause, we can see that the lemma ensures that the sum of the sequence (as computed by the `sum` function) is equal to the sum of the elements of the sequence without the last element, plus that last element. This is important to prove, because the sum function drops the first element and thus it is not obvious for the prover that the last element can be dropped in the same way.

Finally, we have the `sum_loop` method. This method has a specification, as indicated by the `ensures` clause. The specification states that the returned value `s` should be equal to the sum of the input sequence `numbers`, as computed by the `sum` function, as seen by the `ensures s == sum(numbers)` clause.

In summary:
1. The `sum` function is itself a specification for computing the sum of a sequence of integers.
2. The `sum_prop` lemma has a specification that proves a property about the `sum` function.
3. The `sum_loop` method has a specification that ensures it correctly computes the sum of a sequence of integers, matching the result of the `sum` function.
27 changes: 27 additions & 0 deletions prompts/humaneval-dafny-cot/steps/001/examples/001/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
Given the following program, for each function, determine if it has a specification. If it does not, is it itself a specification? If it is a specification, what does it specify?
```dafny
function sum(s: seq<int>) : int {
if |s| == 0 then 0 else s[0] + sum(s[1..])
}

lemma sum_prop(s: seq<int>)
requires |s| > 0
ensures sum(s) == sum(s[..|s| - 1]) + s[ |s| - 1 ]
{
if (|s| > 1) {
assert (s[1..][..|s[1..]| - 1]) == s[1..|s| - 1];
}
}
method sum_loop(numbers: seq<int>) returns (s: int)
ensures s == sum(numbers)
{
s := 0;
var i := 0;
while (i < |numbers|)
{
s := s + numbers[i];

i := i + 1;
}
}
```
5 changes: 5 additions & 0 deletions prompts/humaneval-dafny-cot/steps/001/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Given the following program, for each function, determine if it has a specification. If it does not, is it itself a specification? If it is a specification, what does it specify?

```dafny
{program}
```
6 changes: 6 additions & 0 deletions prompts/humaneval-dafny-cot/steps/002/examples/001/answer.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
Looking at the functions, we can see that the 'sum' function uses recursion and thus contains no loops. The 'sum_prop' lemma, in the same way, does not contain any loops, as evidenced by the absence of 'while' and 'for' keywords.
Now, let's examine the 'sum_loop' method. This method contains a while loop that iterates through the elements of the 'numbers' sequence. The loop starts with 's' and 'i' initialized to 0, and continues as long as 'i' is less than the length of 'numbers'. Let's try to understand what 's' and 'i' represent in this context.

Looking at the body of the loop, we can see that it adds the current element of 'numbers' to 's'. This means that 's' is the running sum of the elements in 'numbers', namely 's = sum(numbers[..i])'. 'i' is easier to analyze. It is the index of the current element being processed. In each iteration of the loop, 'i' is incremented by 1, so 'i' represents the index of the next element to be processed.

Each iteration of the loop adds the current element to 's', and then increments 'i' to move to the next element. This process continues until 'i' reaches the length of 'numbers', at which point the loop terminates and 's' holds the final sum of the elements in 'numbers'.
28 changes: 28 additions & 0 deletions prompts/humaneval-dafny-cot/steps/002/examples/001/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
Now we are finished with the specifications. Let's move onto the loops.
Given the following program, for each function, determine if it has loops. If it does, try to determine what the loop does in detail. Go through all your thinking and reasoning process step by step.
```dafny
function sum(s: seq<int>) : int {
if |s| == 0 then 0 else s[0] + sum(s[1..])
}

lemma sum_prop(s: seq<int>)
requires |s| > 0
ensures sum(s) == sum(s[..|s| - 1]) + s[ |s| - 1 ]
{
if (|s| > 1) {
assert (s[1..][..|s[1..]| - 1]) == s[1..|s| - 1];
}
}
method sum_loop(numbers: seq<int>) returns (s: int)
ensures s == sum(numbers)
{
s := 0;
var i := 0;
while (i < |numbers|)
{
s := s + numbers[i];

i := i + 1;
}
}
```
4 changes: 4 additions & 0 deletions prompts/humaneval-dafny-cot/steps/002/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Given the following program, for each function, determine if it has loops. If it does, try to determine what the loop does in detail. Go through all your thinking and reasoning process step by step.
```dafny
{program}
```
15 changes: 15 additions & 0 deletions prompts/humaneval-dafny-cot/steps/003/examples/001/answer.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
As we've seen, the loop in the `sum_loop` function goes over the elements of the array `numbers` and adds them to the `s` variable, which is equal to the `sum(numbers[..i])` for all `i`, such that `0 <= i <= |numbers|`. This means that during the execution of the loop the following invariant holds:
1. `0 <= i <= |numbers|`
2. `s == sum(numbers[..i])`

The second invariant might be a bit difficult to prove, as it adds the last element of the array to the sum, while, as I've said before, the `sum` function is defined as `sum(a) == a[0] + sum(a[1..])`. However, as we've seen, it's possible to use the `sum_prop` function to prove that this invariant holds.
Let's work out how to use it. If applied to the `numbers[..i + 1]`, we get:
`sum(numbers[..i + 1]) == sum(numbers[..i + 1][..i]) + numbers[i]`. It might be difficult for dafny to prove that `sum(numbers[..i + 1][..i]) == sum(numbers[..i])`, so I'd suggest adding an assertion before calling `sum_prop` to make sure that this is the case, namely `assert numbers[..i + 1][..i] == numbers[..i]`.
So, to use the `sum_prop`, I'd suggest the following code:
```
assert numbers[..i + 1][..i] == numbers[..i]; // assert that taking a slice of a slice is the same as taking a slice of the original array
sum_prop(numbers[..i + 1]); // use a lemma to prove the invariant
assert sum(numbers[..i + 1]) == sum(numbers[..i + 1][..i]) + numbers[i];
```

In conclusion, we can say that two invariants hold and the second one could be proved using the `sum_prop` function.
27 changes: 27 additions & 0 deletions prompts/humaneval-dafny-cot/steps/003/examples/001/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
As we've already analyzed the specifications and the loops, it's time to try to analyze what invariants and patterns hold during the execution of the program. Go over the found loops in the following program in the code and try to find out what invariants and patterns they hold. Go through all your thinking and reasoning process step by step.
```dafny
function sum(s: seq<int>) : int {
if |s| == 0 then 0 else s[0] + sum(s[1..])
}

lemma sum_prop(s: seq<int>)
requires |s| > 0
ensures sum(s) == sum(s[..|s| - 1]) + s[ |s| - 1 ]
{
if (|s| > 1) {
assert (s[1..][..|s[1..]| - 1]) == s[1..|s| - 1];
}
}
method sum_loop(numbers: seq<int>) returns (s: int)
ensures s == sum(numbers)
{
s := 0;
var i := 0;
while (i < |numbers|)
{
s := s + numbers[i];

i := i + 1;
}
}
```
4 changes: 4 additions & 0 deletions prompts/humaneval-dafny-cot/steps/003/question.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
As we've already analyzed the specifications and the loops, it's time to try to analyze what invariants and patterns hold during the execution of the program. Go over the found loops in the following program in the code and try to find out what invariants and patterns they hold. Go through all your thinking and reasoning process step by step.
```dafny
{program}
```
Loading

0 comments on commit f266fb3

Please sign in to comment.