new attempt at clean PSB2 with five base problems #60

Open: wants to merge 6 commits into base: dev
6 changes: 6 additions & 0 deletions .gitignore
@@ -25,3 +25,9 @@ Manifest.toml

# Ignore large data files
*_large.jl

# Ignore the CondaPkg files used by PSB2 for retrieving the problem files
.CondaPkg/

# Don't commit the datasets file when extracted from the PSB2 online benchmark
**/PSB2_2021/datasets/
2 changes: 2 additions & 0 deletions CondaPkg.toml
@@ -0,0 +1,2 @@
[pip.deps]
psb2 = ""
12 changes: 12 additions & 0 deletions Project.toml
@@ -4,20 +4,32 @@ authors = ["jaapjong <[email protected]>", "Tilman Hinnerichs <t.r.
version = "0.2.2"

[deps]
Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
CondaPkg = "992eb4ea-22a4-4c89-a5bb-47a3300528ab"
FilePathsBase = "48062228-2e41-5def-b9a4-89aafe57970f"
HerbCore = "2b23ba43-8213-43cb-b5ea-38c12b45bd45"
HerbGrammar = "4ef9e186-2fe5-4b24-8de7-9f7291f24af7"
HerbInterpret = "5bbddadd-02c5-4713-84b8-97364418cca7"
HerbSearch = "3008d8e8-f9aa-438a-92ed-26e9c7b4829f"
HerbSpecification = "6d54aada-062f-46d8-85cf-a1ceaf058a06"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
MLStyle = "d8e11817-5142-5d16-987a-aa16d5891078"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
PythonCall = "6099a3de-0909-46bc-b1f4-468b9a2dfc0d"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
SExpressions = "eaa8e424-c5f6-11e8-1b3d-d576ba0eee97"

[compat]
Conda = "1.10.2"
CondaPkg = "0.2.24"
HerbCore = "^0.3.0"
HerbGrammar = "^0.4.0"
HerbInterpret = "0.1.3"
HerbSearch = "0.3.0"
HerbSpecification = "^0.1.0"
PyCall = "1.96.4"
PythonCall = "0.9.23"
julia = "^1.8"

[extras]
36 changes: 36 additions & 0 deletions src/data/PSB2_2021/PSB2_2021.jl
@@ -0,0 +1,36 @@
module PSB2_2021

using JSON
using HerbSpecification

include("data.jl")
include("retrieve_all_tasks.jl")
include("grammar.jl")
include("program_examples.jl")

# Names in a multi-line `export` must be comma-separated; without the comma
# the second name is parsed as a separate expression and is not exported.
export
    parse_line_json,
    write_psb2_problems_to_file

"""
    parse_line_json(line::AbstractString)::IOExample

Parses a line from a file in the `strings` dataset into an `IOExample`.
Keys containing `output` become outputs and keys containing `input` become inputs.
"""
function parse_line_json(line::AbstractString)::IOExample
    js = JSON.parse(line)
    inputs = Dict{Symbol, Any}()
    outputs = Dict{Symbol, Any}()
    for (k, v) in js
        if occursin("output", k)
            outputs[Symbol(k)] = v
        elseif occursin("input", k)
            inputs[Symbol(k)] = v
        else
            throw(KeyError("Unknown type of JSON key: no input or output"))
        end
    end
    return IOExample(inputs, outputs)
end

end # module PSB2_2021
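
For illustration, the key-splitting step of `parse_line_json` can be sketched without JSON.jl or HerbSpecification. This is a minimal sketch, not the code from this PR: `split_io` is a hypothetical name, it takes an already parsed record instead of a JSON string, it returns two plain dictionaries instead of an `IOExample`, and the key names `input1` and `output1` are assumed for the example.

```julia
# Hypothetical, dependency-free stand-in for the key-splitting step of
# `parse_line_json`. It works on an already parsed record (a Dict with
# string keys) and returns plain dictionaries instead of an `IOExample`.
function split_io(record::AbstractDict)
    inputs = Dict{Symbol, Any}()
    outputs = Dict{Symbol, Any}()
    for (k, v) in record
        if occursin("output", k)
            outputs[Symbol(k)] = v
        elseif occursin("input", k)
            inputs[Symbol(k)] = v
        else
            # Any key that is neither an input nor an output is rejected.
            throw(ArgumentError("unknown key \"$k\": expected an input or output key"))
        end
    end
    return inputs, outputs
end

# Assumed record shape, for illustration only.
record = Dict("input1" => 3, "output1" => "Fizz")
inputs, outputs = split_io(record)
```

Because `"output"` is checked before `"input"`, a key that happened to contain both substrings would be treated as an output.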
94 changes: 94 additions & 0 deletions src/data/PSB2_2021/README.md
@@ -0,0 +1,94 @@
# PSB2: The Second Program Synthesis Benchmark Suite

## Usage
The benchmark consists of 25 problems. For each problem:
- `data.jl` contains an example problem with IOExamples for the edge cases (as defined by the original benchmark), for example, `problem_basement`.
- `grammar.jl` contains an instantiated grammar object, as well as a function that returns a new grammar instantiation, for example, `grammar_basement`.
  - These grammars contain the full set of relevant instructions.
  - Alternatively, the function can return a minimal grammar, which contains only the instructions that are strictly necessary, for example, `grammar_basement(minimal=true)`.
- `program_examples.jl` gives an example program in the format expected by the given grammars.

Larger problems can be created by retrieving more examples from the benchmark and writing them to a file using `write_psb2_problems_to_file`:
```
write_psb2_problems_to_file(
    problems::Vector{String}=String["fizz-buzz"],
    edge_or_random::String="random",
    n_train::Int64=200,
    n_test::Int64=2000,
    format::String="psb2")
```

As an example, see `example_using_the_benchmark.jl`.

## Dataset
This dataset comprises 25 different problems defined in PSB2. The following table (from [the paper](https://dl.acm.org/doi/abs/10.1145/3449639.3459285?casa_token=biEgaE8LwGkAAAAA%3AyObtJCr1MPh3ObTIh6RQUFP7Sx2E4isZAOpTHNWLkJuCcmOPRGnR94xTCddGkTJLwEbx_LpKfFv8)) describes the problems.

<img width="836" alt="image" src="https://github.com/Herb-AI/HerbBenchmarks.jl/assets/5456207/590487a8-10da-46b0-ad69-212d1c49a39c">

## Instruction Set
The instruction set used in the benchmark paper comes from PushGP, which evolves programs in Push, "a stack-based programming language built specifically for use in genetic programming". More information can be found in the papers on [Push3](https://dl.acm.org/doi/10.1145/1068009.1068292) and [PushGP](https://link.springer.com/article/10.1023/A:1014538503543). This directory uses native Julia functions instead of the stack-based implementation.

The table below shows the different input sets used for each problem in the benchmark, the sets of grammars that are available, as well as the constants.

![image](https://github.com/Herb-AI/HerbBenchmarks.jl/assets/23522361/2f7aac44-833f-4acd-b052-30bbb93bf561)


For more information, see:
> T. Helmuth and P. Kelly, “PSB2: The Second Program Synthesis Benchmark Suite”. Zenodo, Apr. 10, 2021. doi: 10.5281/zenodo.5084812.


## Structure of benchmark folder
The 25 problems of the benchmark are added iteratively. `data.jl` already contains examples for all problems, but the other files only cover the problems implemented so far (see the list below).

We do not keep all data in this repository. The functionality in `retrieve_all_tasks.jl` can be used to retrieve more tasks for a problem. The `data.jl` file keeps a small list of IOExamples per problem.

- `grammar.jl` This file holds the grammars for each of the benchmark problems, defined as functions. Each grammar combines a problem-specific input grammar (from `problem_grammars.jl`) with the default grammars for the relevant data types (from `base_grammars.jl`). These are merged into the `grammar_{problem_name}` object using `merge_grammar()`.
  - When an ephemeral random constant (ERC) is used, we interpret this as adding one character of noise to the grammar (the range is defined per problem). The character is added inside the function rather than evaluated inside the grammar definition, which is why the grammars are defined as functions and not as fixed grammar objects.
  - Each input grammar should have a `Return` symbol as the return type, returning a dict with the correct number of outputs.
  - For each problem, an instantiation of the grammar is available as an object when using the benchmarks. The functions can also be called directly, for example to see the effect of a different noise character.
- `problem_grammars.jl` This file constructs the problem-specific grammars that define the constants, inputs, and outputs.
- `base_grammars.jl` This file includes the base grammars with general functions for integers, strings, characters, Booleans, lists, and execution statements.
- `psb2_primitives.jl` This file includes extra grammar functions, such as functions used in the base grammars (for example a custom `command_while()` that has an iteration limit) and `merge_grammar`.
- `data.jl` This file holds some example problems with a small set of IOExamples for each problem in the benchmark.
- `program_examples.jl` This file shows example programs for each problem, that is, possible outputs of the synthesis.
- `retrieve_all_tasks.jl` This file provides the functionality for retrieving larger problems from the benchmark, which can be downloaded and written to a JSON file. We distinguish a problem (defined by a set of IOExamples) from its tasks (the individual IOExamples).
- `example_using_the_benchmark.jl` This file shows an example of how to use the benchmarks: where to find the grammar and the data.
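
The idea of a `command_while()` with a limit can be sketched as follows. This is a hypothetical illustration only: the name `bounded_while`, its closure-based signature, and the default of 1000 steps are assumptions, and the actual `command_while` in `psb2_primitives.jl` may look quite different.

```julia
# Hypothetical sketch of a while loop with an iteration limit, so that a
# synthesized program with a non-terminating condition cannot loop forever.
# The real `command_while` may use a different signature and limit.
function bounded_while(condition::Function, body::Function; max_steps::Int=1000)
    steps = 0
    result = nothing
    while steps < max_steps && condition()
        result = body()   # keep the value of the last executed body
        steps += 1
    end
    return result
end

# A loop that terminates on its own after five iterations.
counter = Ref(0)
last_value = bounded_while(() -> counter[] < 5, () -> (counter[] += 1))

# A loop whose condition never becomes false is cut off at `max_steps`.
spins = Ref(0)
bounded_while(() -> true, () -> (spins[] += 1), max_steps=50)
```

Bounding the number of iterations keeps the evaluation of every candidate program finite, which matters when a search procedure enumerates many programs that would otherwise never halt.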

### Adding a PSB2 benchmark problem

To add another problem, define the following:
- The data format is already specified in `data.jl`, which you can use to structure the rest.
- Add the program example to `program_examples.jl`, where you can check that the program can be constructed using the grammar you defined for the problem. Use the naming convention `program_{problem_name}`.
  - Also write a test for this program.
- Define the grammar in `grammar.jl` as a function. The function takes a `minimal` Boolean so that an ERC can be included in the minimal grammar as well as in the full one. The full grammar is a merge of the grammars from `base_grammars.jl` and an `input_{problem_name}` grammar defined in `problem_grammars.jl`, which defines the constants listed in the Instruction Set table.
  - If the problem specifies an ephemeral random constant (ERC), we interpret this as adding one character of noise to the grammar. This is done by adding a specific rule with the random character.
  - Use the naming convention `grammar_{problem_name}` for the output of `merge_grammar` applied to the sub-grammars from `base_grammars.jl` and the new input grammar.
  - Define a `minimal_grammar_{problem_name}` in `problem_grammars.jl` containing all the functions used by the program `program_{problem_name}`, as a small test case.
- Check that all required functionality is implemented in `base_grammars.jl`, considering the list or state changes for different input types (like `String`).
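
The grammar-as-function pattern described above can be sketched in plain Julia. This is a simplified illustration under stated assumptions: rules are represented as strings rather than built with `@csgrammar` and `merge_grammar`, and the names `grammar_example`, `FULL_RULES`, `MINIMAL_RULES`, `rng`, and `erc_range` are hypothetical.

```julia
using Random

# Hypothetical stand-ins: rules are plain strings here, whereas the real
# grammars are built with `@csgrammar` and combined with `merge_grammar`.
const FULL_RULES = ["String = String * String", "String = uppercase(String)"]
const MINIMAL_RULES = ["String = String * String"]

# Returns a fresh rule list on every call, so the ERC character is
# re-sampled each time instead of being fixed when the file is loaded.
function grammar_example(; minimal::Bool=false,
                         rng::AbstractRNG=Random.default_rng(),
                         erc_range='a':'z')
    erc = rand(rng, erc_range)                      # one character of noise
    rules = minimal ? copy(MINIMAL_RULES) : copy(FULL_RULES)
    push!(rules, "Character = '$erc'")              # the ERC rule
    return rules
end

# Both variants include the ERC rule; only the base rules differ.
full = grammar_example(rng=MersenneTwister(42))
small = grammar_example(minimal=true, rng=MersenneTwister(42))
```

Passing the random number generator explicitly is a design choice of this sketch: it makes the sampled character reproducible in tests.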

## Implemented PSB2 problems
- [x] Basement
- [ ] Bouncing Balls
- [ ] Bowling
- [ ] Camel Case
- [x] Coin Sums
- [ ] Cut Vector
- [ ] Dice Game
- [ ] Find Pair
- [x] Fizz Buzz
- [x] Fuel Cost
- [x] GCD
- [ ] Indices
- [ ] Leaders
- [ ] Luhn
- [ ] Mastermind
- [ ] Middle Character
- [ ] Paired Digits
- [ ] Shopping List
- [ ] Snow Day
- [ ] Solve Boolean
- [ ] Spin Words
- [ ] Square Digits
- [ ] Substitution Cipher
- [ ] Twitter
- [ ] Vector Distance
146 changes: 146 additions & 0 deletions src/data/PSB2_2021/base_grammars.jl
@@ -0,0 +1,146 @@
include("psb2_primitives.jl")

grammar_integer = @csgrammar begin
    IntRule = IntRule + IntRule
    IntRule = IntRule - IntRule
    IntRule = IntRule * IntRule
    IntRule = IntRule / IntRule
    IntRule = IntRule % IntRule
    IntRule = IntRule ^ IntRule
    IntRule = IntRule ^ 2
    IntRule = IntRule + 1
    IntRule = IntRule - 1
    IntRule = IntRule * -1
    IntRule = max(IntRule, IntRule)
    IntRule = min(IntRule, IntRule)
    IntRule = abs(IntRule)
    IntRule = Integer(Boolean)
    IntRule = Integer(String)
    IntRule = Integer(Float)
    IntRule = Integer(Character)
    IntRule = ceil(IntRule)
    IntRule = floor(IntRule)
    Boolean = IntRule > IntRule
    Boolean = IntRule >= IntRule
    Boolean = IntRule < IntRule
    Boolean = IntRule <= IntRule
    Boolean = IntRule == IntRule
    Boolean = IntRule != IntRule
    IntRule = Boolean ? IntRule : IntRule
    Expression = IntRule
    Expression = begin Expression; Expression end
    IntRule = command_while(Boolean, Expression)
end

grammar_state_integer = @csgrammar begin
    State = Dict(Sym => IntRule)
    State = Dict(Sym => IntRule, Sym => IntRule)
    State = Dict(Sym => IntRule, Sym => IntRule, Sym => IntRule)
    State = merge!(state, State)
    State = push!(State, Sym => IntRule)
    IntRule = get(state, Sym, "Key not found")
    Expression = State | IntRule
    Expression = let state = State; Expression end
end

grammar_list_integer = @csgrammar begin
    List = map(Func, List)
    Func = (x -> IntRule)
    IntRule = x
    IntRule = length(List)
    IntRule = sum(List)
    IntRule = indexin(IntRule, List)
    List = getindex(List, IntRule:IntRule)
end

grammar_float = @csgrammar begin
    Number = Float | IntRule
    Number = Number + Number
    Number = Number - Number
    Number = Number * Number
    Number = Number / Number
    Number = Number % Number
    Number = Number ^ Number
    Number = Number ^ 2
    Float = sqrt(Number)
    Number = Number + 1
    Number = Number - 1
    Number = Number * -1
    Number = min(Number, Number)
    Number = max(Number, Number)
    Number = abs(Number)
    Number = ceil(Number)
    Number = floor(Number)
    Float = cos(Number)
    Float = sin(Number)
    Float = tan(Number)
    # Julia names the inverse trigonometric functions acos, asin and atan
    Float = acos(Number)
    Float = asin(Number)
    Float = atan(Number)
    # Julia's two-argument log takes the base first: log(base, x)
    Float = log(IntRule, Number)
    Float = log(2, Number)
    Float = log(10, Number)
    Float = float(IntRule)
    Float = float(String)
    Float = float(Boolean)
    Expression = Float
    Expression = begin Expression; Expression end
    Float = Boolean ? Expression : Expression
    Float = command_while(Boolean, Expression)
end

grammar_boolean = @csgrammar begin
    Boolean = Boolean && Boolean
    Boolean = Boolean || Boolean
    Boolean = !Boolean
    Boolean = Boolean ⊻ Boolean
    Boolean = Bool(IntRule)
    Boolean = Bool(Float)
    Boolean = Boolean ? Boolean : Boolean
end

grammar_character = @csgrammar begin
    Character = Char(IntRule)
    Character = Char(Float)
    Character = Char(Boolean)
    Character = Char(String)
    Boolean = islowercase(Character)
    Boolean = isuppercase(Character)
    Boolean = isletter(Character)
    Boolean = isdigit(Character)
    # Julia's whitespace predicate is called isspace
    Boolean = isspace(Character)
    Character = Boolean ? Character : Character
end

grammar_string = @csgrammar begin
    String = string(Character)
    String = string(IntRule)
    String = string(Boolean)
    String = string(Float)
    String = String * String
    String = String * string(Character)
    String = String[IntRule]
    # chop takes the keywords `head` and `tail`
    String = chop(String, head=IntRule, tail=IntRule)
    String = chop(String, head=0, tail=1)
    String = first(String)
    String = last(String)
    String = reverse(String)
    String = replace(String, String=>String)
    String = replace(String, Character=>Character)
    String = replace(String, String=>String, count=1)
    String = replace(String, Character=>Character, count=1)
    String = replace(String, Character=>"")
    String = uppercase(String)
    String = lowercase(String)
    String = replace_in_string(String, IntRule, Character)
    String = ""
    IntRule = length(String)
    IntRule = findfirst(Character, String)
    IntRule = count(String, String)
    # contains takes the haystack first: contains(haystack, needle)
    Boolean = contains(String, Character)
    Boolean = contains(String, String)
    String = Boolean ? String : String
    Expression = String
    Expression = begin Expression; Expression end
    String = command_while(Boolean, Expression)
end
16 changes: 16 additions & 0 deletions src/data/PSB2_2021/citation.bib
@@ -0,0 +1,16 @@
@InProceedings{Helmuth:2021:GECCO,
author = "Thomas Helmuth and Peter Kelly",
title = "{PSB2}: The Second Program Synthesis Benchmark Suite",
booktitle = "2021 Genetic and Evolutionary Computation Conference",
series = {GECCO '21},
year = "2021",
isbn13 = {978-1-4503-8350-9},
address = {Lille, France},
size = {10 pages},
doi = {10.1145/3449639.3459285},
publisher = {ACM},
publisher_address = {New York, NY, USA},
month = {10-14} # jul,
doi-url = {https://doi.org/10.1145/3449639.3459285},
URL = {https://dl.acm.org/doi/10.1145/3449639.3459285},
}