Frangel v2 #111

Open
wants to merge 122 commits into
base: master
122 commits
dee1065
Add FrAngel boilerplate
To5BG May 4, 2024
0121ef6
Add fragment utility functions
To5BG May 4, 2024
4b2dcef
Add helper size functions
GeorgeLatsev May 4, 2024
b72774e
Add simplifyQuick progress
GeorgeLatsev May 4, 2024
6cf4601
Update function signatures to snake case
To5BG May 4, 2024
e1aa0c9
Merge remote-tracking branch 'origin/frangel' into frangel
GeorgeLatsev May 4, 2024
acdb890
Add random program generation
GeorgeLatsev May 4, 2024
09ca242
Add tests for mineFragments and rememberPrograms
To5BG May 4, 2024
606f517
Refactor some util functions
To5BG May 5, 2024
762810e
Add FrAngel iterator
GeorgeLatsev May 5, 2024
07c0747
Merge remote-tracking branch 'origin/frangel' into frangel
GeorgeLatsev May 5, 2024
0fea621
Fix FrAngel algorithm to store updated fragments in state
To5BG May 5, 2024
77fe324
Implement resolve_angelic
To5BG May 5, 2024
b553cba
Split util functions into separate files
To5BG May 5, 2024
4d37c72
Update variable naming
To5BG May 5, 2024
17fcaf4
Turn `generate_with_angelic` to float for more configurability
To5BG May 5, 2024
8af4f67
Fix basic FrAngel test
To5BG May 5, 2024
846e9b1
Fix simplify_quick
GeorgeLatsev May 5, 2024
4f9c64d
Update `get_passed_tests` and `resolve_angelic!` to have a parameter …
To5BG May 5, 2024
83a2532
Add angelic conditions to program generation
GeorgeLatsev May 5, 2024
6a3a340
Implement angelic evaluation
To5BG May 6, 2024
845bad7
Add angelic sigma for excessive failure cases
To5BG May 6, 2024
509a238
Refactor `update_path` to use char-arrays
To5BG May 6, 2024
3a354af
Write basic tests for angelic execution
To5BG May 6, 2024
8d1af5b
Do slight optimizations and segregate configs
To5BG May 11, 2024
0e0b5c9
Update documentation of FrAngel functions
To5BG May 11, 2024
6b33d1e
Small fixes
GeorgeLatsev May 11, 2024
fef20d0
Restructure iterator
GeorgeLatsev May 14, 2024
a2fafac
Update grammar rule probabilities to use fragments
GeorgeLatsev May 14, 2024
76aa600
Add function to replace fragments
GeorgeLatsev May 14, 2024
f871138
Add test for add_fragments_prob! and small fixes
GeorgeLatsev May 16, 2024
d1dd377
Add tests for modify_and_replace_program_fragments! and small fixes
GeorgeLatsev May 16, 2024
431f75b
Add angelic conditions to generated trees
GeorgeLatsev May 16, 2024
c05257d
Add simplify_slow placeholder
GeorgeLatsev May 16, 2024
91b362e
Fix test parsing error (unfinished block)
To5BG May 17, 2024
28e5c64
Refactor file structure of frangel
To5BG May 18, 2024
4858721
Add fragments to grammar
To5BG May 18, 2024
0c5eae6
Refactor test structure
To5BG May 18, 2024
697d226
Update test suites and fix implementation along
To5BG May 18, 2024
0d19937
Handle StateHole terminal nodes
GeorgeLatsev May 18, 2024
e3d88f1
Change use_fragments_chance to Float64 and small changes
GeorgeLatsev May 18, 2024
73a5a44
Fix iterator behavior by copying generated programs
GeorgeLatsev May 18, 2024
07e0b38
Progress
GeorgeLatsev May 20, 2024
341ba38
Small fixes
GeorgeLatsev May 20, 2024
88aea94
Small fixes
GeorgeLatsev May 20, 2024
d28aad1
Implement bulk add_rule util function
To5BG May 20, 2024
f7abd33
Optimize weights calculation
GeorgeLatsev May 20, 2024
0429b77
Move random iterator into separate file
GeorgeLatsev May 20, 2024
c29ae13
Replace one-by-one rule removal to base grammar reset
To5BG May 20, 2024
b5482fb
Move grammar update logic outside of remember_program!
To5BG May 20, 2024
ad4b705
Optimize weighted choosing in iterator
GeorgeLatsev May 20, 2024
4398764
Further improvements
GeorgeLatsev May 20, 2024
83aac80
Remember tried programs
GeorgeLatsev May 21, 2024
ebf1eb8
Use smaller numbers
GeorgeLatsev May 21, 2024
1bbdbb9
Fix integer types and constants
GeorgeLatsev May 21, 2024
67d6623
Reuse minsizes
GeorgeLatsev May 21, 2024
2acbace
Refactor example to run
To5BG May 21, 2024
ff9a40f
Change logic for determining fragment rule
To5BG May 21, 2024
e8f8db3
Slight refactoring
To5BG May 21, 2024
10b98ce
Refactor fragments in grammar and optimize iterator
GeorgeLatsev May 22, 2024
caec159
Further iterator optimizations
GeorgeLatsev May 22, 2024
ed88f0e
Fix fragments not being used
GeorgeLatsev May 22, 2024
d770163
Add getRange test case
GeorgeLatsev May 22, 2024
fcb1ae1
Add debugging/verbose functionalities
To5BG May 22, 2024
f9f9ce3
Restructure fragment and grammar-related utilities
To5BG May 23, 2024
41c4c77
Add an end-to-end test
To5BG May 23, 2024
593d5db
Update angelic conditions to use code_paths from interpreter
To5BG May 24, 2024
bde9070
Add angelic tests
To5BG May 24, 2024
29079f7
Update some of the test suites
To5BG May 24, 2024
2a26863
Force julia to create new grammar for each test run
GeorgeLatsev May 24, 2024
e8d0406
Update HerbGrammar dependency
GeorgeLatsev May 24, 2024
e1a7523
Adjust grammar used for utils tests
GeorgeLatsev May 24, 2024
81ea9ac
Update angelic condition hole domains and some test suites
GeorgeLatsev May 24, 2024
d4693cb
Rewrite generator tests and extract setup
To5BG May 25, 2024
62fe1bb
Refactor frangel files by updating formatting
To5BG May 25, 2024
e942f8e
Fix misc bugs
To5BG May 25, 2024
227c707
Change angelic conditions representation
GeorgeLatsev May 25, 2024
cfd6c41
Fix angel_execution test
To5BG May 25, 2024
e2bd4dc
Document all utility functions
To5BG May 26, 2024
81de7e8
Fix angelic condition generation and docs
GeorgeLatsev May 26, 2024
4018784
Add test for code path generation
To5BG May 26, 2024
780d2e1
Miscellaneous bug fixes
To5BG May 26, 2024
47a6d7f
Fix angelic execution on multiple tests
To5BG May 26, 2024
66dfa7b
Update one end-to-end test to use angelic conditions
To5BG May 26, 2024
244580e
Remove unnecessary deepcopy
GeorgeLatsev May 26, 2024
dd6fd7d
Add deepcopy for updated by angelic conditions fragments
GeorgeLatsev May 26, 2024
039ebe2
Keep visited programs in LongHashMap
GeorgeLatsev May 26, 2024
a0ec223
Extract some methods from `frangel`
To5BG May 26, 2024
b48d3e1
Replace angelic boolean expr set to custom hashmap
To5BG May 26, 2024
11df284
Update default capacity of custom hashmap
To5BG May 26, 2024
95bb182
Keep track of fragment changes
GeorgeLatsev May 26, 2024
5ecfaf4
Use a custom bit trie
To5BG May 27, 2024
3148ca0
Revert "Keep track of fragment changes"
To5BG May 28, 2024
e330785
Update to use CodePath over modifying BitVector
To5BG May 28, 2024
54937c6
Merge remote-tracking branch `dev` into frangel
To5BG May 28, 2024
3fbb36b
Fix some angelic bugs
To5BG May 29, 2024
16b9ed8
Update "generic" grammar
To5BG May 29, 2024
3c7dfa3
Update config to store rulenode over index
To5BG May 29, 2024
3ef85c5
Fix angelic test
To5BG May 29, 2024
76cde59
Update rule minsize function
GeorgeLatsev May 30, 2024
bc9ceb0
Replace queues with in-place modification for angelic resolution
To5BG Jun 1, 2024
6d80e04
Change indentation and remove unused functions
To5BG Jun 1, 2024
01c30c6
Comment the FrAngel codebase
To5BG Jun 6, 2024
5b9e774
Add a doc for the main `frangel` function
To5BG Jun 6, 2024
4210bab
Misc changes to tests and docs
To5BG Jun 7, 2024
ebee907
Add docs for the generation and the random iterator
GeorgeLatsev Jun 7, 2024
81197c7
Fix doc correctness of frangel
To5BG Jun 8, 2024
d050fab
Init testing for frangel codebase refactor
To5BG Jul 5, 2024
7a445e9
Move fragments and angelic conditions into separate file structures
To5BG Jul 5, 2024
b5c87bc
Extend fragment utilities
To5BG Jul 5, 2024
afefdc4
Replace count_node with built-in counter
To5BG Jul 5, 2024
364c8c8
Restructure code for angelic conditions
To5BG Jul 5, 2024
ba3418e
Move angelic execution to HerbInterpret
To5BG Jul 8, 2024
bd3c77f
Remove custom random RuleNode generate to use built-in one instead
To5BG Jul 8, 2024
b2c681d
Minor doc changes
To5BG Jul 8, 2024
e026bd8
Move LongHashMap from FrAngel utils to angelic conditions
To5BG Jul 8, 2024
22ecfa3
Get rid of final FrAngel dependencies and fragments
To5BG Jul 8, 2024
cb05f0b
Get rid of verbose printing counters
To5BG Jul 8, 2024
6e29752
Small fix on replacement generation limit
To5BG Jul 8, 2024
4909e61
Add the last end_to_end test
To5BG Jul 8, 2024
5cf5070
Do not consider terminals as fragments
To5BG Jul 8, 2024
35c1057
Merge branch 'dev' into frangel_v2
ReubenJ Jul 12, 2024
52 changes: 50 additions & 2 deletions src/HerbSearch.jl
@@ -38,6 +38,19 @@ include("genetic_search_iterator.jl")

include("random_iterator.jl")

include("fragments/fragment_grammar_utils.jl")
include("fragments/mining_fragments.jl")

include("angelic_conditions/generate_angelic.jl")
include("angelic_conditions/angelic_replacement_strategies.jl")
include("angelic_conditions/long_hash_map.jl")

include("frangel/frangel.jl")
include("frangel/frangel_utils.jl")
include("frangel/frangel_generation.jl")
include("frangel/frangel_random_iterator.jl")


export
ProgramIterator,
@programiterator,
@@ -69,13 +82,48 @@ export
MHSearchIterator,
VLSNSearchIterator,
SASearchIterator,

mean_squared_error,
misclassification,

GeneticSearchIterator,
misclassification,
validate_iterator,
sample,
rand,

FrAngelConfig,
FrAngelConfigGeneration,
frangel,

replace_first_angelic!,
replace_last_angelic!,

generate_random_program,
modify_and_replace_program_fragments!,
add_angelic_conditions!,
resolve_angelic!,

mine_fragments,
remember_programs!,

add_fragments_prob!,
setup_grammar_with_fragments!,
add_fragment_base_rules!,
add_fragment_rules!,
updateGrammarWithFragments!,

FrAngelRandomIterator,
FrAngelRandomIteratorState,

simplify_quick,
_simplify_quick_once,
symbols_minsize,
rules_minsize,

LongHashMap,
init_long_hash_map,
lhm_put!,
lhm_contains

end # module HerbSearch
83 changes: 83 additions & 0 deletions src/angelic_conditions/angelic_replacement_strategies.jl
@@ -0,0 +1,83 @@
"""
    replace_first_angelic!(program::RuleNode, boolean_expr::RuleNode, angelic_rulenode::RuleNode, angelic_conditions::Dict{UInt16,UInt8})
        ::Union{Tuple{RuleNode,Int,AbstractHole},Nothing}

Replaces the first `AbstractHole` node in the `program` with the `boolean_expr` node.
The 'first' is defined here as the first node visited by pre-order traversal, left-to-right. The program is modified in-place.

# Arguments
- `program`: The program to resolve angelic conditions in.
- `boolean_expr`: The boolean expression node to replace the `AbstractHole` node with.
- `angelic_rulenode`: The angelic rulenode. Currently only passed through to the recursive calls.
- `angelic_conditions`: A dictionary mapping rule indices of angelic condition candidates to the child index that may be changed.

# Returns
The parent node, the index of its modified child, and the `AbstractHole` that was replaced, or `nothing` if no angelic hole was found.
The tuple is used to undo the change if the replacement is unsuccessful.

"""
function replace_first_angelic!(
    program::RuleNode,
    boolean_expr::RuleNode,
    angelic_rulenode::RuleNode,
    angelic_conditions::Dict{UInt16,UInt8}
)::Union{Tuple{RuleNode,Int,AbstractHole},Nothing}
    angelic_index = get(angelic_conditions, program.ind, -1)
    for (child_index, child) in enumerate(program.children)
        if child_index == angelic_index && child isa AbstractHole
            program.children[child_index] = boolean_expr
            return (program, child_index, child)
        else
            res = replace_first_angelic!(child, boolean_expr, angelic_rulenode, angelic_conditions)
            if res !== nothing
                return res
            end
        end
    end
    return nothing
end


"""
    replace_last_angelic!(program::RuleNode, boolean_expr::RuleNode, angelic_rulenode::RuleNode, angelic_conditions::Dict{UInt16,UInt8})
        ::Union{Tuple{RuleNode,Int,AbstractHole},Nothing}

Replaces the last `AbstractHole` node in the `program` with the `boolean_expr` node.
The 'last' is defined here by a right-to-left search: the direct children of each node are checked first, right-to-left, and only then are their subtrees searched, also right-to-left. The program is modified in-place.

# Arguments
- `program`: The program to resolve angelic conditions in.
- `boolean_expr`: The boolean expression node to replace the `AbstractHole` node with.
- `angelic_rulenode`: The angelic rulenode. Currently only passed through to the recursive calls.
- `angelic_conditions`: A dictionary mapping rule indices of angelic condition candidates to the child index that may be changed.

# Returns
The parent node, the index of its modified child, and the `AbstractHole` that was replaced, or `nothing` if no angelic hole was found.
The tuple is used to undo the change if the replacement is unsuccessful.

"""
function replace_last_angelic!(
    program::RuleNode,
    boolean_expr::RuleNode,
    angelic_rulenode::RuleNode,
    angelic_conditions::Dict{UInt16,UInt8}
)::Union{Tuple{RuleNode,Int,AbstractHole},Nothing}
    angelic_index = get(angelic_conditions, program.ind, -1)
    # Store indices (already in reverse order) to descend into them later
    indices = Int[]
    for child_index in reverse(eachindex(program.children))
        child = program.children[child_index]
        if child_index == angelic_index && child isa AbstractHole
            program.children[child_index] = boolean_expr
            return (program, child_index, child)
        else
            push!(indices, child_index)
        end
    end
    # If no angelic in this layer, continue onto lower layers, in reverse
    for child_index in indices
        res = replace_last_angelic!(program.children[child_index], boolean_expr, angelic_rulenode, angelic_conditions)
        if res !== nothing
            return res
        end
    end
    return nothing
end
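
Reviewer sketch (not part of the diff): the two strategies differ only in search order, and both hand back enough information to undo the change. The block below is a minimal, self-contained illustration under stated assumptions. `ToyNode`, `ToyRule`, `ToyHole`, `toy_replace_first!` and `toy_replace_last!` are hypothetical stand-ins invented here, not the real `RuleNode`/`AbstractHole` types or the functions in this file.

```julia
# Hypothetical stand-ins for the real node types, for illustration only.
abstract type ToyNode end

mutable struct ToyRule <: ToyNode
    ind::Int
    children::Vector{ToyNode}
end
ToyRule(ind::Int) = ToyRule(ind, ToyNode[])

struct ToyHole <: ToyNode end

# Pre-order, left-to-right: descend into each child before moving to its right sibling.
function toy_replace_first!(node::ToyRule, expr::ToyRule, angelic::Dict{Int,Int})
    angelic_index = get(angelic, node.ind, -1)
    for (i, child) in enumerate(node.children)
        if i == angelic_index && child isa ToyHole
            node.children[i] = expr
            return (node, i, child)
        elseif child isa ToyRule
            res = toy_replace_first!(child, expr, angelic)
            res !== nothing && return res
        end
    end
    return nothing
end

# Right-to-left: check this node's own children first, then descend, also right-to-left.
function toy_replace_last!(node::ToyRule, expr::ToyRule, angelic::Dict{Int,Int})
    angelic_index = get(angelic, node.ind, -1)
    for i in reverse(eachindex(node.children))
        child = node.children[i]
        if i == angelic_index && child isa ToyHole
            node.children[i] = expr
            return (node, i, child)
        end
    end
    for i in reverse(eachindex(node.children))
        child = node.children[i]
        if child isa ToyRule
            res = toy_replace_last!(child, expr, angelic)
            res !== nothing && return res
        end
    end
    return nothing
end

# Rule 1 plays the role of an "if" whose first child is the condition,
# rule 2 is a sequence of two statements, rules 3 and 9 are leaves.
angelic = Dict(1 => 1)
toy_if() = ToyRule(1, ToyNode[ToyHole(), ToyRule(3), ToyRule(3)])
build() = ToyRule(2, ToyNode[toy_if(), toy_if()])

p1 = build()
parent1, idx1, hole1 = toy_replace_first!(p1, ToyRule(9), angelic)  # fills the left "if"

p2 = build()
parent2, idx2, hole2 = toy_replace_last!(p2, ToyRule(9), angelic)   # fills the right "if"

# Undo the second replacement, the same way a failed candidate is rolled back.
parent2.children[idx2] = hole2
```

Returning the parent, the child index and the original hole keeps the rollback a single assignment, so no copy of the program is needed per candidate.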
113 changes: 113 additions & 0 deletions src/angelic_conditions/generate_angelic.jl
@@ -0,0 +1,113 @@
"""
    resolve_angelic!(
        program::RuleNode, passing_tests::BitVector, grammar::AbstractGrammar, symboltable::SymbolTable, tests::AbstractVector{<:IOExample},
        replacement_funcs::Vector{Function}, angelic_conditions::Dict{UInt16,UInt8}, angelic_config::ConfigAngelic, evaluation_grammar::AbstractGrammar=grammar)::RuleNode

Resolve angelic conditions in the given program by generating random boolean expressions and using them to replace the holes in the program.
The program is modified in-place. All replacement strategies are attempted sequentially, in the order provided.

# Arguments
- `program`: The program to resolve angelic conditions in.
- `passing_tests`: A BitVector representing the tests that the program has already passed.
- `grammar`: The grammar rules of the program to be used for sampling angelic condition candidates.
- `symboltable`: A symbol table for the grammar.
- `tests`: A vector of `IOExample` objects representing the input-output test cases.
- `replacement_funcs`: The replacement strategies to try, in order, e.g. `replace_first_angelic!` and `replace_last_angelic!`.
- `angelic_conditions`: A dictionary mapping rule indices of angelic condition candidates to the child index that may be changed.
- `angelic_config`: The configuration for angelic conditions.
- `evaluation_grammar`: The grammar rules of the program to be used for evaluation. Usually the same as `grammar`, or augmented with fragments.

# Returns
The resolved program with angelic values replaced, or an unresolved program if it times out.

"""
function resolve_angelic!(
    program::RuleNode,
    passing_tests::BitVector,
    grammar::AbstractGrammar,
    symboltable::SymbolTable,
    tests::AbstractVector{<:IOExample},
    replacement_funcs::Vector{Function},
    angelic_conditions::Dict{UInt16,UInt8},
    angelic_config::ConfigAngelic,
    evaluation_grammar::AbstractGrammar=grammar
)::RuleNode
    num_holes = number_of_holes(program)
    # Try each replacement strategy
    for replacement_strategy in replacement_funcs
        new_tests = falses(length(tests))
        # Continue resolution until all holes are filled
        while num_holes != 0
            success = false
            start_time = time()
            # Keep track of visited replacements - avoid duplicates
            visited = init_long_hash_map()
            while time() - start_time < angelic_config.max_time
                # Generate a replacement
                boolean_expr = rand(RuleNode, grammar, :Bool, angelic_config.boolean_expr_max_depth)
                program_hash = hash(boolean_expr)
                if lhm_contains(visited, program_hash)
                    continue
                end
                lhm_put!(visited, program_hash)
                # Either replace 'first' or 'last' hole
                changed = replacement_strategy(program, boolean_expr, angelic_config.angelic_rulenode, angelic_conditions)
                # The strategy found no angelic hole - nothing to evaluate or undo
                if changed === nothing
                    break
                end
                update_passed_tests!(program, evaluation_grammar, symboltable, tests, new_tests, angelic_conditions, angelic_config)
                # If the new program passes all the tests the original program did, replacement is successful
                if all(passing_tests .== (passing_tests .& new_tests))
                    # Copy: `new_tests` is overwritten in-place by the next attempt
                    passing_tests = copy(new_tests)
                    success = true
                    break
                else
                    # Undo replacement changes
                    changed[1].children[changed[2]] = changed[3]
                end
            end
            if success
                num_holes -= 1
            else
                break
            end
        end
    end
    return program
end

"""
    add_angelic_conditions!(program::RuleNode, grammar::AbstractGrammar, angelic_conditions::Dict{UInt16,UInt8})::RuleNode

Add angelic conditions to a program. This is done by replacing the child nodes indicated by `angelic_conditions` with holes.

# Arguments
- `program`: The program to modify.
- `grammar`: The grammar rules of the program.
- `angelic_conditions`: A dictionary mapping rule indices of angelic condition candidates to the child index that may be changed.

# Returns
The modified program with angelic conditions added.

"""
function add_angelic_conditions!(program::RuleNode, grammar::AbstractGrammar, angelic_conditions::Dict{UInt16,UInt8})::RuleNode
    if isterminal(grammar, program.ind)
        return program
    end
    if haskey(angelic_conditions, program.ind)
        # The current node has an angelic child - look for it
        angelic_condition_ind = angelic_conditions[program.ind]
        for (index, child) in enumerate(program.children)
            if index != angelic_condition_ind
                # Traverse the other children for angelic condition candidates
                program.children[index] = add_angelic_conditions!(child, grammar, angelic_conditions)
            else
                # A hole represents the angelic condition's location - to be replaced by angelic rulenode before evaluation
                program.children[index] = Hole(grammar.domains[grammar.childtypes[program.ind][angelic_condition_ind]])
            end
        end
    else
        # Traverse the node's children for angelic condition candidates
        for (index, child) in enumerate(program.children)
            program.children[index] = add_angelic_conditions!(child, grammar, angelic_conditions)
        end
    end
    program
end
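
Reviewer sketch (not part of the diff): the acceptance rule used during resolution is a subset check on test results, so a candidate is kept only if no previously passing test regresses. The block below is a small, self-contained illustration. `keeps_passing` and the sample vectors are hypothetical names introduced here. The second half shows, under the assumption that the scratch vector is updated in place between attempts, why the accepted baseline is worth snapshotting with `copy`.

```julia
# A candidate is accepted only if every test that passed before still passes,
# i.e. `before` is a subset of `after`.
keeps_passing(before::BitVector, after::BitVector) = all(before .== (before .& after))

baseline = BitVector([true, false, true, false])

better = BitVector([true, true, true, false])   # keeps tests 1 and 3, gains test 2
worse  = BitVector([true, true, false, true])   # gains tests 2 and 4, but loses test 3

accepted = keeps_passing(baseline, better)
rejected = keeps_passing(baseline, worse)

# Aliasing pitfall: binding the baseline to the scratch vector without a copy
# means later in-place updates silently change the baseline as well.
scratch  = BitVector([true, false, true, false])
aliased  = scratch         # same object as `scratch`
snapshot = copy(scratch)   # independent baseline
scratch[1] = false         # simulate the next candidate failing test 1

alias_check    = keeps_passing(aliased, scratch)    # compares a vector with itself
snapshot_check = keeps_passing(snapshot, scratch)   # compares against the saved baseline
```

With the aliased baseline the regression on test 1 goes unnoticed, while the snapshot detects it.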
82 changes: 82 additions & 0 deletions src/angelic_conditions/long_hash_map.jl
@@ -0,0 +1,82 @@
const DEFAULT_CAPACITY = 32768
const LOAD_FACTOR = 0.8

"""
A bare-bones hash map that stores only hash keys: it does not store the elements themselves, only whether they have been seen/added before.
It is used to keep track of programs that have already been generated, so that they are not tried again, without storing the programs themselves, which saves space.

A Bloom filter would usually be used for this purpose, but we did not want to introduce other dependencies, and a from-scratch implementation is not trivial.

The "hashmap" only stores the hashes of visited programs, and uses linear probing for collision resolution.

"""
mutable struct LongHashMap
    keys::Vector{UInt64}
    occupied::Vector{Bool}
    size::Int
end

function init_long_hash_map()
    keys = Vector{UInt64}(undef, DEFAULT_CAPACITY)
    occupied = Vector{Bool}(undef, DEFAULT_CAPACITY)
    fill!(occupied, false)
    LongHashMap(keys, occupied, 0)
end

function lhm_hash(key::UInt64, capacity::Int64)
    return (key >>> 32) % capacity
end

# Doubles the capacity and rehashes all stored keys.
# Called by `lhm_put!` once the load factor is exceeded.
function lhm_resize!(map::LongHashMap)
    new_capacity = length(map.keys) * 2
    new_keys = Vector{UInt64}(undef, new_capacity)
    new_occupied = Vector{Bool}(undef, new_capacity)
    fill!(new_occupied, false)

    for i in 1:length(map.keys)
        if map.occupied[i]
            index = lhm_hash(map.keys[i], new_capacity)
            while new_occupied[index + 1]
                index = (index + 1) % new_capacity
            end
            new_keys[index + 1] = map.keys[i]
            new_occupied[index + 1] = true
        end
    end

    map.keys = new_keys
    map.occupied = new_occupied
end

# Adding keys to the hashmap
function lhm_put!(map::LongHashMap, key::UInt64)
    # Resize if load factor is exceeded
    if (map.size / length(map.keys)) >= LOAD_FACTOR
        lhm_resize!(map)
    end
    index = lhm_hash(key, length(map.keys))
    # Collision - linear probing
    while map.occupied[index + 1]
        if map.keys[index + 1] == key
            return
        end
        index = (index + 1) % length(map.keys)
    end
    map.keys[index + 1] = key
    map.occupied[index + 1] = true
    map.size += 1
end

# Checking if it has been seen before
function lhm_contains(map::LongHashMap, key::UInt64)
    index = lhm_hash(key, length(map.keys))
    # Collision - linear probing
    while map.occupied[index + 1]
        if map.keys[index + 1] == key
            return true
        end
        index = (index + 1) % length(map.keys)
    end
    return false
end
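
Reviewer sketch (not part of the diff): the "seen before?" pattern that this structure supports can be tried in isolation. The block below is a condensed, self-contained restatement of the same idea (a set of `UInt64` hashes with open addressing and linear probing), not the code from this file. `SeenSet`, `slot`, `seen` and `mark!` are hypothetical names, and the small starting capacity of 8 is an assumption chosen so that resizing is exercised quickly.

```julia
# Hypothetical, condensed variant of a hash-only set with linear probing.
mutable struct SeenSet
    keys::Vector{UInt64}
    occupied::Vector{Bool}
    size::Int
end
SeenSet(capacity::Int=8) = SeenSet(Vector{UInt64}(undef, capacity), fill(false, capacity), 0)

# Zero-based home slot, taken from the upper 32 bits of the key.
slot(key::UInt64, capacity::Int) = Int((key >>> 32) % UInt64(capacity))

function seen(s::SeenSet, key::UInt64)
    i = slot(key, length(s.keys))
    # Probe until the key or an empty slot is found
    while s.occupied[i + 1]
        s.keys[i + 1] == key && return true
        i = (i + 1) % length(s.keys)
    end
    return false
end

function mark!(s::SeenSet, key::UInt64)
    # Grow before the table fills up, so probing always reaches an empty slot
    if s.size / length(s.keys) >= 0.8
        old_keys, old_occupied = s.keys, s.occupied
        s.keys = Vector{UInt64}(undef, 2 * length(old_keys))
        s.occupied = fill(false, 2 * length(old_keys))
        s.size = 0
        for (k, occ) in zip(old_keys, old_occupied)
            occ && mark!(s, k)
        end
    end
    i = slot(key, length(s.keys))
    while s.occupied[i + 1]
        s.keys[i + 1] == key && return s   # already present
        i = (i + 1) % length(s.keys)
    end
    s.keys[i + 1] = key
    s.occupied[i + 1] = true
    s.size += 1
    return s
end

# Skip candidates whose hash has been seen, as in a generate-and-test loop.
s = SeenSet()
candidates = ["x > 0", "x < 0", "x > 0", "x == 0", "x < 0"]
fresh = String[]
for c in candidates
    h = UInt64(hash(c))
    seen(s, h) && continue
    mark!(s, h)
    push!(fresh, c)
end

# Insert enough keys to force several resizes.
for n in 1:100
    mark!(s, UInt64(hash(n)))
end
```

Because only hashes are kept, two different programs with the same 64-bit hash would be treated as one. That trade-off buys constant memory per entry.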