Filter Duplicate Input Execution #2771

Open
wants to merge 24 commits into
base: main
Changes from 20 commits
aefb8e3
fixing empty multipart name
riesentoaster Dec 6, 2024
a98c981
fixing clippy
riesentoaster Dec 6, 2024
7acf5a3
Merge branch 'main' into main
riesentoaster Dec 6, 2024
2da6dc5
New rules for the contributing (#2752)
tokatoka Dec 6, 2024
1e571a0
Improve Flexibility of DumpToDiskStage (#2753)
riesentoaster Dec 8, 2024
d020b9e
Update bindgen requirement from 0.70.1 to 0.71.1 (#2756)
dependabot[bot] Dec 11, 2024
e1d0b92
No Use* from stages (#2745)
tokatoka Dec 12, 2024
c842eda
Update CONTRIBUTING.md MIGRATION.md (#2762)
tokatoka Dec 12, 2024
31d9b56
No Uses* from `fuzzer` (#2761)
tokatoka Dec 12, 2024
c9eb2a8
Remove useless cfgs (#2764)
tokatoka Dec 12, 2024
93b64f9
Link libresolv on all Apple OSs (#2767)
mineo333 Dec 14, 2024
294d2f1
Somewhat ugly CI fix... (#2768)
domenukk Dec 15, 2024
c170986
Add Input Types and Mutators for Numeric Types (#2760)
riesentoaster Dec 15, 2024
bab9890
Add HashMutator
riesentoaster Dec 15, 2024
71fc1c6
Fix docs
riesentoaster Dec 15, 2024
a2fa10c
Merge branch 'main' into add-label-mutationresult
riesentoaster Dec 15, 2024
30e1db4
Fix docs again
riesentoaster Dec 15, 2024
025a56a
introducing bloom filter
riesentoaster Dec 17, 2024
63b9ac9
fix tests
riesentoaster Dec 17, 2024
92c3f08
Merge branch 'main' into add-label-mutationresult
riesentoaster Dec 17, 2024
6395df9
Merge branch 'main' into add-label-mutationresult
riesentoaster Dec 18, 2024
17c63fe
Merge branch 'main' into add-label-mutationresult
riesentoaster Dec 19, 2024
61120bf
Merge branch 'main' into add-label-mutationresult
riesentoaster Dec 19, 2024
8757a33
Merge branch 'main' into add-label-mutationresult
riesentoaster Dec 20, 2024
15 changes: 0 additions & 15 deletions Cargo.toml
@@ -56,28 +56,13 @@ license = "MIT OR Apache-2.0"
# Internal deps
libafl = { path = "./libafl", version = "0.14.1", default-features = false }
libafl_bolts = { path = "./libafl_bolts", version = "0.14.1", default-features = false }
libafl_cc = { path = "./libafl_cc", version = "0.14.1", default-features = false }
Review comment (Member):

That's just for testing, I assume?

symcc_runtime = { path = "./libafl_concolic/symcc_runtime", version = "0.14.1", default-features = false }
symcc_libafl = { path = "./libafl_concolic/symcc_libafl", version = "0.14.1", default-features = false }
libafl_derive = { path = "./libafl_derive", version = "0.14.1", default-features = false }
libafl_frida = { path = "./libafl_frida", version = "0.14.1", default-features = false }
libafl_intelpt = { path = "./libafl_intelpt", version = "0.14.1", default-features = false }
libafl_libfuzzer = { path = "./libafl_libfuzzer", version = "0.14.1", default-features = false }
libafl_nyx = { path = "./libafl_nyx", version = "0.14.1", default-features = false }
libafl_targets = { path = "./libafl_targets", version = "0.14.1", default-features = false }
libafl_tinyinst = { path = "./libafl_tinyinst", version = "0.14.1", default-features = false }
libafl_qemu = { path = "./libafl_qemu", version = "0.14.1", default-features = false }
libafl_qemu_build = { path = "./libafl_qemu/libafl_qemu_build", version = "0.14.1", default-features = false }
libafl_qemu_sys = { path = "./libafl_qemu/libafl_qemu_sys", version = "0.14.1", default-features = false }
libafl_sugar = { path = "./libafl_sugar", version = "0.14.1", default-features = false }
dump_constraints = { path = "./libafl_concolic/test/dump_constraints", version = "0.14.1", default-features = false }
runtime_test = { path = "./libafl_concolic/test/runtime_test", version = "0.14.1", default-features = false }
build_and_test_fuzzers = { path = "./utils/build_and_test_fuzzers", version = "0.14.1", default-features = false }
deexit = { path = "./utils/deexit", version = "0.14.1", default-features = false }
drcov_utils = { path = "./utils/drcov_utils", version = "0.14.1", default-features = false }
construct_automata = { path = "./utils/gramatron/construct_automata", version = "0.14.1", default-features = false }
libafl_benches = { path = "./utils/libafl_benches", version = "0.14.1", default-features = false }
libafl_jumper = { path = "./utils/libafl_jumper", version = "0.14.1", default-features = false }

# External deps
ahash = { version = "0.8.11", default-features = false } # The hash function already used in hashbrown
3 changes: 2 additions & 1 deletion fuzzers/baby/baby_fuzzer_custom_executor/Cargo.toml
@@ -8,8 +8,9 @@ authors = [
edition = "2021"

[features]
default = ["std"]
default = ["std", "bloom_filter"]
Review comment (Member):

The feature flag should be named after the feature, not after an implementation detail. Maybe something like `reexecution_filter` or similar?

Review comment (Member):

bloom_input_filter would be a middle ground(?)

tui = ["libafl/tui_monitor"]
bloom_filter = ["std"]
std = []

[profile.dev]
5 changes: 5 additions & 0 deletions fuzzers/baby/baby_fuzzer_custom_executor/src/main.rs
@@ -133,7 +133,12 @@ pub fn main() {
let scheduler = QueueScheduler::new();

// A fuzzer with feedbacks and a corpus scheduler
#[cfg(not(feature = "bloom_filter"))]
let mut fuzzer = StdFuzzer::new(scheduler, feedback, objective);
#[cfg(feature = "bloom_filter")]
let mut fuzzer =
StdFuzzer::new_with_bloom_filter(scheduler, feedback, objective, 10_000_000, 0.001)
Review comment (Member):

  • No `new_` prefix here; we never use that on complex constructors.
  • Instead of `with_bloom_filter`, name it after the feature (`with_reexecution_filter`, `with_bloom_input_filter`, or similar).

.unwrap();
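
To make the naming suggestion concrete, here is a minimal sketch of the two constructor shapes. `ToyFuzzer` and `FilterKind` are hypothetical stand-ins, not LibAFL types, and `with_bloom_input_filter` is only one of the names floated in the review.

```rust
/// Hypothetical stand-in for the filter choice; not part of LibAFL.
#[derive(Debug, PartialEq)]
pub enum FilterKind {
    Nop,
    Bloom { items_count: usize, fp_p: f64 },
}

/// Hypothetical stand-in for `StdFuzzer`, reduced to the filter field.
#[derive(Debug)]
pub struct ToyFuzzer {
    pub filter: FilterKind,
}

impl ToyFuzzer {
    /// Plain constructor: default behavior, every input is executed.
    pub fn new() -> Self {
        Self { filter: FilterKind::Nop }
    }

    /// Named after the feature, without a `new_` prefix.
    /// Fallible because the filter parameters can be invalid.
    pub fn with_bloom_input_filter(items_count: usize, fp_p: f64) -> Result<Self, String> {
        if items_count == 0 || !(fp_p > 0.0 && fp_p < 1.0) {
            return Err(format!("invalid filter parameters: {items_count}, {fp_p}"));
        }
        Ok(Self { filter: FilterKind::Bloom { items_count, fp_p } })
    }
}
```

The call site in the baby fuzzer would then read `ToyFuzzer::with_bloom_input_filter(10_000_000, 0.001).unwrap()` in this toy setting.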

// Create the executor for an in-process function with just one observer
let executor = CustomExecutor::new(&state);
1 change: 1 addition & 0 deletions libafl/Cargo.toml
@@ -291,6 +291,7 @@ document-features = { workspace = true, optional = true }
clap = { workspace = true, optional = true }
num_enum = { workspace = true, optional = true }
libipt = { workspace = true, optional = true }
bloomfilter = "3.0.1"
Review comment (Member):

Did you look at fastbloom? The benchmarks look pretty good
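
Whichever crate ends up being used, the PR only relies on one operation: check whether an item was probably seen before and record it in the same step. The sketch below shows that operation with the standard library only. `ToyBloom` is a hypothetical, deliberately naive filter for illustration; it is neither the `bloomfilter` nor the `fastbloom` implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A toy Bloom filter over a fixed-size bit vector (illustrative only).
pub struct ToyBloom {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl ToyBloom {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derives the i-th bit index by hashing the pair (i, item).
    fn index<T: Hash>(&self, item: &T, i: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        i.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Returns `true` if the item was probably seen before.
    /// The item is recorded in either case.
    pub fn check_and_set<T: Hash>(&mut self, item: &T) -> bool {
        let mut seen = true;
        for i in 0..self.num_hashes {
            let idx = self.index(item, i);
            if !self.bits[idx] {
                seen = false;
                self.bits[idx] = true;
            }
        }
        seen
    }
}

/// Mirrors the filter logic in the PR: execute only inputs not seen before.
pub fn should_execute<T: Hash>(bloom: &mut ToyBloom, input: &T) -> bool {
    !bloom.check_and_set(input)
}
```

False positives are possible (a new input may be reported as seen and skipped); false negatives are not, so a repeated input is never executed twice.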


[lints]
workspace = true
2 changes: 1 addition & 1 deletion libafl/src/executors/inprocess/mod.rs
@@ -562,7 +562,7 @@ mod tests {
let mut mgr = NopEventManager::new();
let mut state =
StdState::new(rand, corpus, solutions, &mut feedback, &mut objective).unwrap();
let mut fuzzer = StdFuzzer::<_, _, _>::new(sche, feedback, objective);
let mut fuzzer = StdFuzzer::new(sche, feedback, objective);

let mut in_process_executor = InProcessExecutor::new(
&mut harness,
97 changes: 83 additions & 14 deletions libafl/src/fuzzer/mod.rs
@@ -2,7 +2,10 @@

use alloc::{string::ToString, vec::Vec};
use core::{fmt::Debug, time::Duration};
#[cfg(feature = "std")]
use std::hash::Hash;

use bloomfilter::Bloom;
use libafl_bolts::{current_time, tuples::MatchName};
use serde::Serialize;

@@ -243,13 +246,14 @@ pub enum ExecuteInputResult {

/// Your default fuzzer instance, for everyday use.
#[derive(Debug)]
pub struct StdFuzzer<CS, F, OF> {
pub struct StdFuzzer<CS, F, OF, IF> {
Review comment (Member):

Always sort generics alphabetically

scheduler: CS,
feedback: F,
objective: OF,
input_filter: IF,
}

impl<CS, F, OF, S> HasScheduler<<S::Corpus as Corpus>::Input, S> for StdFuzzer<CS, F, OF>
impl<CS, F, OF, S, IF> HasScheduler<<S::Corpus as Corpus>::Input, S> for StdFuzzer<CS, F, OF, IF>
where
S: HasCorpus,
CS: Scheduler<<S::Corpus as Corpus>::Input, S>,
@@ -265,7 +269,7 @@ where
}
}

impl<CS, F, OF> HasFeedback for StdFuzzer<CS, F, OF> {
impl<CS, F, OF, IF> HasFeedback for StdFuzzer<CS, F, OF, IF> {
type Feedback = F;

fn feedback(&self) -> &Self::Feedback {
@@ -277,7 +281,7 @@ impl<CS, F, OF> HasFeedback for StdFuzzer<CS, F, OF> {
}
}

impl<CS, F, OF> HasObjective for StdFuzzer<CS, F, OF> {
impl<CS, F, OF, IF> HasObjective for StdFuzzer<CS, F, OF, IF> {
type Objective = OF;

fn objective(&self) -> &OF {
@@ -289,8 +293,8 @@ impl<CS, F, OF> HasObjective for StdFuzzer<CS, F, OF> {
}
}

impl<CS, EM, F, OF, OT, S> ExecutionProcessor<EM, <S::Corpus as Corpus>::Input, OT, S>
for StdFuzzer<CS, F, OF>
impl<CS, EM, F, OF, OT, S, IF> ExecutionProcessor<EM, <S::Corpus as Corpus>::Input, OT, S>
for StdFuzzer<CS, F, OF, IF>
where
CS: Scheduler<<S::Corpus as Corpus>::Input, S>,
EM: EventFirer<State = S>,
@@ -491,8 +495,8 @@ where
}
}

impl<CS, E, EM, F, OF, S> EvaluatorObservers<E, EM, <S::Corpus as Corpus>::Input, S>
for StdFuzzer<CS, F, OF>
impl<CS, E, EM, F, OF, S, IF> EvaluatorObservers<E, EM, <S::Corpus as Corpus>::Input, S>
for StdFuzzer<CS, F, OF, IF>
where
CS: Scheduler<<S::Corpus as Corpus>::Input, S>,
E: HasObservers + Executor<EM, Self, State = S>,
@@ -528,7 +532,43 @@ where
}
}

impl<CS, E, EM, F, OF, S> Evaluator<E, EM, <S::Corpus as Corpus>::Input, S> for StdFuzzer<CS, F, OF>
trait InputFilter<I> {
fn should_execute(&mut self, input: &I) -> bool;
}

/// A pseudo-filter that will execute each input.
#[derive(Debug)]
pub struct NopInputFilter;
impl<I> InputFilter<I> for NopInputFilter {
fn should_execute(&mut self, _input: &I) -> bool {
Review comment (Member):

Add `#[inline]` here.

true
}
}
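
The `#[inline]` suggestion on the no-op filter could look like the sketch below. The trait and struct are re-declared here so the snippet compiles on its own; `count_executed` is a hypothetical helper added only to exercise the filter.

```rust
/// Decides whether an input should be executed at all.
pub trait InputFilter<I> {
    fn should_execute(&mut self, input: &I) -> bool;
}

/// A pseudo-filter that lets every input through.
#[derive(Debug)]
pub struct NopInputFilter;

impl<I> InputFilter<I> for NopInputFilter {
    // A trivial body like this is a good inlining candidate, so the
    // default (unfiltered) fuzzer pays nothing for the extra check.
    #[inline]
    fn should_execute(&mut self, _input: &I) -> bool {
        true
    }
}

/// Hypothetical helper: counts how many inputs the filter lets through.
pub fn count_executed<I, F: InputFilter<I>>(filter: &mut F, inputs: &[I]) -> usize {
    inputs.iter().filter(|i| filter.should_execute(*i)).count()
}
```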

/// A filter that probabilistically prevents duplicate execution of the same input based on a bloom filter.
#[cfg(feature = "std")]
#[derive(Debug)]
pub struct BloomInputFilter<I> {
bloom: Bloom<I>,
}

#[cfg(feature = "std")]
impl<I> BloomInputFilter<I> {
fn new(items_count: usize, fp_p: f64) -> Result<Self, Error> {
let bloom = Bloom::new_for_fp_rate(items_count, fp_p).map_err(Error::illegal_argument)?;
Ok(Self { bloom })
}
}

#[cfg(feature = "std")]
impl<I: Hash> InputFilter<I> for BloomInputFilter<I> {
fn should_execute(&mut self, input: &I) -> bool {
!self.bloom.check_and_set(input)
}
}

impl<CS, E, EM, F, OF, S, IF> Evaluator<E, EM, <S::Corpus as Corpus>::Input, S>
for StdFuzzer<CS, F, OF, IF>
where
CS: Scheduler<<S::Corpus as Corpus>::Input, S>,
E: HasObservers + Executor<EM, Self, State = S>,
@@ -545,6 +585,7 @@ where
+ UsesInput<Input = <S::Corpus as Corpus>::Input>,
<S::Corpus as Corpus>::Input: Input,
S::Solutions: Corpus<Input = <S::Corpus as Corpus>::Input>,
IF: InputFilter<<S::Corpus as Corpus>::Input>,
{
/// Process one input, adding to the respective corpora if needed and firing the right events
#[inline]
@@ -556,7 +597,11 @@ where
input: <S::Corpus as Corpus>::Input,
send_events: bool,
) -> Result<(ExecuteInputResult, Option<CorpusId>), Error> {
self.evaluate_input_with_observers(state, executor, manager, input, send_events)
if self.input_filter.should_execute(&input) {
self.evaluate_input_with_observers(state, executor, manager, input, send_events)
} else {
Ok((ExecuteInputResult::None, None))
}
}
fn add_disabled_input(
&mut self,
@@ -668,7 +713,7 @@ where
}
}

impl<CS, E, EM, F, OF, S, ST> Fuzzer<E, EM, S, ST> for StdFuzzer<CS, F, OF>
impl<CS, E, EM, F, OF, S, ST, IF> Fuzzer<E, EM, S, ST> for StdFuzzer<CS, F, OF, IF>
where
CS: Scheduler<S::Input, S>,
E: UsesState<State = S>,
@@ -792,16 +837,40 @@ where
}
}

impl<CS, F, OF> StdFuzzer<CS, F, OF> {
impl<CS, F, OF> StdFuzzer<CS, F, OF, NopInputFilter> {
/// Create a new `StdFuzzer` with standard behavior.
pub fn new(scheduler: CS, feedback: F, objective: OF) -> Self {
Self {
scheduler,
feedback,
objective,
input_filter: NopInputFilter,
}
}
}
impl<CS, F, OF, I> StdFuzzer<CS, F, OF, BloomInputFilter<I>> {
/// Create a new [`StdFuzzer`] that, with high probability, executes each input only once.
///
/// This is achieved by hashing each input and using a bloom filter to differentiate inputs.
///
/// Use this implementation if hashing each input is very fast compared to executing potential duplicate inputs.
pub fn new_with_bloom_filter(
scheduler: CS,
feedback: F,
objective: OF,
items_count: usize,
fp_p: f64,
) -> Result<Self, Error> {
let input_filter = BloomInputFilter::new(items_count, fp_p)?;

Ok(Self {
scheduler,
feedback,
objective,
input_filter,
})
}
}
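
For a sense of what the `items_count` and `fp_p` arguments cost in memory, the textbook Bloom filter sizing formulas can be evaluated for the values used in the baby fuzzer (10_000_000 items, 0.001 false-positive rate). This is a sketch of the standard formulas only; that the `bloomfilter` crate sizes or rounds its bitmap identically is an assumption, and `optimal_bits` and `optimal_hashes` are hypothetical helper names.

```rust
use std::f64::consts::LN_2;

/// Textbook optimal bitmap size in bits: m = -n * ln(p) / (ln 2)^2.
pub fn optimal_bits(items_count: usize, fp_p: f64) -> f64 {
    -(items_count as f64) * fp_p.ln() / LN_2.powi(2)
}

/// Textbook optimal number of hash functions: k = (m / n) * ln 2.
pub fn optimal_hashes(items_count: usize, bits: f64) -> f64 {
    bits / items_count as f64 * LN_2
}

/// Convenience wrapper returning (bytes, hash function count), both rounded.
pub fn sizing(items_count: usize, fp_p: f64) -> (u64, u32) {
    let bits = optimal_bits(items_count, fp_p);
    let hashes = optimal_hashes(items_count, bits);
    ((bits / 8.0).ceil() as u64, hashes.round() as u32)
}
```

With those inputs the formulas give roughly 144 million bits (about 18 MB) and 10 hash functions, which is the trade-off against re-executing duplicate inputs.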

/// Structs with this trait will execute an input
pub trait ExecutesInput<E, EM, I, S> {
@@ -815,8 +884,8 @@ pub trait ExecutesInput<E, EM, I, S> {
) -> Result<ExitKind, Error>;
}

impl<CS, E, EM, F, OF, S> ExecutesInput<E, EM, <S::Corpus as Corpus>::Input, S>
for StdFuzzer<CS, F, OF>
impl<CS, E, EM, F, OF, S, IF> ExecutesInput<E, EM, <S::Corpus as Corpus>::Input, S>
for StdFuzzer<CS, F, OF, IF>
where
CS: Scheduler<<S::Corpus as Corpus>::Input, S>,
E: Executor<EM, Self, State = S> + HasObservers,
80 changes: 80 additions & 0 deletions libafl/src/mutators/hash.rs
@@ -0,0 +1,80 @@
//! A wrapper around a [`Mutator`] that ensures an input really changed [`MutationResult::Mutated`]
//! by hashing pre- and post-mutation
use std::{borrow::Cow, hash::Hash};

use libafl_bolts::{generic_hash_std, Error, Named};

use super::{MutationResult, Mutator};

/// A wrapper around a [`Mutator`] that ensures an input really changed [`MutationResult::Mutated`]
/// by hashing pre- and post-mutation
#[derive(Debug)]
pub struct HashMutator<M> {
inner: M,
name: Cow<'static, str>,
}

impl<M> HashMutator<M>
where
M: Named,
{
/// Create a new [`HashMutator`]
pub fn new(inner: M) -> Self {
let name = Cow::Owned(format!("HashMutator<{}>", inner.name().clone()));
Self { inner, name }
}
}

impl<M, I, S> Mutator<I, S> for HashMutator<M>
where
I: Hash,
M: Mutator<I, S>,
{
fn mutate(&mut self, state: &mut S, input: &mut I) -> Result<MutationResult, Error> {
let before = generic_hash_std(input);
self.inner.mutate(state, input)?;
if before == generic_hash_std(input) {
Ok(MutationResult::Skipped)
} else {
Ok(MutationResult::Mutated)
}
}
}

impl<M> Named for HashMutator<M> {
fn name(&self) -> &Cow<'static, str> {
&self.name
}
}

#[cfg(test)]
mod tests {
use crate::{
inputs::BytesInput,
mutators::{BytesSetMutator, HashMutator, MutationResult, Mutator},
state::NopState,
};

#[test]
fn not_mutated() {
let mut state: NopState<BytesInput> = NopState::new();
let mut inner = BytesSetMutator::new();

let mut input = BytesInput::new(vec![0; 5]);

// nothing changed, yet `MutationResult::Mutated` was reported
assert_eq!(
MutationResult::Mutated,
inner.mutate(&mut state, &mut input).unwrap()
);
assert_eq!(BytesInput::new(vec![0; 5]), input);

// now it is correctly reported as `MutationResult::Skipped`
let mut hash_mutator = HashMutator::new(inner);
assert_eq!(
MutationResult::Skipped,
hash_mutator.mutate(&mut state, &mut input).unwrap()
);
assert_eq!(BytesInput::new(vec![0; 5]), input);
}
}
15 changes: 11 additions & 4 deletions libafl/src/mutators/mod.rs
@@ -28,6 +28,11 @@ pub use mapping::*;
pub mod tuneable;
pub use tuneable::*;

#[cfg(feature = "std")]
pub mod hash;
#[cfg(feature = "std")]
pub use hash::*;

#[cfg(feature = "unicode")]
pub mod unicode;
#[cfg(feature = "unicode")]
@@ -84,12 +89,14 @@ impl From<i32> for MutationId {
}
}

/// The result of a mutation.
/// If the mutation got skipped, the target
/// will not be executed with the returned input.
/// Result of the mutation.
///
/// [`MutationResult::Mutated`] does not necessarily mean that the input changed,
/// just that the mutator did something. For slow targets, consider wrapping your
/// mutator in a [`hash::HashMutator`].
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MutationResult {
/// The [`Mutator`] mutated this `Input`.
/// The [`Mutator`] executed on this `Input`. It may still be the same.
Review comment (Member):

Maybe something like:
The mutator may not guarantee that the input has actually been changed.

You could even reference the bloom filter feature here

Mutated,
/// The [`Mutator`] did not mutate this `Input`. It was `Skipped`.
Skipped,
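
The point that `Mutated` does not guarantee a changed input can be illustrated with a standard-library-only sketch of the hash-before-and-after check that `HashMutator` performs. `set_first_byte` and `mutate_checked` are hypothetical names, and the local `MutationResult` is a stand-in for the LibAFL enum. Like the wrapper in this PR, the sketch ignores the inner result and decides purely by comparing hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Local stand-in for LibAFL's `MutationResult`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MutationResult {
    Mutated,
    Skipped,
}

/// Hashes any `Hash` value to a `u64`.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A naive mutator: overwrites the first byte and always claims `Mutated`
/// when the input is non-empty, even if the byte already had that value.
pub fn set_first_byte(input: &mut Vec<u8>, value: u8) -> MutationResult {
    match input.first_mut() {
        Some(byte) => {
            *byte = value;
            MutationResult::Mutated
        }
        None => MutationResult::Skipped,
    }
}

/// Runs `mutate` and reports `Mutated` only if the input's hash changed.
pub fn mutate_checked<I: Hash>(
    input: &mut I,
    mutate: impl FnOnce(&mut I) -> MutationResult,
) -> MutationResult {
    let before = hash_of(input);
    let _ = mutate(input); // the inner result is deliberately ignored
    if before == hash_of(input) {
        MutationResult::Skipped
    } else {
        MutationResult::Mutated
    }
}
```

The check costs two hashes per mutation, so it pays off mainly when executing the target is slow compared to hashing the input.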
8 changes: 3 additions & 5 deletions libafl_bolts/src/shmem.rs
@@ -626,10 +626,7 @@ where
pub mod unix_shmem {
/// Mmap [`ShMem`] for Unix
#[cfg(not(target_os = "android"))]
pub use default::MmapShMem;
/// Mmap [`ShMemProvider`] for Unix
#[cfg(not(target_os = "android"))]
pub use default::MmapShMemProvider;
pub use default::{MmapShMem, MmapShMemProvider, MAX_MMAP_FILENAME_LEN};

#[cfg(doc)]
use crate::shmem::{ShMem, ShMemProvider};
@@ -669,7 +666,8 @@ pub mod unix_shmem {
Error,
};

const MAX_MMAP_FILENAME_LEN: usize = 20;
/// The max number of bytes used when generating names for [`MmapShMem`]s.
pub const MAX_MMAP_FILENAME_LEN: usize = 20;

/// The mmap-based shared map impl for Unix using [`shm_open`] and [`mmap`].
/// Default on `MacOS` and `iOS`, where we need a central point to unmap