
# Integrate with google/oss-fuzz for continuous fuzz-testing? #1030

Open · nathaniel-brough opened this issue Oct 3, 2024 · 6 comments

@nathaniel-brough

Hey, I'd like to suggest adding littlefs to google/oss-fuzz. If you aren't familiar with fuzz testing, here is a bit of a rundown (from Wikipedia):

> In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.

Google offers a free continuous fuzzing service called OSS-Fuzz. If littlefs is integrated into oss-fuzz, the fuzz tests under littlefs will be built and run once a day to search for bugs and vulnerabilities in littlefs. This service can also be integrated with littlefs's CI, so that the fuzz tests run for ~10 minutes on every pull request, preventing buggy code from being merged.

I've opened a pull request adding a basic fuzz-testing harness here: #1029. If you are keen on adding littlefs to oss-fuzz, I'd be happy to champion the integration :)

@nathaniel-brough

Ah, it looks like there is already some fuzzing, or something similar, done through littlefs-fuse?

```
# self-host with littlefs-fuse for a fuzz-like test
```

This looks a little different to the kind of fuzzing that I'm accustomed to.

nathaniel-brough changed the title from "Integrate with google/oss-fuzz for continuous fuzz-testing" to "Integrate with google/oss-fuzz for continuous fuzz-testing?" Oct 3, 2024
geky added the ci label Oct 3, 2024
geky commented Oct 3, 2024

Hi @silvergasp, thanks for creating an issue and volunteering on this. I think it would be quite neat to see something like this integrated eventually.

Unfortunately the timing is a bit rough. There's a large body of unreleased work in progress, and until that lands I'm not sure it makes sense, prioritization-wise, to aggressively search out bugs in the current codebase. That is, unless you're also volunteering to fix them :)

That being said, I think this is a good idea eventually, it's just that eventually is not now, unfortunately...

geky commented Oct 3, 2024

Though there are still interesting things to discuss related to fuzz testing.

> If littlefs is integrated into oss-fuzz, the fuzz tests under littlefs will be built and run once a day to search for bugs and vulnerabilities in littlefs.

This seems very useful, eventually, when time can be allocated to bug fixing.

> This service can also be integrated with littlefs's CI, so that the fuzz tests run for ~10 minutes on every pull request, preventing buggy code from being merged.

So this is quite an interesting topic. Fuzz tests on PRs seem like a good idea initially, but are actually a nightmare for contributors in practice.

Quite a bit of effort actually went into making sure the current test runner/tests are strictly deterministic, so any CI failure is a result of, and only a result of, changes in the given PR.

This doesn't rule out randomness testing, and I've been using quite a bit of it in the above-mentioned body of work, but any randomness testing on PRs needs to be derived from a reproducible seed that was also tested on master.

> Ah, it looks like there is already some fuzzing, or something similar, done through littlefs-fuse?

Ah, not really. Describing that as "fuzz-like" is probably inaccurate. It's "fuzz-like" in that it's not fuzz testing. It's just a very high-level test (compiling/simulating littlefs on littlefs via fuse) that is high-level enough you no longer know exactly what is being tested, similar to fuzz testing. Still, it did find a number of unknown bugs during initial development.

@nathaniel-brough

> Unfortunately the timing is a bit rough. There's a large body of unreleased work in progress, and until that lands I'm not sure it makes sense, prioritization-wise, to aggressively search out bugs in the current codebase. That is, unless you're also volunteering to fix them :)

What are your thoughts on me re-targeting this fuzzing effort onto the unreleased work? I imagine it'd be useful in stabilising the unreleased work. I've personally found fuzzing pretty useful in day-to-day development, as it will often quickly find edge cases I hadn't thought about. I've often found security vulnerabilities, e.g. buffer overflows, in my own work before I released it.

I'm happy to contribute fixes, although I only have a surface-level familiarity with the littlefs codebase.

> So this is quite an interesting topic. Fuzz tests on PRs seem like a good idea initially, but are actually a nightmare for contributors in practice.

> Quite a bit of effort actually went into making sure the current test runner/tests are strictly deterministic, so any CI failure is a result of, and only a result of, changes in the given PR.

Yeah, that's a reasonable stance; the PR-based CI integration is entirely optional and not a prerequisite for integration into oss-fuzz.

> This doesn't rule out randomness testing, and I've been using quite a bit of it in the above-mentioned body of work, but any randomness testing on PRs needs to be derived from a reproducible seed that was also tested on master.

It is possible to configure libfuzzer to be deterministic via the `-seed` command line option. It's worth noting that I would be using ClusterFuzzLite for PR-based fuzzing, which uploads artifacts to GitHub, i.e. the fuzzing input that triggered a bug. So it is easy enough to reproduce a bug even if the CI isn't deterministic. It would look something like this:

```sh
# download fuzz input from github action artifacts, then run the fuzz
# harness with that single input; no actual fuzzing is being performed
# here, just a replay
./fuzz_mount downloaded-crash-input
```
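
And a deterministic fuzzing run (rather than a replay) would look something like this; `-seed` fixes libfuzzer's RNG and `-runs` bounds the number of iterations:

```sh
# deterministic, bounded fuzzing run over a local corpus directory
./fuzz_mount -seed=1234 -runs=100000 corpus/
```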

After a bit of a look into your existing random/fuzz testing, it looks like you are doing something like open-loop fuzzing. It looks like it would require fairly minimal modifications to port your fuzz tests over to libfuzzer, which would significantly improve the speed and performance of fuzzing. Libfuzzer starts off passing a bunch of random data into your fuzz harness, each time measuring the code coverage. Then it picks the inputs that produced the most code coverage and randomly mutates them to create new inputs. This process is repeated in such a way that the "goal" of the fuzzer becomes to maximise code coverage in a closed-loop system.

**Black box fuzzing (existing)**

This approach to fuzzing is considered to be open-loop and does not use feedback to improve future results.

```mermaid
flowchart LR
    A[Random Input] --> B[Fuzz Harness]
    B --> A
```

With black-box fuzzing the probability of finding a bug does not change over time.

**Grey box fuzzing (libfuzzer)**

This approach to fuzzing is considered to be closed-loop and does use feedback to improve future results.

```mermaid
flowchart LR
    A[Random Input] --> B[Fuzz Harness]
    B --> C[Collect code coverage information]
    C --> D[Mutate original input to maximize coverage]
    D --> B
```

The probability of finding a bug in a closed-loop system like this increases over time.
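
To make that concrete: a libfuzzer harness is just a function named `LLVMFuzzerTestOneInput` that exercises the code under test with an arbitrary byte buffer. Here's a rough sketch of what a mount harness could look like; the block geometry and the RAM block device are simplifications I made up for illustration, not littlefs's official test setup (the actual harness is in #1029):

```c
// rough sketch: treat the fuzz input as a pre-populated disk image
// and try to mount it
#include <stdint.h>
#include <string.h>
#include "lfs.h"

#define BLOCK_SIZE  512
#define BLOCK_COUNT 32
#define DISK_SIZE   (BLOCK_SIZE*BLOCK_COUNT)

// simple RAM block device backing the filesystem
static uint8_t disk[DISK_SIZE];

static int bd_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    (void)c;
    memcpy(buffer, &disk[block*BLOCK_SIZE + off], size);
    return 0;
}

static int bd_prog(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, const void *buffer, lfs_size_t size) {
    (void)c;
    memcpy(&disk[block*BLOCK_SIZE + off], buffer, size);
    return 0;
}

static int bd_erase(const struct lfs_config *c, lfs_block_t block) {
    (void)c;
    memset(&disk[block*BLOCK_SIZE], 0xff, BLOCK_SIZE);
    return 0;
}

static int bd_sync(const struct lfs_config *c) {
    (void)c;
    return 0;
}

// libfuzzer calls this once per generated input
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // seed the RAM disk from the fuzz input, zero-padding the remainder
    memset(disk, 0, DISK_SIZE);
    memcpy(disk, data, size < DISK_SIZE ? size : DISK_SIZE);

    struct lfs_config cfg = {
        .read  = bd_read,
        .prog  = bd_prog,
        .erase = bd_erase,
        .sync  = bd_sync,
        .read_size = 16,
        .prog_size = 16,
        .block_size = BLOCK_SIZE,
        .block_count = BLOCK_COUNT,
        .block_cycles = 500,
        .cache_size = 16,
        .lookahead_size = 16,
    };

    // mounting a corrupt image should fail gracefully, never crash;
    // the sanitizers turn any memory error on the way into a report
    lfs_t lfs;
    if (lfs_mount(&lfs, &cfg) == 0) {
        lfs_unmount(&lfs);
    }
    return 0;
}
```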

geky commented Oct 4, 2024

> What are your thoughts on me re-targeting this fuzzing effort onto the unreleased work? I imagine it'd be useful in stabilising the unreleased work. I've personally found fuzzing pretty useful in day-to-day development, as it will often quickly find edge cases I hadn't thought about. I've often found security vulnerabilities, e.g. buffer overflows, in my own work before I released it.

Oh, that's not a bad idea.

I guess the only problem is that this work is still in flux and likely to change. I don't think I will have the bandwidth to respond to bug reports until this work enters the stabilization phase. Though at that point fuzz testing work like this would be very useful.

> It's worth noting that I would be using ClusterFuzzLite for PR-based fuzzing, which uploads artifacts to GitHub, i.e. the fuzzing input that triggered a bug. So it is easy enough to reproduce a bug even if the CI isn't deterministic.

It's not so much the reproducibility, though that is neat, as much as it's trying to prevent new contributors/PRs from finding bugs they are not responsible for and are unlikely to know how to deal with.

> After a bit of a look into your existing random/fuzz testing, it looks like you are doing something like open-loop fuzzing. It looks like it would require fairly minimal modifications to port your fuzz tests over to libfuzzer, which would significantly improve the speed and performance of fuzzing.

Ah yes, let me be the first to admit that the fuzz testing in the test_runner is naive. But the flipside of this is that it's also simple. Writing tests involves almost as much debugging of the tests as the filesystem itself, so I'm a bit hesitant to add external dependencies to the test_runner core.

But that doesn't prevent having additional testing to run afterwards.

It would be interesting if clusterfuzz/oss-fuzz could generate a test case in littlefs's test toml format, so we could easily compile previously found bugs into a suite of small regression-preventing tests.
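
Handwaving the exact shape, I'm imagining each crash artifact could get frozen into something like this (field names guessed from the current tests/*.toml, the real generator would need to match the test_runner's conventions exactly):

```toml
# hypothetical auto-generated regression case
[cases.test_fuzz_regression_0001]
code = '''
    // disk image reconstructed from the oss-fuzz crash artifact
    uint8_t image[] = {0x13, 0x37 /* ... crash input bytes ... */};
    // (write image into the simulated block device here)

    // a corrupt image must fail to mount cleanly, never crash
    lfs_t lfs;
    if (lfs_mount(&lfs, cfg) == 0) {
        lfs_unmount(&lfs) => 0;
    }
'''
```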

@nathaniel-brough

> I guess the only problem is that this work is still in flux and likely to change. I don't think I will have the bandwidth to respond to bug reports until this work enters the stabilization phase. Though at that point fuzz testing work like this would be very useful.

No worries, it seems like now isn't great timing. But if down the line you are nearing stabilisation and want to go "hey, please fuzz-test this part of the API", feel free to ping me and I'll have a crack at it.

> It's not so much the reproducibility, though that is neat, as much as it's trying to prevent new contributors/PRs from finding bugs they are not responsible for and are unlikely to know how to deal with.

Yeah, that seems reasonable. Perhaps this might be more useful once oss-fuzz hasn't found a new bug in a month or two. Then the probability of the PR fuzzer finding something a contributor isn't responsible for would be fairly low, especially given that oss-fuzz runs every night for a few hours on a distributed cluster, whereas the CI fuzzer would be running on a single core for 5-10 minutes in GitHub Actions.

> Ah yes, let me be the first to admit that the fuzz testing in the test_runner is naive. But the flipside of this is that it's also simple. Writing tests involves almost as much debugging of the tests as the filesystem itself, so I'm a bit hesitant to add external dependencies to the test_runner core.

I think the fuzz testing harnesses themselves seem quite extensive and pretty far from naive. But perhaps the method of driving the harnesses could be improved by switching to a coverage-driven fuzzing framework like libfuzzer.

It's worth noting that libfuzzer is shipped as a component of llvm/clang. So while it's true that it adds a dependency, it's largely covered by just switching from gcc to clang.
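
For example, building a harness then looks something like this (file names illustrative, `fuzz_mount.c` being the hypothetical harness sketched above):

```sh
# -fsanitize=fuzzer links in libfuzzer's driver (it provides main);
# adding address enables AddressSanitizer on top
clang -fsanitize=fuzzer,address -I. lfs.c lfs_util.c fuzz_mount.c -o fuzz_mount
```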

> It would be interesting if clusterfuzz/oss-fuzz could generate a test case in littlefs's test toml format, so we could easily compile previously found bugs into a suite of small regression-preventing tests.

Hmm, I think it would be possible, although I'd have to do a bit more of a deep dive into your existing testing infrastructure to be sure. There is some work going into structure-aware fuzzing, but that requires additional dependencies, on protobuf for example. It might look something like toml->protobuf->fuzzing_seed_corpus. So it seems possible, but maybe a little complicated for a first pass.

Another option for this might be to use google/fuzztest for the harnesses, which will generate GoogleTest regression tests when it finds a failure. google/fuzztest also has compatibility layers for libfuzzer and integrates nicely into oss-fuzz. But it does add a couple of third-party dependencies and additional build complexity.
