Mutation testing - pr differences #4178

ASuciuX · 2023-12-15T14:27:26Z

Applicable issues

The following PR has to be merged first so that the references resolve correctly. Please review it as well:

Test/mutants pr differences actions#4

Additional info (benefits, drawbacks, caveats)

Solutions and Recommended Usage

cargo-mutants does an incremental build for each mutation. This solution therefore tracks the output and updates it only with the differences, so that a run covers 10-20, or at most a few hundred, mutants instead of 20,000.

CI YML file: linked with the actions in the actions repo, stacks-network/actions#3.

The workflow should be executed on every PR, as it tests only the functions updated or newly added in the PR's commits.

How to read the status from the `pr-differences` workflow

If it is not failing, all the functions are tested properly.
If it is failing, it displays which type of tests are the problem: missing tests, timeout tests, or unviable tests. For each category, it lists the functions, the files they are in, and their line numbers.

Handling Exit Codes (docs)

Mutation testing produces exit codes post-completion.
In the pr-differences workflow they are caught and specified with extra details about what should be done based on them.
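As an illustration of what catching the exit codes could look like, here is a minimal sketch. The code meanings follow the cargo-mutants documentation as I understand it and should be verified against the version in use; the `describe_exit_code` helper and its message wording are hypothetical and are not taken from the workflow itself.

```python
# Exit codes as documented for cargo-mutants (assumption: verify for your version).
EXIT_CODE_MEANING = {
    0: "success: every tested mutant was caught by the tests",
    1: "usage error: bad command-line arguments or configuration",
    2: "missed mutants: some mutants were not caught by tests",
    3: "timeout: some tests timed out on a mutant",
    4: "baseline failure: tests fail or hang before any mutation is applied",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable explanation for a cargo-mutants exit code."""
    return EXIT_CODE_MEANING.get(
        code, f"unexpected exit code {code}: check the workflow logs"
    )
```

A CI step could print `describe_exit_code(code)` next to the list of affected functions so the reader knows what action is expected.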

- dockerfile and shell script for specific packages
- ci.yml for diff on packages on PR

Co-authored-by: jbencin <[email protected]>

- runs for modified files & created files
- has to be run before committing the changes

```
cd mutation-testing/scripts
sh git-diff.sh
```

codecov · 2023-12-15T14:41:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4b6638a) 82.43% compared to head (5a4f9b3) 82.59%.

❗ Current head 5a4f9b3 differs from pull request most recent head c16d87c. Consider uploading reports for the commit c16d87c to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4178      +/-   ##
===========================================
+ Coverage    82.43%   82.59%   +0.16%     
===========================================
  Files          402      401       -1     
  Lines       291084   288404    -2680     
===========================================
- Hits        239951   238205    -1746     
+ Misses       51133    50199     -934

wileyj · 2023-12-20T04:59:22Z

moved to draft so @ASuciuX et al can work through some issues discovered in initial tests

ASuciuX · 2023-12-27T17:01:02Z

The "Tracking PR Mutants / Incremental Mutants Testing (pull_request)" check is failing on this PR because the reference is not valid (the actions PR has to be merged first).

Cases:
- if >= 16 mutants on big packages => run big packages using 8 shards
- else run big packages without shards
- if >= 16 mutants on big packages and >= 120 mutants on small packages => run small packages using 4 shards
- else if < 16 mutants on big packages and >= 80 mutants on small packages => run small packages using 4 shards
- else run small packages without shards
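The cases above can be written as a small decision function. This is only a sketch of the rules as listed: the names `ShardPlan` and `plan_shards` are hypothetical, and a shard count of 1 is used here to mean "run without shards".

```python
from typing import NamedTuple


class ShardPlan(NamedTuple):
    big_shards: int    # 1 means the big packages run without sharding
    small_shards: int  # 1 means the small packages run without sharding


def plan_shards(big_mutants: int, small_mutants: int) -> ShardPlan:
    """Pick shard counts from the number of mutants in big and small packages."""
    big_is_sharded = big_mutants >= 16
    big_shards = 8 if big_is_sharded else 1
    # The small-package threshold depends on whether the big packages are sharded.
    small_threshold = 120 if big_is_sharded else 80
    small_shards = 4 if small_mutants >= small_threshold else 1
    return ShardPlan(big_shards, small_shards)
```

For example, `plan_shards(20, 100)` shards the big packages 8 ways but leaves the small packages unsharded, because 100 is below the 120 threshold that applies once the big packages are sharded.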

ASuciuX · 2024-01-07T22:29:57Z

Overall Flow Diagram #4210 (comment)

.github/workflows/pr-differences-mutants.yml

wileyj · 2024-01-08T21:19:20Z

I'm also considering that you may want to run this workflow only after an approval - otherwise, it will currently trigger on any commit to the branch being PR'ed.
I'm ambivalent if that's the right way to go here - but most tests currently do not run until a PR is approved (only a small sample of workflows are run on commits, with goal being faster iteration. if mutant tests are run on every commit, it may slow down that objective).

ASuciuX · 2024-01-08T23:04:58Z

The goal here was to run the mutants before the PR's approval. The workflow serves as a reference for which of the added or modified functions fail when running mutants, so the developers working on the PR can check and fix them, and the others can approve the PR knowing that no mutants are left unchecked.

- add required input fields to the output job

wileyj · 2024-01-17T00:03:40Z

this is interesting @ASuciuX : https://github.com/stacks-network/stacks-core/actions/runs/7547829177/job/20548598881?pr=4178#step:2:1323

even more peculiar is that it tried to download a musl package to satisfy:

info: downloading https://github.com/cargo-bins/cargo-binstall/releases/download/v1.4.4/cargo-binstall-x86_64-unknown-linux-musl.tgz
info: verifying sha256 checksum for cargo-binstall-x86_64-unknown-linux-musl.tgz
info: cargo-binstall installed at /home/runner/.cargo/bin/cargo-binstall

ASuciuX · 2024-01-17T15:08:18Z

I've reviewed past workflow executions and noticed that this has been consistently present in all of them. I believe it will work alright, but I could change it to the approach described at https://github.com/cargo-bins/cargo-binstall?tab=readme-ov-file#in-github-actions. The interesting part is that both the binstall and the mutants docs recommend the one I've added.

wileyj · 2024-01-17T15:18:08Z

my concern is mostly that the workflow failed and nothing else ran as a result:

thread 'main' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/patch-0.7.0/src/parser.rs:84:5:
bug: failed to parse entire input. Remaining: 'diff --git a/stacks-common/src/libcommon.rs b/stacks-common/src/lib.rs

do you know why that error was triggered?

ASuciuX · 2024-01-18T05:28:00Z

@wileyj thanks for pointing it out. Just fixed it in stacks-network/actions#14

it was related to: renaming .rs files & having no mutants found

wileyj · 2024-01-18T17:14:16Z

@ASuciuX the last question i have here is timing. if i understand correctly, the idea is that this workflow will run on every PR.
The current action has been running for over 2 hours: https://github.com/stacks-network/stacks-core/actions/runs/7547829177?pr=4178

can you share some timings that we can expect based on different types of PR's?
for example, when we merge to master for a release - the CI in total takes around 45-60 minutes. adding another 2+ hours to that time may not be desirable.

ASuciuX · 2024-01-19T06:40:18Z

I've identified and resolved an issue with the git diff command used in our workflow, which you can review here. The workflow run details are available here.

Previously, the command was erroneously capturing changes not only from the source branch but also from the base branch, starting from the initiation of the PR up to the present. This resulted in an extended runtime of about 3 hours for the mutants process, encompassing all differences on the develop branch from December 15 to January 18.
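One common way to restrict a diff to the changes made on the source branch is to diff against the merge base, which is what git's three-dot syntax does. The sketch below only builds such a command; it is an assumption about the general approach, not the exact command used in the fix, and the `pr_diff_command` name and the ref names in the usage note are hypothetical.

```python
from typing import List


def pr_diff_command(base_ref: str, head_ref: str) -> List[str]:
    """Build a git command that diffs head_ref against its merge base with base_ref.

    `git diff A...B` compares B with the common ancestor of A and B, so commits
    that landed on the base branch after the PR was opened are not included.
    """
    return ["git", "diff", f"{base_ref}...{head_ref}"]
```

For instance, `pr_diff_command("origin/develop", "HEAD")` could be passed to `subprocess.run` inside a checkout that has fetched both refs.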

All the shards took less than 1 hour and 10 minutes, except 2 shards:

- shard-big-0 took 2h 44m, of which 1h 50m was spent on a timeout mutant
- shard-big-2 took 3h 4m, of which 1h 48m was spent on a timeout mutant

An enhancement that could significantly optimize our runtime involves dynamically terminating mutant processes based on their initial test durations. For instance, if the initial test for stacks-node takes X minutes, we could set a threshold to automatically stop the process after 1.5X minutes and call that a timeout mutant. Similarly, for stackslib, if the initial test takes Y minutes, the process would halt after 1.5Y minutes. Unfortunately, such a dynamic stopping mechanism isn't currently implemented in cargo mutants.
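The proposed rule is simple arithmetic, sketched below. This only illustrates the idea from the paragraph above; as noted there, it is not something cargo-mutants does, and the function names and the strict "greater than" comparison are assumptions made for the example.

```python
def mutant_timeout_minutes(baseline_minutes: float, multiplier: float = 1.5) -> float:
    """Return the cutoff after which a mutant run would be called a timeout."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return baseline_minutes * multiplier


def classify_run(elapsed_minutes: float, baseline_minutes: float) -> str:
    """Label a mutant run as 'timeout' once it exceeds 1.5x the baseline test time."""
    cutoff = mutant_timeout_minutes(baseline_minutes)
    return "timeout" if elapsed_minutes > cutoff else "completed"
```

With a 20-minute baseline, the cutoff would be 30 minutes, so a mutant could cost at most 30 minutes rather than the roughly 1h 50m observed for the timeout mutants above.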

For context, in previous runs:

- Small packages typically completed in under 30 minutes, aided by the use of shards.
- Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.
    - Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
    - "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
    - "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.
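The large-package figures above can be turned into a rough estimate. This is a back-of-the-envelope sketch that extrapolates only from the numbers quoted here (50-55 minutes for about 32 mutants, plus 15 minutes per additional 8), counts only "missed" and "caught" mutants, and ignores timeout mutants; the function name is hypothetical.

```python
import math
from typing import Tuple


def estimate_big_package_minutes(mutants: int) -> Tuple[int, int]:
    """Return a (low, high) runtime estimate in minutes for sharded large packages.

    Up to 32 mutants the quoted 50-55 minute range is used as is; beyond that,
    every started batch of 8 extra mutants adds 15 minutes.
    """
    extra_batches = math.ceil(max(0, mutants - 32) / 8)
    extra_minutes = 15 * extra_batches
    return (50 + extra_minutes, 55 + extra_minutes)
```

Under these assumptions, 40 mutants would land at roughly 65-70 minutes. PRs with far fewer than 32 mutants will likely finish sooner than the range returned here.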

With the fix applied, the workflow should now correctly process only the relevant changes, leading to more efficient runtimes.

ASuciuX · 2024-01-19T07:00:40Z

On top of the above specified run times, the integration of nextest in cargo-mutants is expected to significantly reduce the time needed for processing caught mutants. For more information, see this discussion.

wileyj · 2024-01-19T18:44:43Z

this is great!
can you add this comment :

- Small packages typically completed in under 30 minutes, aided by the use of shards.
- Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.
    - Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
    - "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
    - "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

to the file ./docs/ci-release.md? ideally under a section regarding the mutant workflow

wileyj

#4178 (comment)

i think once this documentation comment is addressed, everything else looks good

wileyj

shipit

docs/ci-release.md

.github/workflows/pr-differences-mutants.yml

blockstack-devops · 2024-11-06T00:21:56Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ASuciuX and others added 13 commits November 25, 2023 00:38

feat: mutation testing initial integration

8370213

- dockerfile and shell script for specific packages
- ci.yml for diff on packages on PR

Merge branch 'develop' into test/cargo-mutants-testing

06d68b2

fix: made functions discoverable to be mutants

1c5a75c

feat: added mutants output before fix clarity package

19f56ea

feat: added mutants output after fix clarity package

a3caebb

Update mutants-testing-general.sh

e41e89a

Co-authored-by: jbencin <[email protected]>

feat: renamed mod to lib.rs

b1649a3

Delete Dockerfile.mutation-testing as it is also run locally with cargo

32aa967

feat: modular mutations on shell

92b4e23

- runs for modified files & created files
- has to be run before committing the changes

```
cd mutation-testing/scripts
sh git-diff.sh
```

feat: restructure mutants to new CI workflows

5387f44

feat: update link to stacks action repo

0bf76dd

Merge branch 'develop' into test/cargo-mutants-testing

9616e07

feat: keep only filter pr workflow file

7c91912

ASuciuX self-assigned this Dec 15, 2023

ASuciuX mentioned this pull request Dec 15, 2023

Mutation Testing ( splitted into 2 PRs ) #4089

Closed

6 tasks

added specific triggers for the CI action on PR

491557d

saralab requested review from jcnelson, kantai and wileyj December 15, 2023 15:10

obycode marked this pull request as draft December 19, 2023 15:27

feat: rename from filter-pr to pr-differences

b90dc06

ASuciuX changed the title ~~Mutation testing - filter pr~~ Mutation testing - pr differences Jan 8, 2024

wileyj reviewed Jan 8, 2024

.github/workflows/pr-differences-mutants.yml

ASuciuX mentioned this pull request Jan 10, 2024

Move Shells from Main Workflow to Composite Action stacks-network/actions#7

Merged

ASuciuX added 2 commits January 11, 2024 03:28

feat: add documentation for mutation testing

12388b7

feat: mutation testing - update composite branch

5a4f9b3

- add required input fields to the output job

saralab marked this pull request as ready for review January 16, 2024 21:37

feat: renamed back the lib files as cargo-mutants supports them now

16b7bff

Merge branch 'develop' into test/mutants-filter-pr

9b54fcb

saralab requested a review from obycode January 22, 2024 15:41

wileyj requested changes Jan 22, 2024

feat: mutants docs - time related outcomes

0875d10

wileyj approved these changes Jan 23, 2024
