Report comparative info for detector scores #814
Conversation
…t from multiple reports
…section; include bag details
Testing in progress. Can you update the description here to explain what this PR does and offer an example run of perf_stats.py? We don't need the actual report files, just breadcrumbs to follow when we need to update these resources.
Yup, done!
Due to #813, the resource files here need to replace `ContinueSlursReclaimedSlurs80` with `ContinueSlursReclaimedSlursMini`.
Outstanding catch
Enable interpretation of scores in a run by calibrating them against a bag of models.
`garak/analyze/perf_stats.py` takes a glob of `report.jsonl` files and calculates the mean, standard deviation, and Shapiro-Wilk p-value (the latter assesses how well the spread of scores fits a normal distribution) for each probe/detector found.
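For orientation, here is a minimal sketch of that kind of aggregation, assuming purely for illustration that each `report.jsonl` line is a JSON object carrying hypothetical `probe`, `detector`, and `score` fields; the real `perf_stats.py` may read the reports differently:

```python
# Illustrative sketch only, not the actual perf_stats.py.
import glob
import json
import statistics

from scipy import stats  # scipy.stats.shapiro for the normality test

scores = {}  # (probe, detector) -> list of observed scores
for path in glob.glob("runs/*.report.jsonl"):  # hypothetical glob
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            key = (record["probe"], record["detector"])
            scores.setdefault(key, []).append(record["score"])

for (probe, detector), vals in scores.items():
    mean = statistics.mean(vals)
    stdev = statistics.stdev(vals) if len(vals) > 1 else 0.0
    # Shapiro-Wilk needs at least three observations
    sw_p = stats.shapiro(vals).pvalue if len(vals) >= 3 else None
    print(probe, detector, mean, stdev, sw_p)
```

`scipy.stats.shapiro` requires at least three observations, hence the guard.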
`garak/resources/calibration` contains the files from which stats are derived in a comparison. These contents are generated from `perf_stats.py`.
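As an illustration only, a calibration entry could be keyed by probe/detector and carry the per-bag stats; the field names and layout below are assumptions, not the real file format:

```python
# Hypothetical layout only; the actual calibration file may differ.
calibration = {
    "probe.Example/detector.Example": {
        "mean": 0.12,   # mean detector score across the bag of models
        "sigma": 0.04,  # standard deviation across the bag
        "sw_p": 0.31,   # Shapiro-Wilk p-value for the score distribution
    }
}
```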
`garak/analyze/report_digest.py` and its templates are updated to calculate a z-score for probe/detector combinations where this is possible, given a default calibration JSON, to print it in the HTML output, and to report what that score means and where it came from.
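A hedged sketch of the z-scoring step (not the actual `report_digest.py` code), assuming calibration stats shaped like the hypothetical entry above:

```python
# Illustrative only: compare an observed probe/detector score against
# hypothetical calibration stats and describe the result.
def z_score(observed: float, mean: float, sigma: float) -> float | None:
    """How many standard deviations the observed score sits from the bag mean."""
    if sigma == 0:
        return None  # no spread in the calibration bag; z-score undefined
    return (observed - mean) / sigma


z = z_score(observed=0.20, mean=0.12, sigma=0.04)
if z is not None:
    direction = "above" if z >= 0 else "below"
    print(f"z = {z:+.1f}: {abs(z):.1f} standard deviations {direction} "
          "the calibration bag's mean for this probe/detector")
```

The sign of z only says which side of the calibration bag's mean the run fell on; whether that is good or bad depends on the detector.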