Feature: Change how scores are displayed #2466
Comments
I like the idea. @spencerschrock @azeemsgoogle @naveensrinivasan wdyt?
Since completely replacing the X/10 score might be disruptive, we might want to explore supplementing these scores with a percentile, like:
Is this for the badge, the result viewer, or the results themselves?
I'd say anywhere we display an
I'm not sure how valuable quantiles are for individual checks, especially given how many checks are "binary" (0 or 10). I also suspect (without looking at any data) that the distributions will be heavily skewed, which might lead to less nuanced quantiles (i.e., only having top 1% or top 99% quantiles). In my initial proposal, I was actually only thinking of having quantiles for the final score, where we have a pretty reasonable ("normal-ish") distribution. But yes, I'd then show these quantiles everywhere: the CLI output, the viewer, the badge.
This issue is stale because it has been open for 60 days with no activity. |
The OpenSSF Best Practices badge uses "Passing", "Silver", and "Gold", which is easy to read at a glance. Projects must pass all criteria at one level before moving on to the next. A similar scheme for Scorecard might be: pass X probes for Silver, X + Y for Gold, etc.
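To make the tiered idea concrete, here is a minimal sketch in Python. The probe-count thresholds below are invented placeholders for illustration only, not criteria proposed by Scorecard or the Best Practices badge:

```python
# Hypothetical tier thresholds: pass at least N probes to earn a level.
# These numbers are placeholders, not actual Scorecard criteria.
TIERS = [
    ("Gold", 25),
    ("Silver", 18),
    ("Passing", 10),
]

def badge_level(probes_passed: int) -> str:
    """Return the highest tier whose probe threshold is met."""
    for name, required in TIERS:
        if probes_passed >= required:
            return name
    return "No badge"

print(badge_level(20))  # -> "Silver" under these placeholder thresholds
```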
This issue has been marked stale because it has been open for 60 days with no activity. |
Is your feature request related to a problem? Please describe.
There's a discrepancy between how good a given score is and how it feels. A 7/10 feels like a passing grade at best, but it actually means a project is in the top ~10% of the most relevant projects (or top ~1% of all projects), for example.
A few maintainers have been surprised to hear that they're actually doing a good job when they get a good score.
twbs/bootstrap#37402 (comment):
numpy/numpy#22482 (comment):
Describe the solution you'd like
A score that feels as good as it actually is. My proposal would be to either replace or supplement the current final score (7/10) with the respective quantile (top x%). The badge should also display the result in quantiles instead of (or as well as) final scores.
This would make everyone (maintainers and users) more accurately understand how solid a project's security posture is.
Even the top projects would have a better experience: I wager some users currently see urllib3's 9.3 and think "wow, that's pretty good, but still clearly needs to improve something!", when their actual understanding should be "wow, this is the most secure open-source project out there!"
Personally, I'd be in favor of the quantile simply supplementing the final score, precisely because (for example) urllib3 might be the most secure open-source project out there, but that missing 0.7 does also point out there's room for improvement. In simple terms:
Additional context
A first issue may be that the histogram of project scores isn't very nuanced: it seems clear from the chart below that GitHub's defaults give projects a score around 4.5/10 (charts obtained via the public BigQuery data), so the ~1 million projects analyzed by Scorecards can basically be categorized as "did something to improve their security" (and are therefore "top ~1%" of projects) or "did something to weaken their security" (and are therefore "bottom ~1%" of projects).
However, if we focus on "important" projects, the chart becomes much more useful:
Naturally, this chart is heavily influenced by how we define "important". For the chart above, I defined it as projects with a criticality_score > 0.5. This choice was completely arbitrary, and just so happens to include ~10,000 projects. Whether this cutoff is appropriate or whether criticality_score is the best tool is naturally something that can (should!) be discussed as well.
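To illustrate the "top x%" idea against that filtered set, here is a minimal sketch in Python. It assumes a hypothetical local export of the public BigQuery data with columns named `score` and `criticality_score`; the real schema and file name may differ, and the 0.5 cutoff is the same arbitrary choice discussed above:

```python
# Sketch: turn a project's final score into a "top x%" figure using the
# empirical distribution of "important" projects. The export file and
# column names are hypothetical.
import pandas as pd

df = pd.read_csv("scorecard_export.csv")        # hypothetical BigQuery export
important = df[df["criticality_score"] > 0.5]   # arbitrary cutoff, as discussed above

def top_percent(score: float) -> float:
    """Percentage of important projects scoring at or above `score`."""
    return 100.0 * (important["score"] >= score).mean()

print(f"7.0/10 is in the top {top_percent(7.0):.0f}% of important projects")
```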
It is also worth mentioning that this curve is an almost perfect sigmoid, and therefore calculating the quantile would be quite straightforward, though the equation parameters may need to be updated over time (hopefully due to improving scores across the open-source ecosystem!):
(the vertical axis goes from -25 to 125 because the estimated curve goes slightly above 100 and below 0, but that should be easy to clamp)
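Since the cumulative curve is close to a logistic sigmoid, the quantile lookup could also use a small parametric fit plus a clamp rather than the full distribution. The sketch below is illustrative only: the data points and initial guesses are placeholders, and real parameters would be refit periodically from the BigQuery data:

```python
# Sketch: fit a logistic curve to (score, percentile-below) points and clamp
# the estimate to [0, 100] to handle the slight over/undershoot noted above.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(score, L, k, x0):
    """Logistic curve: L / (1 + exp(-k * (score - x0)))."""
    return L / (1.0 + np.exp(-k * (score - x0)))

# Placeholder observations; the real ones would come from the BigQuery data.
scores = np.array([2.0, 3.5, 4.5, 5.5, 7.0, 8.5])
pct_below = np.array([5.0, 20.0, 50.0, 75.0, 92.0, 99.0])

params, _ = curve_fit(sigmoid, scores, pct_below, p0=[100.0, 1.0, 4.5])

def quantile(score: float) -> float:
    """Estimated percentile of projects at or below `score`, clamped to [0, 100]."""
    return float(np.clip(sigmoid(score, *params), 0.0, 100.0))

print(f"A score of 7.0 puts a project above ~{quantile(7.0):.0f}% of projects")
```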