Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate false positive :needs_review flags #141

Open
kaste opened this issue Jun 12, 2021 · 1 comment
Open

Mitigate false positive :needs_review flags #141

kaste opened this issue Jun 12, 2021 · 1 comment

Comments

@kaste
Copy link

kaste commented Jun 12, 2021

This is a follow up to the discussion in discord https://discord.com/channels/280102180189634562/280157083356233728/852608042201907261 and https://discord.com/channels/280102180189634562/280157083356233728/853247224687099904.

I noticed you mark_missing() or mark_missing_by_name() immediately
on the first failure. This is open for flukes when crawling.

The information you persist in these functions is

                is_missing = TRUE,
                missing_error = %s,
                needs_review = %s

Instead of setting is_missing and needs_review immediately, I suggest
a new column

    missing_since            timestamp,    

The side-effect of mark_missing on first failure is just

...
SET 
    missing_since = COALESCE(missing_since, CURRENT_TIMESTAMP),
    missing_error = %s
...

You then make a new task in tasks which you add to the crontab as well
which just goes over the rows and actually sets the flags is_missing and
needs_review iff missing_since is older than ... whatever we think is
appropriate here. Might be 15 minutes, which should lead to 3 tries approx,
or even 1 hour. Something like that:

  UPDATE
  ...
  SET
    is_missing = TRUE,
    needs_review = TRUE
  WHERE
    missing_since < CURRENT_TIMESTAMP - interval '1 hour'

This already makes the is_missing flag better for the end-user as the
end-user gets the offer to install packages that are temporarily down.
The user might get a "404" failure within Sublime. However, such an error
is easy to understand if it means just try again. On a more likely false
positive, the user will just proceed.

For needs_review, you might want to complicate matters as not every
error leads to a needs_review flag. (You have the needs_review(error)
function in python to decide that.)

You can introduce another column, error_indicates_need_review bool, for
that and use it.

...
SET 
    missing_since = COALESCE(missing_since, CURRENT_TIMESTAMP),
    missing_error = %s,
    error_indicates_need_review = error_indicates_need_review or %s
...

and

  UPDATE
  ...
  SET
    is_missing = TRUE,
    needs_review = error_indicates_need_review
  WHERE
    missing_since < CURRENT_TIMESTAMP - interval '1 hour'

Although a rough sketch, I think this makes both the is_missing flag
and needs_review flag more resilient.

Does it resolve the attack vector. Kind of like before, the attacker always races with the speed of the crawler. If the attacker is fast enough, we never mark needs_review as the crawler does not notice any downtime. This does not change with the ideas presented here.

@gordio
Copy link

gordio commented Nov 9, 2022

Is it better to store (and check) created_at from https://api.github.com/repos/wbond/PackageControl.io ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants