Mitigate false positive :needs_review flags #141

kaste · 2021-06-12T20:34:06Z

This is a follow up to the discussion in discord https://discord.com/channels/280102180189634562/280157083356233728/852608042201907261 and https://discord.com/channels/280102180189634562/280157083356233728/853247224687099904.

I noticed you mark_missing() or mark_missing_by_name() immediately
on the first failure. This is open for flukes when crawling.

The information you persist in these functions is

                is_missing = TRUE,
                missing_error = %s,
                needs_review = %s

Instead of setting is_missing and needs_review immediately, I suggest
a new column

    missing_since            timestamp,

The side-effect of mark_missing on first failure is just

...
SET 
    missing_since = COALESCE(missing_since, CURRENT_TIMESTAMP),
    missing_error = %s
...

You then make a new task in tasks which you add to the crontab as well
which just goes over the rows and actually sets the flags is_missing and
needs_review iff missing_since is older than ... whatever we think is
appropriate here. Might be 15 minutes, which should lead to 3 tries approx,
or even 1 hour. Something like that:

  UPDATE
  ...
  SET
    is_missing = TRUE,
    needs_review = TRUE
  WHERE
    missing_since < CURRENT_TIMESTAMP - interval '1 hour'

This already makes the is_missing flag better for the end-user as the
end-user gets the offer to install packages that are temporarily down.
The user might get a "404" failure within Sublime. However, such an error
is easy to understand if it means just try again. On a more likely false
positive, the user will just proceed.

For needs_review, you might want to complicate matters as not every
error leads to a needs_review flag. (You have the needs_review(error)
function in python to decide that.)

You can introduce another column, error_indicates_need_review bool, for
that and use it.

...
SET 
    missing_since = COALESCE(missing_since, CURRENT_TIMESTAMP),
    missing_error = %s,
    error_indicates_need_review = error_indicates_need_review or %s
...

and

  UPDATE
  ...
  SET
    is_missing = TRUE,
    needs_review = error_indicates_need_review
  WHERE
    missing_since < CURRENT_TIMESTAMP - interval '1 hour'

Although a rough sketch, I think this makes both the is_missing flag
and needs_review flag more resilient.

Does it resolve the attack vector. Kind of like before, the attacker always races with the speed of the crawler. If the attacker is fast enough, we never mark needs_review as the crawler does not notice any downtime. This does not change with the ideas presented here.

The text was updated successfully, but these errors were encountered:

gordio · 2022-11-09T00:51:41Z

Is it better to store (and check) created_at from https://api.github.com/repos/wbond/PackageControl.io ?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mitigate false positive :needs_review flags #141

Mitigate false positive :needs_review flags #141

kaste commented Jun 12, 2021

gordio commented Nov 9, 2022

Mitigate false positive :needs_review flags #141

Mitigate false positive :needs_review flags #141

Comments

kaste commented Jun 12, 2021

gordio commented Nov 9, 2022