Enforced Google's specification regarding back-off mode #78
base: master
Conversation
As explained in the spec (https://developers.google.com/safe-browsing/v4/request-frequency), any non-successful return code shall trigger a back-off mode for the API. This was previously implemented for the threatListUpdates.fetch method, but not for fullHashes.find. For that purpose, back-off handling has been moved from the database scope to the API scope, providing a unified interface and behavior regardless of the method used.
Since back-off is handled by the API, it no longer makes sense to test it from the database.
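For reference, the spec linked above defines the back-off duration as MIN((2^(N-1) * 15 minutes) * (RAND + 1), 24 hours), where N is the number of consecutive errors and RAND is a random value in [0, 1). A minimal sketch of that calculation in Go (baseRetryDelay and maxRetryDelay are assumed constants of 15 minutes and 24 hours; maxRetryDelay also appears in the diff below):

package safebrowsing

import (
	"math"
	"math/rand"
	"time"
)

const (
	baseRetryDelay = 15 * time.Minute
	maxRetryDelay  = 24 * time.Hour
)

// backOffDelay returns how long to wait after n consecutive API errors.
func backOffDelay(n uint) time.Duration {
	if n == 0 {
		return 0
	}
	d := time.Duration(float64(baseRetryDelay) * math.Pow(2, float64(n-1)) * (rand.Float64() + 1))
	if d > maxRetryDelay {
		d = maxRetryDelay
	}
	return d
}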
One thing I would like to see is a way for lookup callers to know when they can make requests again. Some users of this library perform batch lookups, and they'd prefer to wait out an outage.
One solution would be to use WaitUntilReady() to wait on the API status. This isn't great because most LookupURLs will not require a network request, so there is no reason those queries cannot be executed.
Another potential solution would be to return an error that has a public type with an extra method to get the time when requests can be made again. It makes error handling a bit of a mess, but any request can cause an error anyway.
By returning a more informative error, the Database can actually just wait until the specified time, as before.
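For example, something along these lines (just a sketch; the type and field names are placeholders, not existing API):

package safebrowsing

import (
	"fmt"
	"time"
)

// RetryTimeError reports that the client is in back-off mode and when
// requests may be attempted again.
type RetryTimeError struct {
	RetryTime time.Time // earliest time a new request may be made
}

func (e *RetryTimeError) Error() string {
	return fmt.Sprintf("safebrowsing: in back-off mode until %v", e.RetryTime)
}

Batch callers could then type-assert the error returned by LookupURLs and sleep until RetryTime before retrying, while the Database keeps waiting out the window internally as before.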
client          *http.Client
url             *url.URL
backOffMode     bool
updateAPIErrors uint
A lock is needed for this. The 'database' was handling locking before, but now we have HashLookup requests happening too (which can happen concurrently).
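Something like this, for instance (a sketch; only the mutex is new relative to the diff):

mu              sync.RWMutex // guards backOffMode and updateAPIErrors
client          *http.Client
url             *url.URL
backOffMode     bool
updateAPIErrors uint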
	delay = maxRetryDelay
}
a.updateAPIErrors++
time.Sleep(delay)
Instead of a sleeping goroutine, how about we have an error count and a 'time we can make requests again', along with an RWMutex.
So at the top, it would just do:
a.mu.RLock()
if time.Now().Before(someRetryTime) {
	a.mu.RUnlock()
	return someError
}
a.mu.RUnlock()
then later:
if httpResp.StatusCode != 200 {
	return func() error {
		a.mu.Lock()
		defer a.mu.Unlock()
		if time.Now().Before(someRetryTime) {
			// Another request already recorded an error in the time we made the request
			return someError
		}
		// Calculate backoff time, update consecutive errors
		return someError
	}()
}
// Everything good!
a.mu.Lock()
a.updateAPIErrors = 0
// optionally zero someRetryTime.
a.mu.Unlock()
Obviously doesn't have to be exactly like this. This way makes it possible to export the 'time we can make requests again', whether via an error, or more directly through a field.
t.Fatalf("update 6, Expected delay %v to be between %v and %v", delay.Seconds(), minDelay, maxDelay) | ||
} | ||
|
||
// Update 7: api is still broken, check backoff is larger |
We should have tests somewhere for the exponential backoff, probably in api_test.go (that test is a bit ugly right now). I also notice that the API doesn't get the 'Now()' from the configuration, which may make it difficult to write time-based tests.
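As a rough idea of what such a test could look like if the api took its clock from the Config (the names api, now, recordError, and nextRetryTime below are all placeholders for whatever this PR ends up exposing):

func TestBackOffGrows(t *testing.T) {
	mock := time.Date(2017, 1, 1, 0, 0, 0, 0, time.UTC)
	a := &api{now: func() time.Time { return mock }}

	var prev time.Duration
	for n := 1; n <= 6; n++ {
		a.recordError()                // bump the consecutive-error count
		d := a.nextRetryTime.Sub(mock) // delay implied by the new retry deadline
		if d < prev {
			t.Errorf("after %d errors, delay %v shrank from %v", n, d, prev)
		}
		prev = d
	}
}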
@cpu I think these changes might affect you, want to make sure you see it.
Thanks for the tag :-) I took a look at the PR and don't have any major feedback beyond what you shared. My use-case is opportunistic and doesn't benefit substantially from a way to block on readiness or to find out when a lookup retry may succeed again. My use-case does rely on the
Back-off mode implementation, accounting for both database updates and full hashes lookups, as described in #77.
I think having the back-off mode owned by the database was a design flaw in the first place, since it didn't account for fullHashes.find. I moved the handling to the API itself, which allows consistent behaviour for both lookups and updates. The only side effect is that, on failure, the database update method doesn't get a clear time horizon for retrying. I worked around that by setting a default 1-minute retry frequency, sketched below. Since everything is checked internally, this doesn't generate unwanted traffic; the only downside is some unneeded additional operations.
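Concretely, the update scheduling could fall back roughly like this (a sketch; nextUpdateDelay and defaultUpdateRetryDelay are illustrative names, not part of the change as posted):

const defaultUpdateRetryDelay = 1 * time.Minute

// nextUpdateDelay picks when the database should attempt its next update.
// On failure there is no server-provided wait, so poll again in a minute;
// the API's internal back-off check rejects those attempts locally (no
// network traffic) until the back-off window has actually elapsed.
func nextUpdateDelay(err error, serverWait time.Duration) time.Duration {
	if err != nil {
		return defaultUpdateRetryDelay
	}
	return serverWait
}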