Improve percentile calculations #112
Comments
Just out of curiosity, do you think there is a user benefit to having this? I know that it is more efficient from an engineering perspective; I'm just not sure it adds any value for users of l2met.
Also, I would be happy to review a PR if you wanted to take a stab at using the library.
Unless I am misunderstanding the l2met code (a distinct possibility!) :), it looks like the percentiles for a given reporting period are in no way related to previous periods. Is this the case? I guess I should have added more detail to the request, but this would involve storing the perks structures in Redis, pulling them back out, and running the current period's measurements through them to get a set of percentiles that represents more than just the current measurement period. Does that make sense?
I see. I wonder though... Do you really want to carry your statistics across time intervals? For example, let's say there was a strange instance failure at t=0 which caused your latency metrics to spike. Then, at t=1, the problem went away and your latencies returned to normal values. Currently, l2met computes statistics for each period in isolation from every other period. What you suggest would aggregate different time periods together. Thus, the incident you had at t=0 would affect your metrics for t=1, which might make understanding what happened much more difficult.
So in the context of alerting, I think having periods be related is not desirable, but in the context of understanding the long-term performance characteristics of a service, you would want to consider measurements across periods. This, I think, is one reason why all these statistical methods for calculating percentiles over an unbounded stream have been developed :) I think your example is correct for t=0 and t=1, but by the time you got to t=1000 your outlier would have much less (but, importantly, still measurable) impact. Similarly, if you were at t=1000 when the outlier happened, it would have much less impact at t=1001, I think, if these methods were used. perks is based on one paper, and the percentiles in Coda Hale's Metrics on others.
Thoughts?
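To make the trade-off discussed above concrete, here is a minimal sketch that compares a p99 computed per reporting period with a p99 computed over everything seen so far. It is not l2met or perks code: the function names (`percentile`, `per_period_p99`, `cumulative_p99`) and the latency values are hypothetical, and it keeps every sample and sorts them exactly, whereas a streaming library such as perks would approximate the result in bounded memory.

```python
def percentile(samples, p):
    """Exact nearest-rank percentile; p is an integer in 1..100."""
    ordered = sorted(samples)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def per_period_p99(periods):
    """p99 of each period in isolation (the behavior described for l2met)."""
    return [percentile(period, 99) for period in periods]


def cumulative_p99(periods):
    """p99 over all samples seen up to and including each period."""
    seen = []
    result = []
    for period in periods:
        seen.extend(period)
        result.append(percentile(seen, 99))
    return result


# Hypothetical latencies in ms: 100 samples per period.
normal = list(range(1, 101))             # 1..100 ms
spike = list(range(1, 96)) + [5000] * 5  # five 5000 ms outliers at t=0
periods = [spike] + [normal] * 5         # t=0 .. t=5

print(per_period_p99(periods))   # [5000, 99, 99, 99, 99, 99]
print(cumulative_p99(periods))   # [5000, 5000, 5000, 5000, 100, 100]
```

With this made-up data, the per-period p99 recovers at t=1, while the cumulative p99 stays at the outlier value through t=3 and then settles slightly above the normal value. That is the "much less, but still measurable" effect on a small scale.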
Can't deny math. Let's keep this issue open and see what happens.
@sclasen Hope you don't mind that I word-smithed your issue name and desc...
Looks good!
It would be great to have this instead of, or in addition to, the raw percentiles over the reporting period.
Perhaps use: https://github.com/bmizerany/perks