Improve percentile calculations #112

sclasen · 2013-08-28T21:28:34Z

It would be great to have this instead of, or in addition to, the raw percentiles over the reporting period.
Perhaps use: https://github.com/bmizerany/perks

ryandotsmith · 2013-08-28T21:34:32Z

Just out of curiosity, do you think there is a user benefit to having this? I know that it is more efficient from an engineering perspective, but I'm not sure it adds any value for users of l2met.

ryandotsmith · 2013-08-28T21:37:52Z

Also, I would be happy to review a PR if you wanted to take a stab at using the library.

sclasen · 2013-08-28T21:41:22Z

Unless I am misunderstanding the l2met code (a distinct possibility!) :), it looks like the percentiles for a given reporting period are in no way related to previous periods. Is this the case?

I guess I should have added more detail to the request, but this would involve storing the perks structures in Redis, pulling them back out, and running the current period's measurements through them to get a set of percentiles that represents more than just the current measurement period.

Does that make sense?

ryandotsmith · 2013-08-28T21:57:18Z

I see. I wonder though... Do you really want to carry your statistics across time intervals? For example, let's say there was a strange instance failure at t=0 which caused your latency metrics to spike. Then, at t=1, the problem went away and your latencies returned to normal values. Currently, l2met computes the statistics for each period in isolation from every other period. What you suggest would aggregate different time periods together. Thus, the incident you had at t=0 would impact your metrics for t=1, which might make understanding what happened much more difficult.

sclasen · 2013-08-28T22:08:47Z

So in the context of alerting, I think related periods are not desirable, but in the context of understanding the long-term performance characteristics of a service, you would want to consider measurements across periods.

This I think is one reason why all these statistical methods for calculating percentiles over an unbounded stream have been developed :)

I think your example is correct for t=0 and t=1, but by the time you got to t=1000 your outlier would have much less (but, importantly, still measurable) impact. Similarly, if you were at t=1000 when the outlier happened, it would have much less impact at t=1001, I think, if these methods were used.
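The dilution effect can be sketched with a small Go simulation. It assumes a made-up workload (100 requests per period, one incident period in which 10 requests take 900 ms, every other request taking 10 ms) and uses an exact sort as a stand-in for a streaming estimator such as perks, so it only illustrates the trend rather than any library's actual output.

```go
package main

import (
	"fmt"
	"sort"
)

// p99 returns the nearest-rank 99th percentile of a non-empty slice.
func p99(values []float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := (99*len(sorted) + 99) / 100 // integer form of ceil(0.99 * n)
	return sorted[rank-1]
}

// cumulativeP99 simulates one incident period followed by `normal` healthy
// periods and returns the p99 over every measurement seen so far.
func cumulativeP99(normal int) float64 {
	const perPeriod = 100
	values := make([]float64, 0, perPeriod*(normal+1))
	// Incident period: 10 of the 100 requests take 900 ms.
	for i := 0; i < perPeriod; i++ {
		if i < 10 {
			values = append(values, 900)
		} else {
			values = append(values, 10)
		}
	}
	// Healthy periods: every request takes 10 ms.
	for i := 0; i < normal*perPeriod; i++ {
		values = append(values, 10)
	}
	return p99(values)
}

func main() {
	for _, n := range []int{1, 8, 9, 1000} {
		fmt.Printf("healthy periods after the incident: %4d, cumulative p99: %.0f ms\n",
			n, cumulativeP99(n))
	}
}
```

Under these assumptions the cumulative p99 is still 900 ms after 8 healthy periods and falls back to 10 ms from the 9th onward. The incident's samples remain in the data set, so they stay visible in the maximum and in sufficiently high quantiles.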

perks is based on one paper, and the percentiles in Coda Hale's Metrics on others.

perks:

http://www.cs.rutgers.edu/~muthu/bquant.pdf

Coda's:

http://www.cs.umd.edu/~samir/498/vitter.pdf

from https://github.com/codahale/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/UniformReservoir.java

and

http://dimacs.rutgers.edu/~graham/pubs/papers/fwddecay.pdf

from https://github.com/codahale/metrics/blob/master/metrics-core/src/main/java/com/codahale/metrics/ExponentiallyDecayingReservoir.java
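For readers unfamiliar with the Vitter paper, here is a minimal Go sketch of the uniform reservoir idea (Algorithm R): keep a fixed-size random sample of an unbounded stream and compute percentiles from that sample. The `Reservoir` type and its methods are hypothetical names chosen for illustration; this is not a port of `UniformReservoir.java`, and it does not implement the biased-quantiles or forward-decay papers.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Reservoir holds a uniform random sample of at most `size` values
// drawn from a stream of unknown length (Vitter's Algorithm R).
type Reservoir struct {
	size   int
	seen   int64
	values []float64
	rng    *rand.Rand
}

// NewReservoir creates an empty reservoir. The seed is exposed only so
// that runs are reproducible.
func NewReservoir(size int, seed int64) *Reservoir {
	return &Reservoir{
		size:   size,
		values: make([]float64, 0, size),
		rng:    rand.New(rand.NewSource(seed)),
	}
}

// Update offers one measurement to the reservoir.
func (r *Reservoir) Update(v float64) {
	r.seen++
	if len(r.values) < r.size {
		// Still filling up: keep every value.
		r.values = append(r.values, v)
		return
	}
	// Pick a slot in [0, seen); the new value is kept only if the slot
	// falls inside the reservoir, which happens with probability size/seen.
	if j := r.rng.Int63n(r.seen); j < int64(r.size) {
		r.values[j] = v
	}
}

// Snapshot returns a copy of the current sample.
func (r *Reservoir) Snapshot() []float64 {
	return append([]float64(nil), r.values...)
}

func main() {
	r := NewReservoir(100, 42)
	for i := 0; i < 100000; i++ {
		r.Update(float64(i % 500)) // made-up measurements
	}
	fmt.Printf("kept %d of %d measurements\n", len(r.Snapshot()), r.seen)
}
```

Because every measurement has the same chance of being in the sample, an old outlier is neither favored nor forgotten faster than anything else. The exponentially decaying variant linked above instead favors recent measurements.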

Thoughts?

ryandotsmith · 2013-08-28T23:37:36Z

Can't deny math. Let's keep this issue open and see what happens.

ryandotsmith · 2013-08-28T23:38:26Z

@sclasen Hope you don't mind that I word-smithed your issue name and desc...

sclasen · 2013-08-28T23:55:13Z

Looks good!
