Implement java exponential histograms (#28903) #28995

Merged
5 commits merged into apache:master on Oct 18, 2023

@JayajP JayajP commented Oct 13, 2023

Implement exponential histogram metric in Java.

Currently this only supports growth factors of the form 2^(2^k) for an integer k in [-3, 3],
e.g. histograms with a growth factor of 2^(1/8), 2^(1/4), 2^(1/2), 2, or 2^2.
See HistogramData::exponential for further details.

For the initial use-case we will use histograms with a growth factor of sqrt(2) to record BigQuery write latencies.
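
As a rough illustration of the growth factors described above, the sketch below computes base = 2^(2^(-scale)) for each supported scale. The class and method names are hypothetical, and the mapping k = -scale is an assumption inferred from the tests quoted later in this conversation (scale 1 gives a first bucket of width sqrt(2), scale -1 gives 4). It is not the code from HistogramData.

```java
// Illustrative sketch only; ExponentialBase is a hypothetical name, not a Beam class.
final class ExponentialBase {
  private ExponentialBase() {}

  /** Growth factor for a given scale, assuming base = 2^(2^(-scale)). */
  static double base(int scale) {
    return Math.pow(2.0, Math.pow(2.0, -scale));
  }

  /** Lower bound of bucket i, assuming bucket 0 starts at 0 as in the Javadoc examples. */
  static double lowerBound(int scale, int i) {
    return i == 0 ? 0.0 : Math.pow(base(scale), i);
  }

  public static void main(String[] args) {
    // Print the growth factor for every scale in the supported range [-3, 3].
    for (int scale = -3; scale <= 3; scale++) {
      System.out.printf("scale=%d base=%.4f%n", scale, base(scale));
    }
  }
}
```

Under these assumptions, scale=1 corresponds to the sqrt(2) growth factor mentioned for the BigQuery write latency use-case.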


@github-actions github-actions bot added the java label Oct 13, 2023
@JayajP JayajP force-pushed the exponentialhistograms branch from 448b72b to 9664a02 Compare October 13, 2023 21:45
public abstract double getRangeTo();

public static ExponentialBuckets of(int scale, int numBuckets) {
if (scale < -3) {
Contributor

Is there a semantic meaning to -3? Same with the 3 on line 305.

Can this be a constant declaration (private static final) with the semantic meaning in the name?

Contributor Author

done
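
A minimal sketch of what the suggestion above might look like, assuming hypothetical constant names and an IllegalArgumentException on out-of-range input. The names and the exception type are illustrative and are not taken from the merged code.

```java
// Hypothetical sketch; constant, class, and method names are illustrative only.
final class ScaleBounds {
  private ScaleBounds() {}

  // Smallest supported scale, i.e. the coarsest buckets.
  static final int MINIMUM_SCALE = -3;
  // Largest supported scale, i.e. the finest buckets.
  static final int MAXIMUM_SCALE = 3;

  /** Returns the scale unchanged if it is within the supported range, otherwise throws. */
  static int checkScale(int scale) {
    if (scale < MINIMUM_SCALE || scale > MAXIMUM_SCALE) {
      throw new IllegalArgumentException(
          String.format(
              "Scale must be in [%d, %d], got %d", MINIMUM_SCALE, MAXIMUM_SCALE, scale));
    }
    return scale;
  }
}
```

Naming the bounds keeps the range check and its error message in sync if the supported range ever changes.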

private static int computeNumberOfBuckets(int scale, int inputNumBuckets) {
if (scale == 0) {
// When base=2 then the bucket at index 31 contains [2^31, 2^32).
return Math.min(32, inputNumBuckets);
Contributor

Same as the comment above: maybe rename and declare as private static final int MAX_INPUT_NUM_BUCKETS = 32 inside ExponentialBuckets.

Contributor Author

Done.
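
For illustration, the clipping behaviour can be sketched as below. The cap of 32 at scale 0 comes from the quoted code. The scaling of that cap for other scales is an assumption inferred from the testNumBuckets assertions quoted later in this conversation (32 * 8 at scale 3, 4 at scale -3). The class and method names are hypothetical.

```java
// Hypothetical sketch inferred from the quoted tests; not the merged implementation.
final class BucketCount {
  private BucketCount() {}

  // With base 2, the bucket at index 31 contains [2^31, 2^32), so 32 buckets suffice at scale 0.
  private static final int MAX_INPUT_NUM_BUCKETS = 32;

  /** Clips the requested bucket count to the assumed cap for the given scale. */
  static int clip(int scale, int inputNumBuckets) {
    // Assumption: the cap doubles per unit of positive scale and halves per unit of negative scale.
    int cap = scale >= 0
        ? MAX_INPUT_NUM_BUCKETS << scale
        : MAX_INPUT_NUM_BUCKETS >> -scale;
    return Math.min(cap, inputNumBuckets);
  }
}
```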

* </pre>
*
* <pre>
* Example sacle/boundaries:
Contributor

Typo: "sacle" should be "scale".

Contributor Author

Done.

* When scale=-1, buckets 0,1,2...i have lowerbounds 0, 4, 4^2, ... 4^(i).
* </pre>
*
* Scale parameter is similar to OpenTelemetry's notion of ExponentialHistogram.
Contributor

<a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram">OpenTelemetry's notion of ExponentialHistogram</a>

Contributor Author

Done.

Comment on lines 206 to 208
// The following tests cover exponential buckets.
@Test
public void testPositiveScaleBucket() {
Contributor

Maybe rename the tests that cover exponential buckets to testExponentialBuckets_{TEST_CASE},
for example testExponentialBuckets_positiveScaleBuckets().

Contributor Author

Done.

Comment on lines 307 to 334
HistogramData data = HistogramData.exponential(0, 20);
assertThat(data.getBucketType().getBucketSize(0), equalTo(2.0));
// 10th bucket contains [2^10, 2^11).
assertThat(data.getBucketType().getBucketSize(10), equalTo(1024.0));

data = HistogramData.exponential(1, 20);
assertThat(data.getBucketType().getBucketSize(0), equalTo(Math.sqrt(2)));
// 10th bucket contains [2^5, 2^5.5).
assertThat(data.getBucketType().getBucketSize(10), closeTo(13.2, .1));

data = HistogramData.exponential(-1, 20);
assertThat(data.getBucketType().getBucketSize(0), equalTo(4.0));
// 10th bucket contains [2^20, 2^22).
assertThat(data.getBucketType().getBucketSize(10), equalTo(3145728.0));
}

@Test
public void testNumBuckets() {
// Validate that numBuckets clipping WAI.
HistogramData data = HistogramData.exponential(0, 200);
assertThat(data.getBucketType().getNumBuckets(), equalTo(32));

data = HistogramData.exponential(3, 500);
assertThat(data.getBucketType().getNumBuckets(), equalTo(32 * 8));

data = HistogramData.exponential(-3, 500);
assertThat(data.getBucketType().getNumBuckets(), equalTo(4));
}
Contributor

Since HistogramData.exponential(...) creates a new instance of HistogramData, how about declaring a new variable for every instance for clarity?

It is also OK to have even smaller tests with narrower assertions if each instance tests a different part of the behavior of the method (in this case getNumBuckets). go/tott/648

Contributor Author

Done.
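
The bucket sizes asserted in the tests above can be checked by hand with a short sketch. It assumes bucket 0 covers [0, base) and bucket i >= 1 covers [base^i, base^(i+1)), with base = 2^(2^(-scale)), which matches the inline comments in the quoted tests. BucketWidths is a hypothetical helper, not part of HistogramData.

```java
// Hypothetical helper for checking the asserted bucket sizes; not a Beam class.
final class BucketWidths {
  private BucketWidths() {}

  /** Assumed growth factor: base = 2^(2^(-scale)). */
  static double base(int scale) {
    return Math.pow(2.0, Math.pow(2.0, -scale));
  }

  /** Width of bucket i: [0, base) for i == 0, otherwise [base^i, base^(i+1)). */
  static double width(int scale, int i) {
    double b = base(scale);
    return i == 0 ? b : Math.pow(b, i + 1) - Math.pow(b, i);
  }

  public static void main(String[] args) {
    // One named variable per case, in the spirit of the review suggestion above.
    double scaleZeroWidth = width(0, 10);       // [2^10, 2^11)
    double positiveScaleWidth = width(1, 10);   // [2^5, 2^5.5)
    double negativeScaleWidth = width(-1, 10);  // [2^20, 2^22)
    System.out.println(scaleZeroWidth);
    System.out.println(positiveScaleWidth);
    System.out.println(negativeScaleWidth);
  }
}
```

For scale=1 the width is 2^5.5 - 2^5, roughly 13.25, which is why the quoted test uses closeTo(13.2, .1) rather than equalTo.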

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

Contributor

@m-trieu m-trieu left a comment

LGTM

@Abacn
Contributor

Abacn commented Oct 18, 2023

Run Java PreCommit

@Abacn
Contributor

Abacn commented Oct 18, 2023

Different tests failed on https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/6297/ and https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/28670/; the failures are not likely relevant to this change, so merging for now.

@Abacn Abacn merged commit 9aaf9c2 into apache:master Oct 18, 2023
24 of 27 checks passed
kkdoon pushed a commit to twitter-forks/beam that referenced this pull request Oct 21, 2023
* Implement java exponential histograms (apache#28903)

* Address comments

* Address comments
@JayajP JayajP deleted the exponentialhistograms branch November 15, 2023 20:03