Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Telemetry work #22

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Telemetry work #22

wants to merge 1 commit into from

Conversation

brunos-bq
Copy link

Summary

Introducing Telemetry to the JDBC Wrapper

Description

  • Document describing the feature
  • Code changes implementing telemetry to the Wrapper

Additional Reviewers

@karenc-bq
@aaron-congo
@aaronchung-bitquill
@sergiyv-improving

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


AWS Aurora are databases provided by AWS that are hosted in the AWS cloud. Currently available in Aurora are instances of MySQL and PostgreSQL databases. In order to enable the access from an application to an Aurora database, users need to set up a driver that has the ability to connect and interact with an Aurora database instance set up remotely. Existing drivers for MySQL and PostgreSQL work fine with Aurora instances, but do not take advantage of any of the additional features provided by the Aurora databases.

Recently, AWS has created the AWS JDBC Wrapper driver. The AWS JDBC Wrapper is a driver application that is not a driver in itself, but more of a driver enhancer - an application that set up alongside a database driver, enables some Aurora-specific features to the database driver. Examples of those features are the ability of driver failover (link to doc) and integration with IAM and AWS Secrets Manager (link). The structure of the AWS JDBC Wrapper is organized in plugins, where each new wrapper feature is isolated and independent from other features. That allows users to select which wrapper features/plugins they require in their application workflow.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave the responsibility of explaining what the wrapper is to the repo home page or do you think its helpful to have here as well?


Through the driver, applications interact with a remote database by sending requests. Those requests trigger a chain of execution where every one of the plugins enabled by the user is activated and executed, until the request reaches the database and returns.

In its current form, the AWS JDBC Wrapper is much like a black box: during the chain of execution, there is actually no possibility to measure the individual performance of each plugin throughout the execution of a query. Which means that in the event of loss of performance while using the Wrapper, the troubleshooting process is manual, requiring users to dive deep inside application logs and manually inspecting the behavior of each plugin enabled.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR aims to fix this problem and this doc will be merged in with the solution, should we replace In its current form... with Previously, the AWS...?


OpenTelemetry aims to define vendor and language agnostic specs on how to monitor applications.

The entities we introduce in this project are the following: Metrics and Traces.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we mention metrics before traces on this line, lets introduce them in the same order below (eg metrics section first, then traces section)


### Traces

We define a trace as a defined sequence of an application execution, specifically defined by its start and its end. A trace can contain the entire application execution, or a single atomic operation. Traces can be related either by hierarchy either with symbolic links.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this read by heirarchy or with symbolic links?


Collecting libraries will see the traces boundaries and measure the execution time for a particular trace. For a better understanding on the application behavior, different attributes can be attached to a trace, such as a success status and/or status code.

In our project, traces are represented as `TelemetryContext` objects. A `TelemetryContext` object will be created/opened at every plugin invocation or execution.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "A TelemetryContext object will be created for each wrapper invocation and each individual plugin invocation. For more details see the section on [plugin level tracing](#plugin-level-tracing)."

docs/metrics-document.md Show resolved Hide resolved
docs/metrics-document.md Show resolved Hide resolved
docs/metrics-document.md Show resolved Hide resolved
docs/metrics-document.md Show resolved Hide resolved
### Failover plugin

Metrics:
(Metric name | metric type | Unit (if applicable) | Dimensions | Description)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could consider removing this heading here and below, it seems quite clear without it. However, if we do keep it we could consider either removing the unit/dimension part (since neither are present in any of the metrics) or changing the dimensions part to Dimensions (if applicable)


package software.amazon.jdbc.util.telemetry;

public class NullTelemetryContext implements TelemetryContext {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're getting closer to a final stage of development, we probably need to add more comments to classes and methods. It's a general note.

@aaronchung-bitquill
Copy link

Is the image file docs/images/telemetry-app.png is unused?

try (final Connection conn = DriverManager.getConnection(POSTGRESQL_CONNECTION_STRING, properties);
final Statement statement = conn.createStatement();
final ResultSet rs = statement.executeQuery(SQL_SLEEP)) {
System.out.println(Util.getResult(rs));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does pg_sleep return anything here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pg_sleep returns an empty resultset I believe

@@ -89,6 +106,8 @@ public <T, E extends Exception> T execute(Class<T> resultClass, Class<E> excepti
return jdbcMethodFunc.call();
}

totalCallsCounter.inc();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to the beginning of the method and count it as a call to this plugin even if we enter the if-block on line 105?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we enter the if else block and execute the method directly, we are skipping all the data cache plugin specific code. I don't think we should count that towards our total calls.

if (PropertyDefinition.ENABLE_TELEMETRY.getBoolean(properties)) {
if ("otlp".equalsIgnoreCase(PropertyDefinition.TELEMETRY_METRICS_BACKEND.getString(properties))) {
return OpenTelemetryFactory.createCounter(name);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checking, I'm guessing there isn't an xray createCounter or createGauge method? Not familiar with the differences between xray and open telemetry

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenTelemetry is a standard for tracing and metrics.
XRay is an AWS tool that is only for tracing.

# Proposed solution

## Definitions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following few paragraphs (lines 30-36) don't seem to be related Definitions. Could we move these up a level, right under Proposed solutions?

Metrics will also be added to our observability suite for the numeric data that will be collected throughout the application execution.

For metrics, we follow the same standard defined by OpenTelemetry, which consists of:
- Counters

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have the plurality of the metrics be consistent?

- Gauge
- Histogram

Gauges will be used for situations where the numeric data is varies, but not incrementally throughout the execution of the application. An example of that would be a hit/miss ratio for a cache. Further information on those 3 entities, and to which kind of data they suit better can be found in the OpenTelemetry documentation (link).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the link for OpenTelemetry documentation missing?

this.properties = properties;
this.monitorServiceSupplier = monitorServiceSupplier;

this.invalidatedConnectionCounter = telemetryFactory.createCounter("efm.connection.invalidated");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to list the names of our counters in metrics-document.md? I don't see this counter being mentioned.

this.properties = properties;
this.monitorServiceSupplier = monitorServiceSupplier;

this.invalidatedConnectionCounter = telemetryFactory.createCounter("efm.connection.invalidated");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this counter updated in or used?

}
}

public static void main(String[] args) throws SQLException {
Copy link
Collaborator

@karenc-bq karenc-bq Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a nullpointerexception when I run the example as-is even though you have the cluster endpoints and credentials specified

> Task :driverexample:MetricsExample.main() FAILED
Exception in thread "main" java.lang.NullPointerException: endpoint
	at java.base/java.util.Objects.requireNonNull(Objects.java:233)
	at io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporterBuilder.setEndpoint(OtlpGrpcSpanExporterBuilder.java:93)
	at software.amazon.MetricsExample.<init>(MetricsExample.java:60)
	at software.amazon.MetricsExample.main(MetricsExample.java:84)

@@ -41,6 +41,7 @@
import software.amazon.jdbc.mock.TestPluginThree;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suites have some build errors right now.

@congoamz
Copy link

congoamz commented Mar 6, 2023

Were we planning on merging this into the bit-quill repo or the aws one? Currently it is targeting bit-quill/main which is not up to date with the aws repo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants