This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Batch Metrics API #159

Merged
Changes from 19 commits
22 commits
350cdd2
Add batch metrics api
ricardolstephen Jul 24, 2020
80c492f
Fixup cherry-pick
ricardolstephen Jul 28, 2020
d1c1bea
Cleanup code
ricardolstephen Jul 28, 2020
9aa4a75
Expose batch metrics over the agent api
ricardolstephen Jul 29, 2020
b41197d
Return retention period during config queries
ricardolstephen Aug 3, 2020
a13cd8b
Enable more tests
ricardolstephen Aug 5, 2020
0f90bf4
Add batch metrics tests
ricardolstephen Aug 13, 2020
d69645d
Merge branch 'master' into batch-metrics-api-v3
ricardolstephen Aug 24, 2020
17647bf
Add time unit to the batch-metrics-retention-period variable
ricardolstephen Aug 24, 2020
77f97d0
Add batch metrics api documentation to README
ricardolstephen Aug 24, 2020
610b25b
Update batch metrics api documentation
ricardolstephen Aug 24, 2020
ef67704
Add time unit to batch metrics retention period
ricardolstephen Aug 25, 2020
acc0fa3
Add documentation about samplingperiod
ricardolstephen Sep 2, 2020
4c994a4
Update batch metrics section of README
ricardolstephen Sep 3, 2020
a5a8235
Update batch metrics section of README
ricardolstephen Sep 4, 2020
907bbc7
Update batch metrics api sample query
ricardolstephen Sep 4, 2020
238d203
Merge branch 'master' into batch-metrics-api-v3
ricardolstephen Sep 4, 2020
d59e8fa
Add minor change to README
ricardolstephen Sep 4, 2020
81b5c1e
Note why max datapoints was capped
ricardolstephen Sep 8, 2020
a4e0235
Make default enable values private static final
ricardolstephen Sep 9, 2020
aeee5a1
Update batch metrics docs
ricardolstephen Sep 9, 2020
dbce9ab
Minor changes in readme and error handling
ricardolstephen Sep 10, 2020
48 changes: 48 additions & 0 deletions README.md
@@ -31,6 +31,54 @@ Then you provide parameters for metrics, aggregations, dimensions, and nodes (op
GET `_opendistro/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg,max&dim=ShardID&nodes=all`


## Batch Metrics API
While the metrics api associated with performance analyzer provides the last 5 seconds worth of metrics, the batch metrics api provides more detailed metrics from longer periods of time. See the [design doc](https://github.com/opendistro-for-elasticsearch/performance-analyzer-rca/blob/master/docs/batch-metrics-api.md) for more information.

In order to access the batch metrics api, first enable it using one of the following HTTP requests:

```
POST localhost:9200/_opendistro/_performanceanalyzer/batch/config -H 'Content-Type: application/json' -d '{"enabled": true}'
POST localhost:9200/_opendistro/_performanceanalyzer/batch/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
```

The former enables batch metrics on a single node, while the latter enables it on nodes across the entire cluster. Batch metrics can be disabled using analogous queries with `{"enabled": false}`.

You can then query either the config or cluster config apis to see how many minutes worth of batch metrics data will be retained by nodes in the cluster (`batchMetricsRetentionPeriodMinutes`):

```
GET localhost:9200/_opendistro/_performanceanalyzer/config

{"performanceAnalyzerEnabled":true,"rcaEnabled":false,"loggingEnabled":false,"shardsPerCollection":0,"batchMetricsEnabled":true,"batchMetricsRetentionPeriodMinutes":7}

GET localhost:9200/_opendistro/_performanceanalyzer/cluster/config

{"currentPerformanceAnalyzerClusterState":9,"shardsPerCollection":0,"batchMetricsRetentionPeriodMinutes":7}
```
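
The `currentPerformanceAnalyzerClusterState` value is a composite bit field. According to the comment in `PerformanceAnalyzerClusterSettings` in this change, bit 0 is Performance Analyzer, bit 1 is RCA, bit 2 is logging, and bit 3 is batch metrics. The following is a minimal client-side sketch for decoding it; the class name `ClusterStateDecoder` and the feature labels are hypothetical and not part of the plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin: decodes the composite state value.
final class ClusterStateDecoder {
    // Index in this array == bit position documented in PerformanceAnalyzerClusterSettings.
    private static final String[] FEATURES = {"performanceAnalyzer", "rca", "logging", "batchMetrics"};

    /** Returns a feature name -> enabled flag map for a composite state value. */
    static Map<String, Boolean> decode(int state) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (int bit = 0; bit < FEATURES.length; bit++) {
            flags.put(FEATURES[bit], ((state >> bit) & 1) == 1);
        }
        return flags;
    }
}
```

For the sample response above, `ClusterStateDecoder.decode(9)` yields `{performanceAnalyzer=true, rca=false, logging=false, batchMetrics=true}`, since 9 is `1001` in binary.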

The default retention period is 7 minutes, but the cluster owner can adjust this by setting `batch-metrics-retention-period-minutes` in performance-analyzer.properties. The value must be between 1 and 60 minutes (inclusive). The range is capped to prevent excessive data retention on the cluster, which would consume a lot of storage.
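
As an illustration of the range rule, here is a rough sketch of how a reader of performance-analyzer.properties might apply it. The class `RetentionPeriodConfig` is hypothetical, and falling back to the default for missing, unparsable, or out-of-range values is an assumption; the plugin's actual handling of invalid values may differ.

```java
import java.util.Properties;

// Hypothetical helper, not the plugin's PluginSettings implementation.
final class RetentionPeriodConfig {
    static final String KEY = "batch-metrics-retention-period-minutes";
    static final int DEFAULT_MINUTES = 7;
    static final int MIN_MINUTES = 1;
    static final int MAX_MINUTES = 60;

    /**
     * Reads the retention period in minutes.
     * Assumption: any missing, unparsable, or out-of-range value yields the default.
     */
    static int read(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_MINUTES;
        }
        try {
            int minutes = Integer.parseInt(raw.trim());
            boolean inRange = minutes >= MIN_MINUTES && minutes <= MAX_MINUTES;
            return inRange ? minutes : DEFAULT_MINUTES;
        } catch (NumberFormatException e) {
            return DEFAULT_MINUTES;
        }
    }
}
```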

You can then access the batch metrics available at each node via queries of the following format:

```
GET localhost:9600/_opendistro/_performanceanalyzer/batch?metrics=<metrics>&starttime=<starttime>&endtime=<endtime>&samplingperiod=<samplingperiod>
```

* metrics - Comma separated list of metrics you are interested in. For a full list of metrics, see Metrics Reference.
* starttime - Unix timestamp (milliseconds elapsed since midnight, January 1, 1970 UTC) determining the oldest data point to return. starttime is inclusive: data points at or after the starttime will be returned. Note that the starttime and endtime supplied by the user are both rounded down to the nearest samplingperiod. starttime must be no less than `now - retention_period`, and it must be less than the endtime (after the rounding).
* endtime - Unix timestamp in milliseconds determining the freshest data point to return. endtime is exclusive: only data points from before the endtime will be returned. endtime must be no greater than the system time at the node, and it must be greater than the starttime (after being rounded down to the nearest samplingperiod).
* samplingperiod - Optional parameter indicating the sampling period in seconds (default is 5s). The requested time range will be partitioned according to the sampling period, and data from the first available 5s interval in each partition will be returned to the user. Must be at least 5s, must be less than the retention period, and must be a multiple of 5.
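
The parameter rules above can be sketched as a client-side pre-check. This is only an illustration under stated assumptions: the class `BatchQueryWindow` and its method names are hypothetical, and applying the retention check to the rounded starttime is a guess, since the text does not say whether the server checks the raw or the rounded value.

```java
// Hypothetical client-side pre-check, not the server's validation code.
final class BatchQueryWindow {
    /** Rounds a millisecond timestamp down to a multiple of the sampling period. */
    static long roundDown(long timestampMillis, long samplingPeriodSeconds) {
        long periodMillis = samplingPeriodSeconds * 1000L;
        return timestampMillis - (timestampMillis % periodMillis);
    }

    /**
     * Checks a query against the documented rules and returns the number of
     * sampling partitions in [starttime, endtime). Throws on invalid input.
     */
    static long partitionCount(long startMillis, long endMillis, long samplingPeriodSeconds,
                               long retentionMinutes, long nowMillis) {
        if (samplingPeriodSeconds < 5 || samplingPeriodSeconds % 5 != 0
                || samplingPeriodSeconds >= retentionMinutes * 60) {
            throw new IllegalArgumentException("invalid samplingperiod");
        }
        long start = roundDown(startMillis, samplingPeriodSeconds);
        long end = roundDown(endMillis, samplingPeriodSeconds);
        if (endMillis > nowMillis) {
            throw new IllegalArgumentException("endtime is in the future");
        }
        // Assumption: the retention check is applied to the rounded starttime.
        if (start < nowMillis - retentionMinutes * 60_000L) {
            throw new IllegalArgumentException("starttime is older than the retention period");
        }
        if (start >= end) {
            throw new IllegalArgumentException("starttime must be less than endtime after rounding");
        }
        return (end - start) / (samplingPeriodSeconds * 1000L);
    }
}
```

For example, `roundDown(1594412253210L, 5)` returns `1594412250000`, and a query from `1594412250000` to `1594412260000` with a 5s sampling period spans 2 partitions.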

Note, the maximum number of datapoints that a single query can request via the API is capped at 100,800 datapoints (in order to prevent excessive memory consumption by the datapoints). If a query exceeds this limit, an error is returned. The query parameters can be adjusted on such queries to request fewer datapoints at a time.

Note, unlike with the metrics api, there is no `nodes=all` parameter for the batch metrics api. You must query a specific node in order to obtain metrics from that node.
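
Because each node has to be queried separately, a client that wants cluster-wide data needs one request per node. A minimal sketch of building those requests follows. The class `PerNodeBatchQueries`, the node host list, and the `http://` scheme are assumptions for illustration; the port 9600 and the path come from the query format shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one batch metrics URL per node host.
final class PerNodeBatchQueries {
    private static final String PATH = "/_opendistro/_performanceanalyzer/batch";

    static List<String> urls(List<String> nodeHosts, String metrics,
                             long startMillis, long endMillis, int samplingPeriodSeconds) {
        List<String> result = new ArrayList<>();
        for (String host : nodeHosts) {
            // Assumption: plain HTTP on the agent port 9600 used in the examples above.
            result.add("http://" + host + ":9600" + PATH
                    + "?metrics=" + metrics
                    + "&starttime=" + startMillis
                    + "&endtime=" + endMillis
                    + "&samplingperiod=" + samplingPeriodSeconds);
        }
        return result;
    }
}
```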

Note, the default retention period is 7 minutes because a typical use-case would be to query for 5 minutes worth of data from the node. In order to do this, a client would actually select a starttime of now-6min and an endtime of now-1min (this one minute offset will give sufficient time for the metrics in the time range to be available at the node). Atop this 6 minutes of retention, we need an extra 1 minute of retention to account for the time that would have passed by the time the query arrives at the node, and for the fact that starttime and endtime will be rounded down to the nearest samplingperiod.
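
The window arithmetic described in this note can be sketched as follows. The class name `FiveMinuteWindow` is hypothetical; the 6-minute and 1-minute offsets are the ones given in the note.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: computes the starttime/endtime pair for "the last 5 minutes".
final class FiveMinuteWindow {
    /** Returns {starttime, endtime} in epoch milliseconds for the given "now". */
    static long[] around(Instant now) {
        // endtime lags "now" by 1 minute so the metrics have time to become available.
        long end = now.minus(Duration.ofMinutes(1)).toEpochMilli();
        // starttime is 5 minutes before endtime, i.e. now - 6 minutes.
        long start = now.minus(Duration.ofMinutes(6)).toEpochMilli();
        return new long[] {start, end};
    }
}
```

With the default 7-minute retention period, the starttime of now-6min leaves one minute of slack for request latency and for the rounding down to the samplingperiod.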

### SAMPLE REQUEST
GET `_opendistro/_performanceanalyzer/batch?metrics=CPU_Utilization,IO_TotThroughput&starttime=1594412250000&endtime=1594412260000&samplingperiod=5`

See the [design doc](https://github.com/opendistro-for-elasticsearch/performance-analyzer-rca/blob/master/docs/batch-metrics-api.md) for the expected response.

## Documentation

Please refer to the [technical documentation](https://opendistro.github.io/for-elasticsearch-docs/) for detailed information on installing and configuring Performance Analyzer.
1 change: 1 addition & 0 deletions build.gradle
@@ -71,6 +71,7 @@ ext {
}
test {
enabled = true
include '**/*Test.class'
}

licenseHeaders.enabled = false
@@ -19,23 +19,27 @@ public class PerformanceAnalyzerController {
private static final String PERFORMANCE_ANALYZER_ENABLED_CONF = "performance_analyzer_enabled.conf";
private static final String RCA_ENABLED_CONF = "rca_enabled.conf";
private static final String LOGGING_ENABLED_CONF = "logging_enabled.conf";
private static final String BATCH_METRICS_ENABLED_CONF = "batch_metrics_enabled.conf";
private static final Logger LOG = LogManager.getLogger(PerformanceAnalyzerController.class);
public static final int DEFAULT_NUM_OF_SHARDS_PER_COLLECTION = 0;

private boolean paEnabled;
private boolean rcaEnabled;
private boolean loggingEnabled;
private boolean batchMetricsEnabled;
private volatile int shardsPerCollection;
private boolean paEnabledDefaultValue = false;
private boolean rcaEnabledDefaultValue = false;
private boolean loggingEnabledDefaultValue = false;
private boolean batchMetricsEnabledDefaultValue = false;
private final ScheduledMetricCollectorsExecutor scheduledMetricCollectorsExecutor;

public PerformanceAnalyzerController(final ScheduledMetricCollectorsExecutor scheduledMetricCollectorsExecutor) {
this.scheduledMetricCollectorsExecutor = scheduledMetricCollectorsExecutor;
initPerformanceAnalyzerStateFromConf();
initRcaStateFromConf();
initLoggingStateFromConf();
initBatchMetricsStateFromConf();
shardsPerCollection = DEFAULT_NUM_OF_SHARDS_PER_COLLECTION;
}

@@ -70,6 +74,10 @@ public boolean isLoggingEnabled() {
return loggingEnabled;
}

public boolean isBatchMetricsEnabled() {
return batchMetricsEnabled;
}

/**
* Reads the shardsPerCollection parameter in NodeStatsMetric
* @return the count of Shards per Collection
@@ -131,6 +139,20 @@ public void updateLoggingState(final boolean shouldEnable) {
saveStateToConf(this.loggingEnabled, LOGGING_ENABLED_CONF);
}

/**
* Updates the state of the batch metrics api.
*
* @param shouldEnable The desired state of the batch metrics api. False to disable, and true to enable.
*/
public void updateBatchMetricsState(final boolean shouldEnable) {
if (shouldEnable && !isPerformanceAnalyzerEnabled()) {
return;
}

this.batchMetricsEnabled = shouldEnable;
saveStateToConf(this.batchMetricsEnabled, BATCH_METRICS_ENABLED_CONF);
}

private void initPerformanceAnalyzerStateFromConf() {
Path filePath = Paths.get(getDataDirectory(), PERFORMANCE_ANALYZER_ENABLED_CONF);
PerformanceAnalyzerPlugin.invokePrivileged(() -> {
@@ -187,6 +209,25 @@ private void initLoggingStateFromConf() {
});
}

private void initBatchMetricsStateFromConf() {
Path filePath = Paths.get(getDataDirectory(), BATCH_METRICS_ENABLED_CONF);
PerformanceAnalyzerPlugin.invokePrivileged(() -> {
boolean batchMetricsEnabledFromConf;
try {
batchMetricsEnabledFromConf = readBooleanFromFile(filePath);
} catch (Exception e) {
LOG.debug("Error reading Batch Metrics state from Conf file", e);
if (e instanceof NoSuchFileException) {
saveStateToConf(batchMetricsEnabledDefaultValue, BATCH_METRICS_ENABLED_CONF);
}
batchMetricsEnabledFromConf = batchMetricsEnabledDefaultValue;
}

// For batch metrics to be enabled, it needs both PA and Batch Metrics to be enabled.
updateBatchMetricsState(paEnabled && batchMetricsEnabledFromConf);
});
}

private boolean readBooleanFromFile(final Path filePath) throws Exception {
try (Scanner sc = new Scanner(filePath)) {
String nextLine = sc.nextLine();
@@ -8,6 +8,7 @@ public final class PerformanceAnalyzerClusterSettings {
* Bit 0: Perf Analyzer enabled/disabled
* Bit 1: RCA enabled/disabled
* Bit 2: Logging enabled/disabled
* Bit 3: Batch Metrics enabled/disabled
*/
public static final Setting<Integer> COMPOSITE_PA_SETTING = Setting.intSetting(
"cluster.metadata.perf_analyzer.state",
@@ -19,7 +20,8 @@ public final class PerformanceAnalyzerClusterSettings {
public enum PerformanceAnalyzerFeatureBits {
PA_BIT,
RCA_BIT,
LOGGING_BIT
LOGGING_BIT,
BATCH_METRICS_BIT
}

/**
@@ -14,6 +14,7 @@ public class PerformanceAnalyzerClusterSettingHandler implements ClusterSettingL
private static final int RCA_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.RCA_BIT.ordinal();
private static final int PA_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.PA_BIT.ordinal();
private static final int LOGGING_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.LOGGING_BIT.ordinal();
private static final int BATCH_METRICS_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.BATCH_METRICS_BIT.ordinal();

private final PerformanceAnalyzerController controller;
private final ClusterSettingsManager clusterSettingsManager;
@@ -24,11 +25,11 @@ public PerformanceAnalyzerClusterSettingHandler(final PerformanceAnalyzerControl
final ClusterSettingsManager clusterSettingsManager) {
this.controller = controller;
this.clusterSettingsManager = clusterSettingsManager;
this.currentClusterSetting =
initializeClusterSettingValue(
this.currentClusterSetting = initializeClusterSettingValue(
controller.isPerformanceAnalyzerEnabled(),
controller.isRcaEnabled(),
controller.isLoggingEnabled());
controller.isLoggingEnabled(),
controller.isBatchMetricsEnabled());
}

/**
@@ -61,6 +62,16 @@ public void updateRcaSetting(final boolean state) {
clusterSettingsManager.updateSetting(COMPOSITE_PA_SETTING, settingIntValue);
}

/**
* Updates the Batch Metrics setting across the cluster.
*
* @param state The desired state for batch metrics.
*/
public void updateBatchMetricsSetting(final boolean state) {
final Integer settingIntValue = getBatchMetricsSettingValueFromState(state);
clusterSettingsManager.updateSetting(COMPOSITE_PA_SETTING, settingIntValue);
}

/**
* Handler that gets called when there is a new value for the setting that this listener
* is listening to.
@@ -74,6 +85,7 @@ public void onSettingUpdate(final Integer newSettingValue) {
controller.updatePerformanceAnalyzerState(getPAStateFromSetting(newSettingValue));
controller.updateRcaState(getRcaStateFromSetting(newSettingValue));
controller.updateLoggingState(getLoggingStateFromSetting(newSettingValue));
controller.updateBatchMetricsState(getBatchMetricsStateFromSetting(newSettingValue));
}
}

@@ -95,13 +107,15 @@ public int getCurrentClusterSettingValue() {
* @return the cluster setting value
*/
private Integer initializeClusterSettingValue(

Have we tested this across 1/ multi-node scenarios, 2/ clusters with nodes on different versions of this code (one node with this change and another on a version before this change) ?

Contributor Author

@ricardolstephen ricardolstephen Sep 2, 2020

Ran these tests manually. See the test list in opendistro-for-elasticsearch/performance-analyzer-rca#315

final boolean paEnabled, final boolean rcaEnabled, final boolean loggingEnabled) {
final boolean paEnabled, final boolean rcaEnabled, final boolean loggingEnabled,
final boolean batchMetricsEnabled) {
int clusterSetting = CLUSTER_SETTING_DISABLED_VALUE;

clusterSetting = paEnabled ? setBit(clusterSetting, PA_ENABLED_BIT_POS) : clusterSetting;
if (paEnabled) {
clusterSetting = rcaEnabled ? setBit(clusterSetting, RCA_ENABLED_BIT_POS) : clusterSetting;
clusterSetting = loggingEnabled ? setBit(clusterSetting, LOGGING_ENABLED_BIT_POS) : clusterSetting;
clusterSetting = batchMetricsEnabled ? setBit(clusterSetting, BATCH_METRICS_ENABLED_BIT_POS) : clusterSetting;
}
return clusterSetting;
}
@@ -118,10 +132,10 @@ private boolean getPAStateFromSetting(final int settingValue) {

/**
* Converts the boolean PA state to composite cluster setting.
* If Performance Analyzer is being turned off, it will also turn RCA off.
* If Performance Analyzer is being turned off, it will also turn off RCA, logging, and batch metrics.
*
* @param state the state of performance analyzer. Will enable performance analyzer if true,
* disables both RCA and performance analyzer if false.
* disables performance analyzer, RCA, logging, and batch metrics.
* @return composite cluster setting as an integer.
*/
private Integer getPASettingValueFromState(final boolean state) {
@@ -130,7 +144,8 @@ private Integer getPASettingValueFromState(final boolean state) {
if (state) {
return setBit(clusterSetting, PA_ENABLED_BIT_POS);
} else {
return resetBit(resetBit(resetBit(clusterSetting, PA_ENABLED_BIT_POS), RCA_ENABLED_BIT_POS), LOGGING_ENABLED_BIT_POS);
return resetBit(resetBit(resetBit(resetBit(clusterSetting, PA_ENABLED_BIT_POS), RCA_ENABLED_BIT_POS),
LOGGING_ENABLED_BIT_POS), BATCH_METRICS_ENABLED_BIT_POS);
}
}

@@ -154,6 +169,16 @@ private boolean getLoggingStateFromSetting(final int settingValue) {
return ((settingValue >> LOGGING_ENABLED_BIT_POS) & BIT_ONE) == ENABLED_VALUE;
}

/**
* Extracts the boolean value for batch metrics state from the cluster setting.
*
* @param settingValue The composite setting value.
* @return true if the BATCH_METRICS bit is set, false otherwise.
*/
private boolean getBatchMetricsStateFromSetting(final int settingValue) {
return ((settingValue >> BATCH_METRICS_ENABLED_BIT_POS) & BIT_ONE) == ENABLED_VALUE;
}

/**
* Converts the boolean RCA state to composite cluster setting.
* Enables RCA only if performance analyzer is also set. Otherwise, results in a no-op.
@@ -190,6 +215,24 @@ private Integer getLoggingSettingValueFromState(final boolean shouldEnable) {
}
}

/**
* Converts the boolean batch metrics state to composite cluster setting.
* Enables batch metrics only if performance analyzer is also set. Otherwise, results in a no-op.
*
* @param shouldEnable the state of batch metrics. Will try to enable if true, disables batch metrics if false.
* @return composite cluster setting as an integer.
*/
private Integer getBatchMetricsSettingValueFromState(final boolean shouldEnable) {
int clusterSetting = currentClusterSetting;

if (shouldEnable) {
return checkBit(currentClusterSetting, PA_ENABLED_BIT_POS)
? setBit(clusterSetting, BATCH_METRICS_ENABLED_BIT_POS) : clusterSetting;
} else {
return resetBit(clusterSetting, BATCH_METRICS_ENABLED_BIT_POS);
}
}

/**
* Sets the bit at the specified position.
*
@@ -3,6 +3,7 @@
import java.io.IOException;
import java.util.Map;

import com.amazon.opendistro.elasticsearch.performanceanalyzer.config.PluginSettings;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.config.setting.handler.NodeStatsSettingHandler;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
@@ -27,10 +28,12 @@ public class PerformanceAnalyzerClusterConfigAction extends BaseRestHandler {
private static final String PA_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/cluster/config";
private static final String RCA_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/rca/cluster/config";
private static final String LOGGING_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/logging/cluster/config";
private static final String BATCH_METRICS_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/batch/cluster/config";
private static final String ENABLED = "enabled";
private static final String SHARDS_PER_COLLECTION = "shardsPerCollection";
private static final String CURRENT = "currentPerformanceAnalyzerClusterState";
private static final String NAME = "PerformanceAnalyzerClusterConfigAction";
private static final String BATCH_METRICS_RETENTION_PERIOD_MINUTES = "batchMetricsRetentionPeriodMinutes";
Contributor

We aren't adding logic to update this via the API then?

Contributor Author

No, this will be a read-only parameter that the cluster owner can modify.


private final PerformanceAnalyzerClusterSettingHandler clusterSettingHandler;
private final NodeStatsSettingHandler nodeStatsSettingHandler;
@@ -51,6 +54,8 @@ private void registerHandlers(final RestController controller) {
controller.registerHandler(RestRequest.Method.POST, RCA_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.GET, LOGGING_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.POST, LOGGING_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.GET, BATCH_METRICS_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.POST, BATCH_METRICS_CLUSTER_CONFIG_PATH, this);
}

/**
@@ -89,6 +94,8 @@ protected RestChannelConsumer prepareRequest(final RestRequest request, final No
clusterSettingHandler.updateRcaSetting((Boolean) value);
} else if (request.path().contains(LOGGING_CLUSTER_CONFIG_PATH)) {
clusterSettingHandler.updateLoggingSetting((Boolean) value);
} else if (request.path().contains(BATCH_METRICS_CLUSTER_CONFIG_PATH)) {
clusterSettingHandler.updateBatchMetricsSetting((Boolean) value);
} else {
clusterSettingHandler.updatePerformanceAnalyzerSetting((Boolean) value);
}
@@ -108,6 +115,7 @@ protected RestChannelConsumer prepareRequest(final RestRequest request, final No
builder.startObject();
builder.field(CURRENT, clusterSettingHandler.getCurrentClusterSettingValue());
builder.field(SHARDS_PER_COLLECTION, nodeStatsSettingHandler.getNodeStatsSetting());
builder.field(BATCH_METRICS_RETENTION_PERIOD_MINUTES, PluginSettings.instance().getBatchMetricsRetentionPeriodMinutes());
builder.endObject();
channel.sendResponse(new BytesRestResponse(RestStatus.OK, builder));
} catch (IOException ioe) {