This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Batch Metrics API #159

Merged
350cdd2
Add batch metrics api
ricardolstephen Jul 24, 2020
80c492f
Fixup cherry-pick
ricardolstephen Jul 28, 2020
d1c1bea
Cleanup code
ricardolstephen Jul 28, 2020
9aa4a75
Expose batch metrics over the agent api
ricardolstephen Jul 29, 2020
b41197d
Return retention period during config queries
ricardolstephen Aug 3, 2020
a13cd8b
Enable more tests
ricardolstephen Aug 5, 2020
0f90bf4
Add batch metrics tests
ricardolstephen Aug 13, 2020
d69645d
Merge branch 'master' into batch-metrics-api-v3
ricardolstephen Aug 24, 2020
17647bf
Add time unit to the batch-metrics-retention-period variable
ricardolstephen Aug 24, 2020
77f97d0
Add batch metrics api documentation to README
ricardolstephen Aug 24, 2020
610b25b
Update batch metrics api documentation
ricardolstephen Aug 24, 2020
ef67704
Add time unit to batch metrics retention period
ricardolstephen Aug 25, 2020
acc0fa3
Add documentation about samplingperiod
ricardolstephen Sep 2, 2020
4c994a4
Update batch metrics section of README
ricardolstephen Sep 3, 2020
a5a8235
Update batch metrics section of README
ricardolstephen Sep 4, 2020
907bbc7
Update batch metrics api sample query
ricardolstephen Sep 4, 2020
238d203
Merge branch 'master' into batch-metrics-api-v3
ricardolstephen Sep 4, 2020
d59e8fa
Add minor change to README
ricardolstephen Sep 4, 2020
81b5c1e
Note why max datapoints was capped
ricardolstephen Sep 8, 2020
a4e0235
Make default enable values private static final
ricardolstephen Sep 9, 2020
aeee5a1
Update batch metrics docs
ricardolstephen Sep 9, 2020
dbce9ab
Minor changes in readme and error handling
ricardolstephen Sep 10, 2020
28 changes: 28 additions & 0 deletions README.md
@@ -31,6 +31,34 @@ Then you provide parameters for metrics, aggregations, dimensions, and nodes (op
GET `_opendistro/_performanceanalyzer/metrics?metrics=Latency,CPU_Utilization&agg=avg,max&dim=ShardID&nodes=all`


## Batch Metrics API
While the basic metrics API of Performance Analyzer provides only the last 5 seconds' worth of metrics, the Batch Metrics API provides more detailed metrics over longer periods of time. See the [design doc](https://github.com/opendistro-for-elasticsearch/performance-analyzer-rca/blob/master/docs/batch-metrics-api.md) for more information.

The Batch Metrics API uses a single HTTP method and URI for all requests:

GET `<endpoint>/_opendistro/_performanceanalyzer/batch`

Then you provide parameters for metrics, starttime, endtime, and samplingperiod (optional):

```
?metrics=<metrics>&starttime=<starttime>&endtime=<endtime>&samplingperiod=<samplingperiod>
```

* metrics - Comma-separated list of metrics you are interested in. For a full list of metrics, see Metrics Reference.
* starttime - Unix timestamp (time elapsed since midnight, January 1, 1970 UTC; given in milliseconds in the sample request) determining the oldest data point to return. starttime is inclusive: data points from at or after the starttime will be returned. Note that the starttime and endtime supplied by the user will both be rounded down to the nearest samplingperiod.
* endtime - Unix timestamp determining the freshest data point to return. endtime is exclusive: only data points from before the endtime will be returned.
* samplingperiod - The sampling period in seconds. Must be no less than 5, must be less than the retention period, and must be a multiple of 5. The default is 5s.
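
The constraints on samplingperiod and the rounding of starttime and endtime can be checked on the client before a request is sent. The following is a minimal sketch, not the plugin's implementation: `BatchQueryParams` and its methods are hypothetical names, timestamps are assumed to be in milliseconds, and rounding is assumed to go down to a multiple of the sampling period counted from the epoch.

```java
// Hypothetical client-side helper; the names are illustrative and not part of the plugin.
final class BatchQueryParams {
    static final long MIN_SAMPLING_PERIOD_SECONDS = 5;

    /** Returns true if the sampling period meets the three documented constraints. */
    static boolean isValidSamplingPeriod(long samplingPeriodSeconds, long retentionPeriodMinutes) {
        long retentionSeconds = retentionPeriodMinutes * 60;
        return samplingPeriodSeconds >= MIN_SAMPLING_PERIOD_SECONDS
                && samplingPeriodSeconds % 5 == 0
                && samplingPeriodSeconds < retentionSeconds;
    }

    /** Rounds a millisecond timestamp down to a multiple of the sampling period (assumed behavior). */
    static long roundDown(long timestampMillis, long samplingPeriodSeconds) {
        long periodMillis = samplingPeriodSeconds * 1000;
        return timestampMillis - (timestampMillis % periodMillis);
    }
}
```

Under these assumptions, a timestamp of 1594412653210 with a 5-second sampling period rounds down to 1594412650000.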

Note that the maximum number of data points a single query can request via the API is capped at 100,800. If a query exceeds this limit, an error is returned. Adjust the parameters of such queries to request fewer data points at a time.
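
One way to request fewer data points at a time is to split the time range into smaller windows and issue one request per window. This is a sketch under assumptions: `TimeWindowSplitter` is a hypothetical client-side helper, timestamps are in milliseconds, and the window length is left to the caller because how data points are counted against the cap is not spelled out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper; not part of the plugin.
final class TimeWindowSplitter {
    /** Splits [startMillis, endMillis) into consecutive half-open windows of at most maxWindowMillis. */
    static List<long[]> split(long startMillis, long endMillis, long maxWindowMillis) {
        if (maxWindowMillis <= 0 || endMillis < startMillis) {
            throw new IllegalArgumentException("window must be positive and end must not precede start");
        }
        List<long[]> windows = new ArrayList<>();
        for (long s = startMillis; s < endMillis; s += maxWindowMillis) {
            // Each entry is {starttime, endtime} for one request.
            windows.add(new long[] {s, Math.min(s + maxWindowMillis, endMillis)});
        }
        return windows;
    }
}
```

Because starttime is inclusive and endtime is exclusive, adjacent windows produced this way do not overlap.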

The retention period for batch metrics can be adjusted by setting `batch-metrics-retention-period-minutes` in `performance-analyzer.properties`. The default value is 7 minutes, and values can range from 1 to 60 minutes (inclusive).
Contributor


We should be explicit that a very high value of batch-metrics-retention-period-minutes will lead to heavy disk consumption. We should also enforce a maximum, so that it cannot be set too high by mistake and a read through the REST API then cripple the cluster.

Contributor Author


We already have a bounding range (1-60 inclusive). Will add the note about the disk consumption.


Note that the default retention period is 7 minutes because a typical use case is to query for 5 minutes' worth of data from the node. To do this, a client would select a starttime of now-6min and an endtime of now-1min (the one-minute offset gives the metrics in the time range sufficient time to become available at the node). On top of these 6 minutes of retention, an extra minute is needed to account for the time that passes before the query arrives at the node, and for the fact that starttime and endtime are rounded down to the nearest samplingperiod.
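
The now-6min to now-1min window can be computed as follows. This is a minimal sketch: `QueryWindow` is a hypothetical helper, and it assumes millisecond timestamps.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper that holds the starttime and endtime of one query.
final class QueryWindow {
    final long startMillis;
    final long endMillis;

    QueryWindow(long startMillis, long endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    /** Five minutes of data, ending one minute before the given instant. */
    static QueryWindow lastFiveMinutes(Instant now) {
        Instant start = now.minus(Duration.ofMinutes(6));
        Instant end = now.minus(Duration.ofMinutes(1));
        return new QueryWindow(start.toEpochMilli(), end.toEpochMilli());
    }
}
```

Passing the clock reading in as a parameter, rather than calling `Instant.now()` inside the method, keeps the computation easy to test.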

### SAMPLE REQUEST
GET `_opendistro/_performanceanalyzer/batch?metrics=CPU_Utilization,IO_TotThroughput&starttime=1594412650000&endtime=1594412665000&samplingperiod=5`
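
A request like the one above can be assembled programmatically. The sketch below only builds the URL string; `BatchMetricsUrl` is a hypothetical helper, the endpoint passed in the usage example is an assumption, and metric names are assumed to need no URL encoding.

```java
import java.util.List;

// Hypothetical URL builder for the batch metrics endpoint.
final class BatchMetricsUrl {
    static final String BATCH_PATH = "/_opendistro/_performanceanalyzer/batch";

    static String build(String endpoint, List<String> metrics,
                        long startMillis, long endMillis, int samplingPeriodSeconds) {
        return endpoint + BATCH_PATH
                + "?metrics=" + String.join(",", metrics)   // comma-separated metric names
                + "&starttime=" + startMillis
                + "&endtime=" + endMillis
                + "&samplingperiod=" + samplingPeriodSeconds;
    }
}
```

For instance, `BatchMetricsUrl.build("http://localhost:9600", List.of("CPU_Utilization", "IO_TotThroughput"), 1594412650000L, 1594412665000L, 5)` yields the sample request prefixed with that assumed endpoint.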

## Documentation

Please refer to the [technical documentation](https://opendistro.github.io/for-elasticsearch-docs/) for detailed information on installing and configuring Performance Analyzer.
1 change: 1 addition & 0 deletions build.gradle
@@ -71,6 +71,7 @@ ext {
}
test {
enabled = true
include '**/*Test.class'
}

licenseHeaders.enabled = false
@@ -19,23 +19,27 @@ public class PerformanceAnalyzerController {
private static final String PERFORMANCE_ANALYZER_ENABLED_CONF = "performance_analyzer_enabled.conf";
private static final String RCA_ENABLED_CONF = "rca_enabled.conf";
private static final String LOGGING_ENABLED_CONF = "logging_enabled.conf";
private static final String BATCH_METRICS_ENABLED_CONF = "batch_metrics_enabled.conf";
private static final Logger LOG = LogManager.getLogger(PerformanceAnalyzerController.class);
public static final int DEFAULT_NUM_OF_SHARDS_PER_COLLECTION = 0;

private boolean paEnabled;
private boolean rcaEnabled;
private boolean loggingEnabled;
private boolean batchMetricsEnabled;
private volatile int shardsPerCollection;
private boolean paEnabledDefaultValue = false;
private boolean rcaEnabledDefaultValue = false;
private boolean loggingEnabledDefaultValue = false;
private boolean batchMetricsEnabledDefaultValue = false;
private final ScheduledMetricCollectorsExecutor scheduledMetricCollectorsExecutor;

public PerformanceAnalyzerController(final ScheduledMetricCollectorsExecutor scheduledMetricCollectorsExecutor) {
this.scheduledMetricCollectorsExecutor = scheduledMetricCollectorsExecutor;
initPerformanceAnalyzerStateFromConf();
initRcaStateFromConf();
initLoggingStateFromConf();
initBatchMetricsStateFromConf();
shardsPerCollection = DEFAULT_NUM_OF_SHARDS_PER_COLLECTION;
}

@@ -70,6 +74,10 @@ public boolean isLoggingEnabled() {
return loggingEnabled;
}

public boolean isBatchMetricsEnabled() {
return batchMetricsEnabled;
}

/**
* Reads the shardsPerCollection parameter in NodeStatsMetric
* @return the count of Shards per Collection
@@ -131,6 +139,20 @@ public void updateLoggingState(final boolean shouldEnable) {
saveStateToConf(this.loggingEnabled, LOGGING_ENABLED_CONF);
}

/**
* Updates the state of the batch metrics api.
*
* @param shouldEnable The desired state of the batch metrics api. False to disable, and true to enable.
*/
public void updateBatchMetricsState(final boolean shouldEnable) {
if (shouldEnable && !isPerformanceAnalyzerEnabled()) {
return;
}

this.batchMetricsEnabled = shouldEnable;
saveStateToConf(this.batchMetricsEnabled, BATCH_METRICS_ENABLED_CONF);
}
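
The guard in `updateBatchMetricsState` can be illustrated in isolation. This is a toy model, not the controller itself: `FeatureToggle` is a hypothetical class, and it omits persisting the state to the conf file.

```java
// Toy model of the enable/disable guard; names are illustrative.
final class FeatureToggle {
    private boolean paEnabled;
    private boolean batchMetricsEnabled;

    void setPaEnabled(boolean enabled) {
        this.paEnabled = enabled;
    }

    /** An enable request is ignored while Performance Analyzer is disabled; disabling always succeeds. */
    void updateBatchMetricsState(boolean shouldEnable) {
        if (shouldEnable && !paEnabled) {
            return;
        }
        this.batchMetricsEnabled = shouldEnable;
    }

    boolean isBatchMetricsEnabled() {
        return batchMetricsEnabled;
    }
}
```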

private void initPerformanceAnalyzerStateFromConf() {
Path filePath = Paths.get(getDataDirectory(), PERFORMANCE_ANALYZER_ENABLED_CONF);
PerformanceAnalyzerPlugin.invokePrivileged(() -> {
@@ -187,6 +209,25 @@ private void initLoggingStateFromConf() {
});
}

private void initBatchMetricsStateFromConf() {
Path filePath = Paths.get(getDataDirectory(), BATCH_METRICS_ENABLED_CONF);
PerformanceAnalyzerPlugin.invokePrivileged(() -> {
boolean batchMetricsEnabledFromConf;
try {
batchMetricsEnabledFromConf = readBooleanFromFile(filePath);
} catch (Exception e) {
LOG.debug("Error reading Performance Analyzer state from Conf file", e);
if (e instanceof NoSuchFileException) {
saveStateToConf(batchMetricsEnabledDefaultValue, BATCH_METRICS_ENABLED_CONF);
}
batchMetricsEnabledFromConf = batchMetricsEnabledDefaultValue;
}

// For batch metrics to be enabled, it needs both PA and Batch Metrics to be enabled.
updateBatchMetricsState(paEnabled && batchMetricsEnabledFromConf);
});
}
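
The read-or-fall-back pattern used in `initBatchMetricsStateFromConf` can be sketched on its own. Everything here is an assumption for illustration: `FlagFile` is a hypothetical class, and the file is assumed to hold the text `true` or `false`, since the body of `readBooleanFromFile` is not fully shown in this diff.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper; the on-disk format is an assumption.
final class FlagFile {
    /** Reads a boolean flag; if the file is missing, writes the default so later reads find it. */
    static boolean readOrInit(Path file, boolean defaultValue) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return Boolean.parseBoolean(text);
        } catch (NoSuchFileException e) {
            try {
                Files.write(file, String.valueOf(defaultValue).getBytes(StandardCharsets.UTF_8));
            } catch (IOException ignored) {
                // Best effort: fall back to the default even if the file cannot be created.
            }
            return defaultValue;
        } catch (IOException e) {
            // Any other read failure also falls back to the default.
            return defaultValue;
        }
    }
}
```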

private boolean readBooleanFromFile(final Path filePath) throws Exception {
try (Scanner sc = new Scanner(filePath)) {
String nextLine = sc.nextLine();
@@ -8,6 +8,7 @@ public final class PerformanceAnalyzerClusterSettings {
* Bit 0: Perf Analyzer enabled/disabled
* Bit 1: RCA enabled/disabled
* Bit 2: Logging enabled/disabled
* Bit 3: Batch Metrics enabled/disabled
*/
public static final Setting<Integer> COMPOSITE_PA_SETTING = Setting.intSetting(
"cluster.metadata.perf_analyzer.state",
@@ -19,7 +20,8 @@ public final class PerformanceAnalyzerClusterSettings {
public enum PerformanceAnalyzerFeatureBits {
PA_BIT,
RCA_BIT,
LOGGING_BIT
LOGGING_BIT,
BATCH_METRICS_BIT
}
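
Since each enum constant's ordinal serves as its bit position, a composite setting value can be decoded into the set of enabled features. A minimal sketch with a local copy of the enum; `FeatureBitsDemo` and `decode` are hypothetical names.

```java
import java.util.EnumSet;

final class FeatureBitsDemo {
    // Local copy of the feature bits, in the same order as the enum in the diff.
    enum FeatureBit { PA_BIT, RCA_BIT, LOGGING_BIT, BATCH_METRICS_BIT }

    /** Decodes a composite setting into the set of features whose bit is set. */
    static EnumSet<FeatureBit> decode(int composite) {
        EnumSet<FeatureBit> enabled = EnumSet.noneOf(FeatureBit.class);
        for (FeatureBit bit : FeatureBit.values()) {
            if (((composite >> bit.ordinal()) & 1) == 1) {
                enabled.add(bit);
            }
        }
        return enabled;
    }
}
```

For example, a composite value of 9 (binary 1001) decodes to PA_BIT and BATCH_METRICS_BIT. Because positions follow ordinals, new constants have to be appended at the end to keep the existing positions stable.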

/**
@@ -13,6 +13,7 @@ public class PerformanceAnalyzerClusterSettingHandler implements ClusterSettingL
private static final int RCA_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.RCA_BIT.ordinal();
private static final int PA_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.PA_BIT.ordinal();
private static final int LOGGING_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.LOGGING_BIT.ordinal();
private static final int BATCH_METRICS_ENABLED_BIT_POS = PerformanceAnalyzerFeatureBits.BATCH_METRICS_BIT.ordinal();
private static final int MAX_ALLOWED_BIT_POS = Math.min(PerformanceAnalyzerFeatureBits.values().length, Integer.SIZE - 1);

private final PerformanceAnalyzerController controller;
@@ -24,11 +25,11 @@ public PerformanceAnalyzerClusterSettingHandler(final PerformanceAnalyzerControl
final ClusterSettingsManager clusterSettingsManager) {
this.controller = controller;
this.clusterSettingsManager = clusterSettingsManager;
this.currentClusterSetting =
initializeClusterSettingValue(
this.currentClusterSetting = initializeClusterSettingValue(
controller.isPerformanceAnalyzerEnabled(),
controller.isRcaEnabled(),
controller.isLoggingEnabled());
controller.isLoggingEnabled(),
controller.isBatchMetricsEnabled());
}

/**
@@ -61,6 +62,16 @@ public void updateRcaSetting(final boolean state) {
clusterSettingsManager.updateSetting(COMPOSITE_PA_SETTING, settingIntValue);
}

/**
* Updates the Batch Metrics setting across the cluster.
*
* @param state The desired state for batch metrics.
*/
public void updateBatchMetricsSetting(final boolean state) {
final Integer settingIntValue = getBatchMetricsSettingValueFromState(state);
clusterSettingsManager.updateSetting(COMPOSITE_PA_SETTING, settingIntValue);
}

/**
* Handler that gets called when there is a new value for the setting that this listener
* is listening to.
@@ -74,6 +85,7 @@ public void onSettingUpdate(final Integer newSettingValue) {
controller.updatePerformanceAnalyzerState(getPAStateFromSetting(newSettingValue));
controller.updateRcaState(getRcaStateFromSetting(newSettingValue));
controller.updateLoggingState(getLoggingStateFromSetting(newSettingValue));
controller.updateBatchMetricsState(getBatchMetricsStateFromSetting(newSettingValue));
}
}

@@ -95,13 +107,15 @@ public int getCurrentClusterSettingValue() {
* @return the cluster setting value
*/
private Integer initializeClusterSettingValue(


Have we tested this across 1/ multi-node scenarios, 2/ clusters with nodes on different versions of this code (one node with this change and another on a version before this change) ?

Contributor Author

ricardolstephen Sep 2, 2020

Ran these tests manually. See the test list in opendistro-for-elasticsearch/performance-analyzer-rca#315

final boolean paEnabled, final boolean rcaEnabled, final boolean loggingEnabled) {
final boolean paEnabled, final boolean rcaEnabled, final boolean loggingEnabled,
final boolean batchMetricsEnabled) {
int clusterSetting = CLUSTER_SETTING_DISABLED_VALUE;

clusterSetting = paEnabled ? setBit(clusterSetting, PA_ENABLED_BIT_POS) : clusterSetting;
if (paEnabled) {
clusterSetting = rcaEnabled ? setBit(clusterSetting, RCA_ENABLED_BIT_POS) : clusterSetting;
clusterSetting = loggingEnabled ? setBit(clusterSetting, LOGGING_ENABLED_BIT_POS) : clusterSetting;
clusterSetting = batchMetricsEnabled ? setBit(clusterSetting, BATCH_METRICS_ENABLED_BIT_POS) : clusterSetting;
}
return clusterSetting;
}
@@ -118,10 +132,10 @@ private boolean getPAStateFromSetting(final int settingValue) {

/**
* Converts the boolean PA state to composite cluster setting.
* If Performance Analyzer is being turned off, it will also turn RCA off.
* If Performance Analyzer is being turned off, it will also turn off RCA, logging, and batch metrics.
*
* @param state the state of performance analyzer. Will enable performance analyzer if true,
* disables both RCA and performance analyzer if false.
* disables performance analyzer, RCA, logging, and batch metrics.
* @return composite cluster setting as an integer.
*/
private Integer getPASettingValueFromState(final boolean state) {
@@ -130,7 +144,8 @@ private Integer getPASettingValueFromState(final boolean state) {
if (state) {
return setBit(clusterSetting, PA_ENABLED_BIT_POS);
} else {
return resetBit(resetBit(resetBit(clusterSetting, PA_ENABLED_BIT_POS), RCA_ENABLED_BIT_POS), LOGGING_ENABLED_BIT_POS);
return resetBit(resetBit(resetBit(resetBit(clusterSetting, PA_ENABLED_BIT_POS), RCA_ENABLED_BIT_POS),
LOGGING_ENABLED_BIT_POS), BATCH_METRICS_ENABLED_BIT_POS);
}
}
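
The nested `resetBit` calls clear four bits one at a time. An equivalent formulation clears them with a single mask. This is a sketch of that alternative, not a change proposed in the PR: `CompositeSettingMask` is a hypothetical class, the bit positions follow the comment in `PerformanceAnalyzerClusterSettings`, and `resetBit` is a local stand-in because the original helper is not shown in this diff.

```java
final class CompositeSettingMask {
    static final int PA_BIT_POS = 0;
    static final int RCA_BIT_POS = 1;
    static final int LOGGING_BIT_POS = 2;
    static final int BATCH_METRICS_BIT_POS = 3;

    static final int ALL_FEATURES_MASK =
            (1 << PA_BIT_POS) | (1 << RCA_BIT_POS) | (1 << LOGGING_BIT_POS) | (1 << BATCH_METRICS_BIT_POS);

    /** Local stand-in for the resetBit helper: clears the bit at pos. */
    static int resetBit(int value, int pos) {
        return value & ~(1 << pos);
    }

    /** Clears all four feature bits one at a time, as in the nested calls. */
    static int disableAllNested(int composite) {
        return resetBit(resetBit(resetBit(resetBit(composite, PA_BIT_POS), RCA_BIT_POS),
                LOGGING_BIT_POS), BATCH_METRICS_BIT_POS);
    }

    /** Clears the same four bits with one mask. */
    static int disableAllMasked(int composite) {
        return composite & ~ALL_FEATURES_MASK;
    }
}
```

Both methods leave any higher bits untouched, so they agree for every input.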

@@ -154,6 +169,16 @@ private boolean getLoggingStateFromSetting(final int settingValue) {
return ((settingValue >> LOGGING_ENABLED_BIT_POS) & BIT_ONE) == ENABLED_VALUE;
}

/**
* Extracts the boolean value for batch metrics state from the cluster setting.
*
* @param settingValue The composite setting value.
* @return true if the BATCH_METRICS bit is set, false otherwise.
*/
private boolean getBatchMetricsStateFromSetting(final int settingValue) {
return ((settingValue >> BATCH_METRICS_ENABLED_BIT_POS) & BIT_ONE) == ENABLED_VALUE;
}

/**
* Converts the boolean RCA state to composite cluster setting.
* Enables RCA only if performance analyzer is also set. Otherwise, results in a no-op.
@@ -190,6 +215,24 @@ private Integer getLoggingSettingValueFromState(final boolean shouldEnable) {
}
}

/**
* Converts the boolean batch metrics state to composite cluster setting.
* Enables batch metrics only if performance analyzer is also set. Otherwise, results in a no-op.
*
* @param shouldEnable the state of batch metrics. Will try to enable if true, disables batch metrics if false.
* @return composite cluster setting as an integer.
*/
private Integer getBatchMetricsSettingValueFromState(final boolean shouldEnable) {
int clusterSetting = currentClusterSetting;

if (shouldEnable) {
return checkBit(currentClusterSetting, PA_ENABLED_BIT_POS)
? setBit(clusterSetting, BATCH_METRICS_ENABLED_BIT_POS) : clusterSetting;
} else {
return resetBit(clusterSetting, BATCH_METRICS_ENABLED_BIT_POS);
}
}
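
The effect of `getBatchMetricsSettingValueFromState` on concrete values can be traced with a standalone version. A sketch under assumptions: `BatchMetricsSetting` is a hypothetical class, the bit positions (0 for PA, 3 for batch metrics) follow the enum order, and plain bit operations stand in for the `setBit`, `resetBit`, and `checkBit` helpers, which are not shown in this diff.

```java
final class BatchMetricsSetting {
    static final int PA_BIT_POS = 0;
    static final int BATCH_METRICS_BIT_POS = 3;

    /** Returns the composite value after applying the requested batch metrics state. */
    static int apply(int current, boolean shouldEnable) {
        if (!shouldEnable) {
            return current & ~(1 << BATCH_METRICS_BIT_POS);   // disabling always clears the bit
        }
        boolean paEnabled = ((current >> PA_BIT_POS) & 1) == 1;
        // Enabling is a no-op unless the PA bit is already set.
        return paEnabled ? current | (1 << BATCH_METRICS_BIT_POS) : current;
    }
}
```

With PA enabled (value 1), enabling batch metrics gives 9. With PA disabled (value 0), the same request leaves the value at 0.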

/**
* Sets the bit at the specified position.
*
@@ -3,6 +3,7 @@
import java.io.IOException;
import java.util.Map;

import com.amazon.opendistro.elasticsearch.performanceanalyzer.config.PluginSettings;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.config.setting.handler.NodeStatsSettingHandler;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
@@ -27,10 +28,12 @@ public class PerformanceAnalyzerClusterConfigAction extends BaseRestHandler {
private static final String PA_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/cluster/config";
private static final String RCA_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/rca/cluster/config";
private static final String LOGGING_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/logging/cluster/config";
private static final String BATCH_METRICS_CLUSTER_CONFIG_PATH = "/_opendistro/_performanceanalyzer/batch/cluster/config";
private static final String ENABLED = "enabled";
private static final String SHARDS_PER_COLLECTION = "shardsPerCollection";
private static final String CURRENT = "currentPerformanceAnalyzerClusterState";
private static final String NAME = "PerformanceAnalyzerClusterConfigAction";
private static final String BATCH_METRICS_RETENTION_PERIOD_MINUTES = "batchMetricsRetentionPeriodMinutes";
Contributor


We aren't adding logic to update this via the API then?

Contributor Author


No, this will be read-only through the API. The cluster owner can modify it in performance-analyzer.properties.


private final PerformanceAnalyzerClusterSettingHandler clusterSettingHandler;
private final NodeStatsSettingHandler nodeStatsSettingHandler;
@@ -51,6 +54,8 @@ private void registerHandlers(final RestController controller) {
controller.registerHandler(RestRequest.Method.POST, RCA_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.GET, LOGGING_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.POST, LOGGING_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.GET, BATCH_METRICS_CLUSTER_CONFIG_PATH, this);
controller.registerHandler(RestRequest.Method.POST, BATCH_METRICS_CLUSTER_CONFIG_PATH, this);
}

/**
@@ -89,6 +94,8 @@ protected RestChannelConsumer prepareRequest(final RestRequest request, final No
clusterSettingHandler.updateRcaSetting((Boolean) value);
} else if (request.path().contains(LOGGING_CLUSTER_CONFIG_PATH)) {
clusterSettingHandler.updateLoggingSetting((Boolean) value);
} else if (request.path().contains(BATCH_METRICS_CLUSTER_CONFIG_PATH)) {
clusterSettingHandler.updateBatchMetricsSetting((Boolean) value);
} else {
clusterSettingHandler.updatePerformanceAnalyzerSetting((Boolean) value);
}
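
The path-based dispatch can be modeled as a small resolver. This is an illustrative sketch only: `ConfigPathRouter` and `Target` are hypothetical names, and the RCA check is assumed to use the same `contains` test as the branches visible in this hunk.

```java
final class ConfigPathRouter {
    enum Target { PERFORMANCE_ANALYZER, RCA, LOGGING, BATCH_METRICS }

    static final String RCA_PATH = "/_opendistro/_performanceanalyzer/rca/cluster/config";
    static final String LOGGING_PATH = "/_opendistro/_performanceanalyzer/logging/cluster/config";
    static final String BATCH_METRICS_PATH = "/_opendistro/_performanceanalyzer/batch/cluster/config";

    /** Resolves which feature a cluster config request targets; PA is the fallback. */
    static Target resolve(String path) {
        if (path.contains(RCA_PATH)) {
            return Target.RCA;
        } else if (path.contains(LOGGING_PATH)) {
            return Target.LOGGING;
        } else if (path.contains(BATCH_METRICS_PATH)) {
            return Target.BATCH_METRICS;
        }
        return Target.PERFORMANCE_ANALYZER;
    }
}
```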
@@ -108,6 +115,7 @@ protected RestChannelConsumer prepareRequest(final RestRequest request, final No
builder.startObject();
builder.field(CURRENT, clusterSettingHandler.getCurrentClusterSettingValue());
builder.field(SHARDS_PER_COLLECTION, nodeStatsSettingHandler.getNodeStatsSetting());
builder.field(BATCH_METRICS_RETENTION_PERIOD_MINUTES, PluginSettings.instance().getBatchMetricsRetentionPeriodMinutes());
builder.endObject();
channel.sendResponse(new BytesRestResponse(RestStatus.OK, builder));
} catch (IOException ioe) {