Improvements on various metrics ... (#939)
* Improvements on various metrics ...

based on https://hazelcast.lightning.force.com/lightning/r/Case/5006e000024vUboAAE/view

* Fixing typo
Serdaro authored Dec 22, 2023
1 parent 288cde7 commit 5e80169
Showing 1 changed file with 25 additions and 4 deletions.
29 changes: 25 additions & 4 deletions docs/modules/ROOT/pages/list-of-metrics.adoc
@@ -273,15 +273,15 @@ the watermark already having passed their windows.

|`map.getCount`
|count
|Number of local get operations on the map
|Number of local get operations on the map; it is incremented for every get operation, even if the entry does not exist.

|`map.heapCost`
|count
|Total heap cost for the map on this member

|`map.hits`
|count
|Number of reads of the locally owned entries
|Number of reads of the locally owned entries; it is incremented for every read by any type of operation (get, set, put), so the entry must exist for the hit to be counted.

|`map.indexedQueryCount`
|count
@@ -367,6 +367,18 @@ the watermark already having passed their windows.
|ms
|Total latency of local set operations on the map

3+a|
The above `*latency` metrics are measured only on the members, so they do not represent the overall performance of the cluster.
We recommend monitoring the average latency for each operation, for example, `map.totalGetLatency` / `map.getCount` and `map.totalSetLatency` / `map.setCount`.
An increase in average latency is a sign that the cluster may be experiencing performance problems, or that there is a spike in the load.
Possible causes include:

* Increase in the load on the cluster: If the cluster is under heavy load, this can lead to increased latency for all operations, slowing down the overall performance.
* Increasing member count in the cluster: As the number of cluster members increases, the total latency for operations can also increase.
This is because the cluster has to communicate with more members, which can add to the overall latency. This might be a data architecture problem.
* Increasing the data set size: This causes the cluster to search through more data to find the requested data, which can slow down the overall performance. Creating indexes may solve this kind of problem.
* Increasing the number of concurrent operations: This causes the cluster to process more requests at the same time, which can slow down the overall performance. This is a potential bottleneck on resources (CPU, memory, network).
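
The ratio recommended above can be sketched as follows. This is a minimal illustration, not part of Hazelcast: the function names and sample values are hypothetical, and the interval variant assumes that both metrics are cumulative since member start.

```python
from typing import Optional


def average_latency_ms(total_latency_ms: float, op_count: int) -> Optional[float]:
    """Average latency per operation, or None when no operations were counted."""
    if op_count <= 0:
        return None
    return total_latency_ms / op_count


def interval_average_latency_ms(prev: dict, curr: dict,
                                latency_key: str, count_key: str) -> Optional[float]:
    """Average latency between two metric samples.

    Assumption: both metrics only grow over time (cumulative counters),
    so the difference of two samples describes the interval between them.
    """
    return average_latency_ms(curr[latency_key] - prev[latency_key],
                              curr[count_key] - prev[count_key])


# Hypothetical samples of the same member, taken some time apart.
prev_sample = {"map.totalGetLatency": 1200, "map.getCount": 400}
curr_sample = {"map.totalGetLatency": 1800, "map.getCount": 500}

# Lifetime average: map.totalGetLatency / map.getCount
overall = average_latency_ms(curr_sample["map.totalGetLatency"],
                             curr_sample["map.getCount"])

# Average over the most recent interval only
recent = interval_average_latency_ms(prev_sample, curr_sample,
                                     "map.totalGetLatency", "map.getCount")
```

The interval average reacts faster to a recent spike than the lifetime average, which is smoothed by all earlier operations.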
|`map.index.averageHitLatency`
|ns
|Average hit latency for the index on this member
@@ -1419,6 +1431,10 @@ the watermark already having passed their windows.
.Operations
[%collapsible]
====

NOTE: In the Hazelcast context, **priority** operations are the ones that are important for the stability of the cluster, for example heartbeats and migration requests.
**Normal** operations are the ones that manipulate the data, for example `map.get` and `map.put`.

[cols="4,1,6a"]
|===
| Name
@@ -1527,11 +1543,16 @@ the watermark already having passed their windows.

|`operation.queueSize`
|count
|Number of normal operations pending (normal partition ops. + normal generic ops.)
|Number of normal operations pending (normal partition operations + normal generic operations).

It refers to the number of operations sent to the member that have yet to be consumed for processing by the partition operation threads.
This is the most critical queue for partition-aware operations such as `map.put` and `map.remove`.
This value should be zero or very close to zero.
Depending on the latency tolerance of your business use case, you can define an alert threshold in your preferred alerting mechanism. For instance, it would be useful to trigger an alert if this value stays above 100 for 15 seconds.
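
The alert rule described above can be sketched as a sustained-threshold check. This is only an illustration: the function name and sample data are hypothetical, and a real deployment would normally express the same rule in its alerting tool rather than in application code.

```python
from typing import Iterable, Tuple


def sustained_breach(samples: Iterable[Tuple[float, int]],
                     threshold: int = 100,
                     duration_s: float = 15.0) -> bool:
    """Return True if the value stays above threshold for at least duration_s.

    samples: (timestamp_seconds, queue_size) pairs in ascending time order.
    """
    breach_start = None
    for timestamp, queue_size in samples:
        if queue_size > threshold:
            if breach_start is None:
                # First sample of a new breach window
                breach_start = timestamp
            if timestamp - breach_start >= duration_s:
                return True
        else:
            # Value recovered, so the breach window resets
            breach_start = None
    return False


# Hypothetical operation.queueSize samples taken every 5 seconds.
steady_breach = [(0, 120), (5, 130), (10, 150), (15, 110)]
short_spikes = [(0, 120), (5, 130), (10, 50), (15, 110), (20, 120)]

alert_on_steady = sustained_breach(steady_breach)  # True: above 100 for 15 s
alert_on_spikes = sustained_breach(short_spikes)   # False: the breach was interrupted
```

Requiring the breach to last for a period avoids alerting on short bursts that the partition operation threads drain on their own.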

|`operation.responseQueueSize`
|count
|Total number of pending responses to be processed
|Total number of pending responses to be processed (the work queue for the response threads).

|`operation.responses.backupCount`
|count