Update capacity-planning.adoc (#975)
Typo fixes
jgardiner68 authored Jan 29, 2024
1 parent c67b80c commit 4d4ce73
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions docs/modules/ROOT/pages/capacity-planning.adoc
@@ -13,30 +13,30 @@ The cluster performance depends on multiple factors, including data size,
number of backups, queries, and which features are used. Therefore,
planning the cluster remains a complex task that requires knowledge of
Hazelcast's architecture and concepts. Here, we introduce some basic guidelines
-that help to properly size a cluster.
+that help to size a cluster properly.

We recommend always benchmarking your setup before deploying it to
-production. We also recommend that bechmarking systems resemble the
+production. We also recommend that benchmarking systems resemble the
production system as much as possible to avoid unexpected results.
-We provide a <<benchmarking-and-sizing-example, bechmarking example>>
+We provide a <<benchmarking-and-sizing-example, benchmarking example>>
that you can use as a starting point.

Hazelcast clusters will run both data processing and data storage
workloads, so planning for both types of workload is important.

-In order to correctly size the cluster for your use case, answers to as many of the
+To correctly size the cluster for your use case, answer as many of the
following questions as possible are necessary:

-* How much data you want to keep in the in-memory store at any given time?
+* How much data do you want to keep in the in-memory store at any given time?
* How many copies (backups) of that data do you require?
* Are you going to use synchronous
or asynchronous xref:data-structures:backing-up-maps.adoc[backups]?
* When running queries how many indexes or index fields for each object will you have?
* What is your read/write ratio? (Example: 70% of time is spent reading data, 30% is spent writing)
** Based on the read/write ratio and Transactions Per Second (TPS), you can learn about the amount of memory
-required to accommodate the data, both existing and new. Usually an eviction mechanism keeps
+required to accommodate the data, both existing and new. Usually, an eviction mechanism keeps
the map/cache size in check, but the eviction itself does not always clear the memory almost
-immediately. Therefore, the answers to this question gives a good insight.
+immediately. Therefore, the answers to this question give a good insight.
* Are you using multiple clusters (which may involve xref:wan:wan.adoc[WAN Replication])?
* What are the throughput and latency requirements?
** The answer should be about Hazelcast access, not the application throughput.
@@ -45,7 +45,7 @@ transaction may need to use Hazelcast 3 times during the execution. So the
actual Hazelcast throughput would need to be 30,000 TPS. Similarly for latency, the answer
should not be about end-to-end latency but the application-to-Hazelcast latency.
* How many concurrent Hazelcast xref:configuration:jet-configuration.adoc[jobs] will the cluster run?
-* What is the approximation duration of a job?
+* What is the approximate duration of a job?
* When you use stream processing, what is the average approximation latency for processing of a single event?
* What is the intended xref:pipelines:sources-sinks.adoc[sink] for your jobs (database, dashboard, file system, etc.)?
** If the sink is a Hazelcast map, then the standard caching questions apply, i.e.,
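
To make the throughput arithmetic in the hunk above concrete, here is a minimal sketch using the hypothetical figures from the example (10,000 business transactions per second, each using Hazelcast 3 times); the numbers are illustrative, not measurements:

[source,java]
----
public class ThroughputSketch {
    public static void main(String[] args) {
        // Hypothetical figures from the example above: 10,000 business
        // transactions per second, each using Hazelcast 3 times.
        long businessTps = 10_000;
        int hazelcastCallsPerTransaction = 3;

        // Size the cluster for Hazelcast access, not application TPS.
        long requiredHazelcastTps = businessTps * hazelcastCallsPerTransaction;
        System.out.println("Required Hazelcast TPS: " + requiredHazelcastTps); // 30000
    }
}
----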
@@ -106,7 +106,7 @@ the data previously owned by the newly offline member will be redistributed across
the remaining members. For this reason, we recommend that you plan to use only
60% of available memory, with 40% headroom to handle member failure or shutdown.
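
As a rough illustration of this 60/40 guideline, a minimal sketch with assumed inputs (50 GB of primary data and one backup copy; both figures are hypothetical):

[source,java]
----
public class MemoryHeadroomSketch {
    public static void main(String[] args) {
        // Hypothetical inputs: 50 GB of primary data, 1 backup copy.
        double primaryDataGb = 50.0;
        int backupCount = 1;

        // Each backup is a full extra copy of the data.
        double totalInMemoryGb = primaryDataGb * (1 + backupCount); // 100 GB

        // Plan to use only 60% of available memory, keeping 40% headroom
        // for member failure or shutdown.
        double provisionedGb = totalInMemoryGb / 0.6;
        System.out.printf("Provision roughly %.0f GB across the cluster%n",
                provisionedGb); // ~167 GB
    }
}
----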

-If you use High-Density Memory Store, Hazelcast automatically
+If you use the High-Density Memory Store, Hazelcast automatically
assigns a percentage of available off-heap memory to the internal
memory manager. Since allocation happens lazily, if you want to
be informed about how much off-heap memory is being used by the
@@ -131,7 +131,7 @@ instead.
Memory consumption is affected by:

* **Resources deployed with your job:** Attaching big
-files such as models for ML inference pipelines can consume significant resources.
+files, such as models for ML inference pipelines, can consume significant resources.
* **State of the running jobs:** This varies, as it's affected by the shape of
your pipeline and by the data being processed. Most of the memory is
consumed by operations that aggregate and buffer data. Typically the
@@ -189,7 +189,7 @@ NOTE: If you are an Enterprise customer using the High-Density Memory Store
with large data sizes, we recommend a large increase in partition count, starting with 5009 or higher.
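
For illustration, a minimal sketch of fixing the partition count programmatically before first startup, via the standard hazelcast.partition.count property; the value 5009 follows the recommendation above, but treat it as a starting point to validate by benchmarking:

[source,java]
----
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class PartitionCountSketch {
    public static void main(String[] args) {
        Config config = new Config();
        // Must be identical on every member and set before the cluster
        // is first started; it cannot be changed on a running cluster.
        config.setProperty("hazelcast.partition.count", "5009");
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
    }
}
----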

NOTE: The partition count cannot be easily changed after a cluster is created, so if you have a large cluster
-be sure to test and set an optimum partition count prior to deployment. If you need to change th partition
+be sure to test and set an optimum partition count prior to deployment. If you need to change the partition
count after a cluster is already running, you will need to schedule a maintenance window to entirely bring
the cluster down. If your cluster uses the xref:storage:persistence.adoc[Persistence] or xref:cp-subsystem:persistence.adoc[CP Persistence]
features, those persistent files will need to be removed after the cluster is shut down, as they contain
