From ab2181d842771413b4efcd44cada04a02b071e38 Mon Sep 17 00:00:00 2001
From: Olu Ashiru <149683888+oluashiruHO@users.noreply.github.com>
Date: Wed, 29 Nov 2023 14:07:14 +0000
Subject: [PATCH] Update docs/standards/service-reliability.md

Co-authored-by: Aaron Russell <128606235+aaronrussellHO@users.noreply.github.com>
---
 docs/standards/service-reliability.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standards/service-reliability.md b/docs/standards/service-reliability.md
index cb0db411..691ff936 100644
--- a/docs/standards/service-reliability.md
+++ b/docs/standards/service-reliability.md
@@ -78,7 +78,7 @@ Furthermore, implement a backoff strategy if your transactions allow it.
 
 ### Service MUST be sized appropriately for normal operations
 
-The application / service must be sized appropriately in terms of system resources (CPU, memory, storage, et al) for normal operating conditions preventing unnecessary resource wastage.
+The application / service must be sized appropriately in terms of system resources (CPU, memory, storage, etc.) for normal operating conditions preventing unnecessary resource wastage.
 Considering the acceptable time it takes for your application to recover (Mean Time To Recover / MTTR), additional headroom is allowed in the event of failures - for example, it is permissible if you are running a two node cluster that normal operations yield 50% resource utilisation, so that if a single node is lost your service may continue to function on a single node at 100% utilisation until the second node is/can be recovered.
 
 - Check that Kubernetes services have configured CPU and Memory requests and limits are configured