diff --git a/docs/images/bucket_location.png b/docs/images/bucket_location.png
new file mode 100644
index 000000000..363214754
Binary files /dev/null and b/docs/images/bucket_location.png differ
diff --git a/docs/images/bucket_metrics.png b/docs/images/bucket_metrics.png
new file mode 100644
index 000000000..0628fc8c7
Binary files /dev/null and b/docs/images/bucket_metrics.png differ
diff --git a/docs/images/cpu_usage.png b/docs/images/cpu_usage.png
new file mode 100644
index 000000000..98dddbc71
Binary files /dev/null and b/docs/images/cpu_usage.png differ
diff --git a/docs/images/memory_usage.png b/docs/images/memory_usage.png
new file mode 100644
index 000000000..3aaed988a
Binary files /dev/null and b/docs/images/memory_usage.png differ
diff --git a/docs/monitoring.md b/docs/monitoring.md
new file mode 100644
index 000000000..71d303387
--- /dev/null
+++ b/docs/monitoring.md
@@ -0,0 +1,62 @@
+
+# Monitoring
+
+## Sidecar container resource usage
+
+Cloud Storage FUSE instances run inside sidecar containers and mount Cloud Storage buckets for your workload. To ensure the Cloud Storage FUSE instances run properly, it is important to monitor the sidecar container resource consumption. To learn more about how to configure the sidecar container resource allocation, see the GKE documentation [Configure resources for the sidecar container](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#sidecar-container-resources).
+
+You can use [GCP Metrics Explorer](https://cloud.google.com/monitoring/charts/metrics-explorer) to check the sidecar container resource usage. Use the following filters:
+
+### Memory usage
+
+Insufficient memory will cause Cloud Storage FUSE out-of-memory errors and crash the workload application. Ensure the sidecar container memory limit is large enough, or leave the memory limit unset to allow Cloud Storage FUSE to consume all the available resources on a node.
+
+- Metric: Kubernetes Container - Memory usage (kubernetes.io/container/memory/used_bytes)
+
+- Filter:
+  - memory_type = non-evictable
+  - container_name = gke-gcsfuse-sidecar
+  - pod_name = your-pod-name
+- For example: ![example of memory usage](./images/memory_usage.png)
+
+### CPU usage time
+
+Insufficient CPU will cause Cloud Storage FUSE throttling and lead to unsatisfactory performance. Ensure the sidecar container CPU limit is large enough, or leave the CPU limit unset to allow Cloud Storage FUSE to consume all the available resources on a node.
+
+- Metric: Kubernetes Container - CPU usage time (kubernetes.io/container/cpu/core_usage_time)
+- Filter:
+  - container_name = gke-gcsfuse-sidecar
+  - pod_name = your-pod-name
+- For example: ![example of CPU usage](./images/cpu_usage.png)
+
+## Cloud Storage bucket observability
+
+To check metrics of Cloud Storage buckets, go to the bucket page and click the `OBSERVABILITY` tab. For example: ![example of bucket metrics](./images/bucket_metrics.png)
+
+### Total read/list/get request count
+
+This chart shows total requests issued by Cloud Storage FUSE for Read, List, and Get operations. If the `GetObjectMetadata` request is observed throughout your workload, consider enabling the Cloud Storage FUSE metadata cache and increasing the cache capacity. For more information, refer to the [troubleshooting guide](./troubleshooting.md#metadata-cache).
+
+### Data egress rate over the network
+
+This chart presents an approximate representation of the object download speed from Cloud Storage FUSE. If the throughput is inadequate, you can refer to the [performance troubleshooting steps](./troubleshooting.md#performance-issues) for guidance on tuning Cloud Storage FUSE to improve its performance.
+
+## Cloud Storage FUSE metrics
+
+Cloud Storage FUSE supports exporting [custom metrics](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/metrics.md) to Google Cloud Monitoring.
Currently, these metrics are not available on GKE. GKE is working on integrating these metrics with the CSI driver.
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index d5f928fed..0e59b2efc 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -19,13 +19,14 @@ limitations under the License.
 
 ## Log queries
 
-Run the following queries on GCP Logs Explorer to check logs.
+Run the following queries on [GCP Logs Explorer](https://cloud.google.com/logging/docs/view/logs-explorer-interface) to check logs.
 
 - Sidecar container and gcsfuse logs:
 
   ```text
   resource.type="k8s_container"
   resource.labels.container_name="gke-gcsfuse-sidecar"
+  resource.labels.pod_name="your-pod-name"
   ```
 
 - Cloud Storage FUSE CSI Driver logs:
@@ -42,23 +43,27 @@ Run the following queries on GCP Logs Explorer to check logs.
   resource.labels.container_name="gcs-fuse-csi-driver-webhook"
   ```
 
+## New features availability
+
+To use specific features or enhancements of the Cloud Storage FUSE CSI driver, your clusters must meet specific requirements. See the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#requirements) for these requirements.
+
 ## I/O errors in your workloads
 
 - Error `Transport endpoint is not connected` in workload Pods.
 
-  This error is due to Cloud Storage FUSE termination. In most cases, Cloud Storage FUSE was terminated because of OOM. Please use the Pod annotations `gke-gcsfuse/[cpu-limit|memory-limit|ephemeral-storage-limit]` to allocate more resources to Cloud Storage FUSE (the sidecar container). Note that the only way to fix this error is to restart your workload Pod.
+  This error is due to Cloud Storage FUSE termination. In most cases, Cloud Storage FUSE was terminated because of OOM. Use the Pod annotations `gke-gcsfuse/[cpu-limit|memory-limit|ephemeral-storage-limit]` to allocate more resources to Cloud Storage FUSE (the sidecar container).
Note that the only way to fix this error is to restart your workload Pod.
 
 - Error `Permission denied` in workload Pods.
 
   Cloud Storage FUSE does not have permission to access the file system.
 
-  Please double check your container user and fsGroup. Make sure you pass `uid` and `gid` flags correctly. See [Configure how Cloud Storage FUSE buckets are mounted](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#mounting-flags) for more details.
+  Double check your container `user` and `fsGroup`. If you use a [Security Context](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) for your Pod or container, or if your container image uses a non-root user or group, you must set the `uid` and `gid` mount flags. You also need to use the `file-mode` and `dir-mode` mount flags to set the file system permissions. See [Configure how Cloud Storage FUSE buckets are mounted](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#considerations) for more details.
 
-  Please double check your service account setup. See [Configure access to Cloud Storage buckets using GKE Workload Identity](./authentication.md) for more details.
+  Double check your service account setup. See [Configure access to Cloud Storage buckets using GKE Workload Identity](./authentication.md) for more details.
 
 ## Pod event warnings
 
-If your workload Pods cannot start up, please run `kubectl describe pod <pod-name> -n <namespace>` to check the Pod events. Find the troubleshooting guide below according to the Pod event.
+If your workload Pods cannot start up, run `kubectl describe pod <pod-name> -n <namespace>` to check the Pod events. Find the troubleshooting guide below according to the Pod event.
 
 ### CSI driver enablement issues
 
@@ -70,11 +75,11 @@ If your workload Pods cannot start up, please run `kubectl describe pod <pod-name> -n <namespace>`
 
-> Note: the rpc error code can be used to triage `MountVolume.SetUp` issues.
For example, `Unauthenticated` and `PermissionDenied` usually mean the authentication was not configured correctly. A rpc error code `Internal` means that unexpected issues occurred in the CSI driver, please create a [new issue](https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/issues/new) on the GitHub project page.
+> Note: the RPC error code can be used to triage `MountVolume.SetUp` issues. For example, `Unauthenticated` and `PermissionDenied` usually mean the authentication was not configured correctly. An RPC error code `Internal` means that unexpected issues occurred in the CSI driver; create a [new issue](https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/issues/new) on the GitHub project page.
 
 #### Unauthenticated
 
@@ -84,7 +89,7 @@ If your workload Pods cannot start up, please run `kubectl describe pod <pod-name> -n <namespace>`
 
+> Note: the file cache feature requires these GKE versions: 1.25.16-gke.1759000, 1.26.15-gke.1158000, 1.27.12-gke.1190000, 1.28.8-gke.1175000, 1.29.3-gke.1093000 **or later**.
+
+See [File cache](#file-cache) for more details about this feature.
+
+#### Volume mount failure with custom caching volumes
+
+- Pod event warning examples:
+
+  - > MountVolume.SetUp failed for volume "gcs-fuse-csi-ephemeral" : rpc error: code = Internal desc = the sidecar container failed with error: gcsfuse exited with error: exit status 1
+
+- Error from the sidecar container `gke-gcsfuse-sidecar`:
+
+  - > Panic: createFileCacheHandler: error while creating file cache directory: error in creating directory structure /gcsfuse-cache/.volumes/volume-name/gcsfuse-file-cache: mkdir /gcsfuse-cache/.volumes: permission denied
+
+- Solutions:
+
+  This warning indicates that custom buffering or caching volumes are specified, but the Pod `securityContext` `fsGroup` is not specified. Thus, gcsfuse does not have permission to access the buffering or caching volumes.
+
+  Specify a `securityContext` `fsGroup` on the Pod spec and restart the workload.
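For example, a minimal sketch of such a Pod spec (the workload name, image, and group ID below are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-workload      # placeholder
spec:
  securityContext:
    fsGroup: 1000        # arbitrary ID; grants gcsfuse access to the cache volumes
  containers:
  - name: my-container   # placeholder
    image: my-image      # placeholder
```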
The `fsGroup` can be an arbitrary ID. See the GKE documentation [Configure a custom read cache volume for the sidecar container](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#cache-volume) for details.
+
+#### File cache does not improve the performance
+
+Consider the following when troubleshooting file cache performance issues:
+
+- File cache is particularly effective in enhancing read operations for small files (less than 3 MiB).
+- Make sure the file cache feature is enabled using the volume attribute `fileCacheCapacity`.
+- Make sure the [metadata cache](#metadata-cache) is also enabled.
+- Make sure the underlying storage volume for the file cache is larger than the volume attribute `fileCacheCapacity`.
+- Make sure the volume attribute `fileCacheCapacity` is larger than the total file size.
+
+##### No space left on device for file cache
+
+- Messages from the sidecar container `gke-gcsfuse-sidecar`:
+
+  - > Job:xxx (bucket-name:/file-name) failed with: downloadObjectAsync: error at the time of copying content to cache file write /gcsfuse-cache/.volumes/volume-name/gcsfuse-file-cache/bucket-name/file-name: no space left on device
+
+- Solutions:
+
+  Make sure the underlying storage volume for the file cache is larger than the volume attribute `fileCacheCapacity`.
+
+  - If the underlying volume storage is an `emptyDir` backed by the boot disk or Local SSD, increase the sidecar container ephemeral storage using the Pod annotation `gke-gcsfuse/ephemeral-storage-limit`.
+  - If the underlying volume storage is an `emptyDir` backed by memory, increase the sidecar container memory using the Pod annotation `gke-gcsfuse/memory-limit`.
+  - If the underlying volume storage is a `PVC`, make sure the PVC `spec.resources.requests.storage` is large enough.
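The capacity checks above can be sketched in a few lines of Python (a hypothetical helper for illustration only, not part of gcsfuse or the CSI driver):

```python
# Hypothetical helper: verify that the volume backing the file cache is
# larger than the fileCacheCapacity volume attribute.

UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}

def parse_quantity(quantity: str) -> int:
    """Parse a Kubernetes-style quantity such as '512Mi' into bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes

def cache_volume_is_large_enough(volume_size: str, file_cache_capacity: str) -> bool:
    """The backing volume must be strictly larger than fileCacheCapacity."""
    return parse_quantity(volume_size) > parse_quantity(file_cache_capacity)

print(cache_volume_is_large_enough("100Gi", "512Mi"))  # True
print(cache_volume_is_large_enough("512Mi", "1Gi"))    # False
```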
+
+##### Cache size of the entry is more than the cache's maxSize
+
+- Messages from the sidecar container `gke-gcsfuse-sidecar`:
+
+  - > tryReadingFromFileCache: while creating CacheHandle: GetCacheHandle: while adding the entry in the cache: addFileInfoEntryAndCreateDownloadJob: while inserting into the cache: size of the entry is more than the cache's maxSize
+
+- Solutions:
+
+  Increase the volume attribute `fileCacheCapacity` value to make sure it is larger than the total file size.
+
+## Performance issues
+
+This section provides troubleshooting steps and tips to resolve Cloud Storage FUSE CSI driver performance issues.
+
+Since the GKE CSI driver consumes Cloud Storage FUSE in sidecar containers, read [Cloud Storage FUSE performance and best practices](https://cloud.google.com/storage/docs/gcsfuse-performance-and-best-practices) before continuing. You can configure the [Cloud Storage FUSE mount flags](https://cloud.google.com/storage/docs/gcsfuse-cli#options) using [GKE Cloud Storage FUSE CSI driver mount options](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#mount-options). The [Cloud Storage FUSE configuration file](https://cloud.google.com/storage/docs/gcsfuse-config-file) is configurable via [GKE Cloud Storage FUSE CSI driver volume attributes](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#volume-attributes).
+
+### Sidecar container resource allocation
+
+In most cases, unsatisfactory performance is caused by insufficient CPU or memory allocated to the Cloud Storage FUSE sidecar container. You can follow the steps below to properly allocate resources.
+
+- Read through the considerations highlighted in the GKE documentation: [Configure resources for the sidecar container](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#sidecar-container-resources).
You will learn why you may need to increase the resource allocation, and how to configure the sidecar container resource allocation using Pod annotations.
+
+- If you set a sidecar container CPU or memory limit using the Pod annotation, such as `gke-gcsfuse/cpu-limit: "5"` or `gke-gcsfuse/memory-limit: "5Gi"`, follow the [monitoring guidance](./monitoring.md#sidecar-container-resource-usage) to check whether the peak CPU or memory usage is close to the limit you set. If so, Cloud Storage FUSE may be throttled.
+
+- You can use the value `"0"` to unset any resource limits or requests on Standard clusters. For example, the annotations `gke-gcsfuse/cpu-limit: "0"` and `gke-gcsfuse/memory-limit: "0"` leave the sidecar container CPU and memory limits empty with the default requests. This is useful when you cannot decide on the amount of resources Cloud Storage FUSE needs for your workloads, and want to let Cloud Storage FUSE consume all the available resources on a node. After calculating the resource requirements for Cloud Storage FUSE based on your workload metrics, you can set appropriate limits.
+
+- You cannot use the value `"0"` to unset the sidecar container resource limits and requests on Autopilot clusters. You have to explicitly set a larger resource limit for the sidecar container on Autopilot clusters, and rely on GCP metrics to decide whether increasing the resource limit is needed.
+
+> Note: there is a known issue where the sidecar container CPU allocation cannot exceed 2 vCPU and memory allocation cannot exceed 14 GiB on GPU nodes on Autopilot clusters. GKE is working to remove this limitation.
+
+### Bucket location
+
+To improve performance, make sure your bucket and GKE cluster are in the same region. When you create the bucket, set the `Location type` field to `Region`, and select a region where your GKE cluster is running.
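Alternatively, you can create a regional bucket from the command line; for example, with the gcloud CLI (the bucket name and region below are placeholders):

```text
gcloud storage buckets create gs://my-bucket --location=us-central1
```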
For example: ![example of bucket location](./images/bucket_location.png)
+
+### Metadata cache
+
+The Cloud Storage FUSE [stat metadata cache](https://cloud.google.com/storage/docs/gcsfuse-cache#stat-cache-overview) and [type metadata cache](https://cloud.google.com/storage/docs/gcsfuse-cache#type-cache-overview) can reduce the number of serial calls to Cloud Storage on repeat reads to the same file, which improves performance. Set stat and type caches according to the number of files that have repeat reads and might benefit from caching. You can follow the steps below to configure metadata caches.
+
+- Follow the [monitoring guidance](./monitoring.md#cloud-storage-bucket-observability) to check Cloud Storage requests. If the `GetObjectMetadata` request is observed throughout your workload, consider enabling the Cloud Storage FUSE metadata cache and increasing the cache capacity.
+
+- The metadata caches can be configured on GKE using [Mount options](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#mount-options), or [Volume attributes](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#volume-attributes) if your GKE cluster version is 1.25.16-gke.1759000, 1.26.15-gke.1158000, 1.27.12-gke.1190000, 1.28.8-gke.1175000, 1.29.3-gke.1093000 **or later**.
+
+  - Volume attributes:
+    - `metadataStatCacheCapacity`: Use the default value of `32Mi` if your workload involves up to 20,000 files. If your workload reads more than 20,000 files, increase the size by 10 MiB for every additional 6,000 files, an average of ~1,500 bytes per file. Alternatively, you can set the value to `"-1"` to let the stat cache use as much memory as needed.
+    - `metadataTypeCacheCapacity`: Use the default value of `4Mi` if the largest single directory in the bucket you're mounting contains 20,000 files or fewer.
If the largest single directory in the bucket you're mounting contains more than 20,000 files, increase the size by 1 MiB for every 5,000 files, an average of ~200 bytes per file. Alternatively, you can set the value to `"-1"` to let the type cache use as much memory as needed.
+    - `metadataCacheTtlSeconds`: Set the value to `"-1"` to bypass a TTL expiration and serve the file from the cache whenever it's available.
+    - For example:
+      - Inline ephemeral volume
+
+        ```yaml
+        ...
+        apiVersion: v1
+        kind: Pod
+        spec:
+          volumes:
+          - name: gcp-gcs-csi-ephemeral
+            csi:
+              driver: gcsfuse.csi.storage.gke.io
+              volumeAttributes:
+                bucketName: <bucket-name>
+                metadataStatCacheCapacity: 512Mi
+                metadataTypeCacheCapacity: 64Mi
+                metadataCacheTtlSeconds: "-1"
+        ```
+
+      - PersistentVolume
+
+        ```yaml
+        apiVersion: v1
+        kind: PersistentVolume
+        spec:
+          ...
+          csi:
+            driver: gcsfuse.csi.storage.gke.io
+            volumeHandle: <bucket-name>
+            volumeAttributes:
+              metadataStatCacheCapacity: 512Mi
+              metadataTypeCacheCapacity: 64Mi
+              metadataCacheTtlSeconds: "-1"
+        ```
+
+  - Mount options:
+    > Note: The following mount options are being deprecated, and you cannot configure type cache capacity using mount options. We recommend upgrading your GKE clusters to a newer version and using volume attributes to configure metadata caches.
+    - `stat-cache-capacity`: Set the value to `"-1"` to let the stat cache use as much memory as needed.
+    - `stat-cache-ttl`: Set the value to `"-1"` to bypass a TTL expiration and serve the file from the cache whenever it's available.
+    - `type-cache-ttl`: Set the value to `"-1"` to bypass a TTL expiration and serve the file from the cache whenever it's available.
+    - For example:
+      - Inline ephemeral volume
+
+        ```yaml
+        ...
+        apiVersion: v1
+        kind: Pod
+        spec:
+          volumes:
+          - name: gcp-gcs-csi-ephemeral
+            csi:
+              driver: gcsfuse.csi.storage.gke.io
+              volumeAttributes:
+                bucketName: <bucket-name>
+                mountOptions: "stat-cache-capacity=-1,stat-cache-ttl=-1,type-cache-ttl=-1"
+        ```
+
+      - PersistentVolume
+
+        ```yaml
+        apiVersion: v1
+        kind: PersistentVolume
+        spec:
+          ...
+          mountOptions:
+          - stat-cache-capacity=-1
+          - stat-cache-ttl=-1
+          - type-cache-ttl=-1
+          csi:
+            driver: gcsfuse.csi.storage.gke.io
+            volumeHandle: <bucket-name>
+        ```
+
+- To optimize performance on the initial run of your workload, we suggest executing a complete listing beforehand. This can be achieved by running a command such as `ls -R` or its equivalent before your workload starts. This preemptive action populates the metadata caches in a faster, batched method, leading to improved efficiency.
+
+### File cache
+
+Cloud Storage FUSE has higher latency than a local file system. Throughput is reduced when you read or write small files (less than 3 MiB) one at a time, as it results in several separate Cloud Storage API calls. Reading or writing multiple large files at a time can help increase throughput. Use the [Cloud Storage FUSE file cache feature](https://cloud.google.com/storage/docs/gcsfuse-cache#file-cache-overview) to improve performance for small and random I/Os. The file cache feature can be configured on GKE using [Volume attributes](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#volume-attributes). You can follow the steps below to configure the file cache.
+
+- Make sure your GKE cluster uses these GKE versions: 1.25.16-gke.1759000, 1.26.15-gke.1158000, 1.27.12-gke.1190000, 1.28.8-gke.1175000, 1.29.3-gke.1093000 **or later**.
+
+- Read through the GKE documentation [Consume your volumes with file caching enabled](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#file-cache).
+
+- Use the volume attribute `fileCacheCapacity` to enable the file cache, and specify the maximum size that the file cache can use. Set the volume attribute `fileCacheForRangeRead` to `"true"`. For example:
+  - Inline ephemeral volume
+
+    ```yaml
+    ...
+    apiVersion: v1
+    kind: Pod
+    spec:
+      volumes:
+      - name: gcp-gcs-csi-ephemeral
+        csi:
+          driver: gcsfuse.csi.storage.gke.io
+          volumeAttributes:
+            bucketName: <bucket-name>
+            fileCacheCapacity: 512Gi
+            fileCacheForRangeRead: "true"
+    ```
+
+  - PersistentVolume
+
+    ```yaml
+    apiVersion: v1
+    kind: PersistentVolume
+    spec:
+      ...
+      csi:
+        driver: gcsfuse.csi.storage.gke.io
+        volumeHandle: <bucket-name>
+        volumeAttributes:
+          fileCacheCapacity: 512Gi
+          fileCacheForRangeRead: "true"
+    ```
+
+- By default, Cloud Storage FUSE uses an `emptyDir` volume for the file cache on GKE. You can specify any type of storage supported by GKE, such as a `PersistentVolumeClaim`, and GKE will use the specified volume for file caching. For CPU and GPU VM families with Local SSD support, we recommend using Local SSD storage. For TPU families or Autopilot, we recommend using Balanced Persistent Disk or SSD Persistent Disk. See the GKE documentation [Configure a custom read cache volume for the sidecar container](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#cache-volume) for details.
+
+> Note: If you choose to use the default `emptyDir` volume for file caching, the value of the Pod annotation `gke-gcsfuse/ephemeral-storage-limit` must be larger than the `fileCacheCapacity` volume attribute. If a custom cache volume is used, the underlying volume size must be larger than the `fileCacheCapacity` volume attribute.
+
+### Other considerations
+
+Set the number of threads according to the number of CPU cores available. ML frameworks typically use `num_workers` to define the number of threads. If the number of cores or threads is higher than `100`, change the mount option `max-conns-per-host` to the same value.
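The rule of thumb above can be sketched as a tiny helper (hypothetical, for illustration only; note the gcsfuse flag is spelled `max-conns-per-host`):

```python
# Hypothetical helper: if the workload uses more than 100 worker threads,
# raise max-conns-per-host to match; otherwise keep the default.
from typing import Optional

def max_conns_per_host_option(num_workers: int, threshold: int = 100) -> Optional[str]:
    """Return a gcsfuse mountOptions entry, or None to keep the default."""
    if num_workers > threshold:
        return f"max-conns-per-host={num_workers}"
    return None

print(max_conns_per_host_option(500))  # max-conns-per-host=500
print(max_conns_per_host_option(8))    # None
```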
For example:
+
+- Inline ephemeral volume
+
+```yaml
+...
+apiVersion: v1
+kind: Pod
+spec:
+  volumes:
+  - name: gcp-gcs-csi-ephemeral
+    csi:
+      driver: gcsfuse.csi.storage.gke.io
+      volumeAttributes:
+        bucketName: <bucket-name>
+        mountOptions: "max-conns-per-host=500"
+```
+
+- PersistentVolume
+
+```yaml
+apiVersion: v1
+kind: PersistentVolume
+spec:
+  ...
+  mountOptions:
+  - max-conns-per-host=500
+  csi:
+    driver: gcsfuse.csi.storage.gke.io
+    volumeHandle: <bucket-name>
+```
+
+### Other storage options on GKE
+
+The [Filestore CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/filestore-csi-driver) is a better option than the Cloud Storage FUSE CSI driver for workloads that require high instantaneous input/output operations per second (IOPS) and lower latency.
+
+See the GKE documentation [Storage for GKE clusters overview](https://cloud.google.com/kubernetes-engine/docs/concepts/storage-overview) for the storage options that GKE supports and some key considerations for selecting the best option for your business needs.