Splunk Operator: slow mounting of ebs volume hence pod is keeping "container creating" state for too long #1288

Closed
yaroslav-nakonechnikov opened this issue Feb 20, 2024 · 17 comments

@yaroslav-nakonechnikov

Please select the type of request

Enhancement

Tell us more

Describe the request
We are using EBS volumes of quite large size (10 TB+) for indexers, and it is sometimes required to move pods to a different node. We found that mounting the EBS volume and starting the pod takes far too long: in our case, about 70 minutes just to start the pod after it is assigned to a node.

After investigation, we found that by default Kubernetes recursively changes ownership and permissions of the volume contents to match the pod's fsGroup (ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods), and on large volumes this takes a lot of time.

Expected behavior
The documentation should mention this, with some examples of how to solve it, and the CRD should have fsGroupChangePolicy = "OnRootMismatch" as the default value.
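
For reference, this is roughly what the setting looks like on a plain pod spec, per the Kubernetes docs linked above (a minimal sketch; the fsGroup value, image, and PVC name are placeholders, not values the operator renders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-policy-demo
spec:
  securityContext:
    fsGroup: 41812                         # placeholder GID; use whatever your workload runs as
    fsGroupChangePolicy: "OnRootMismatch"  # skip the recursive chown/chmod when the volume root already matches
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-ebs-claim          # hypothetical pre-created PVC backed by an EBS volume
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "while true; do sleep 3600; done"]
      volumeMounts:
        - name: data
          mountPath: /data
```

With the default policy (Always), the kubelet walks the entire volume on every mount, which is presumably what makes a 10 TB volume take over an hour to attach.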

@yaroslav-nakonechnikov yaroslav-nakonechnikov changed the title Splunk Operator: ebs to slow to mount hence pod is keeping "container creating" state for too long Splunk Operator: slow mounting of ebs volume hence pod is keeping "container creating" state for too long Feb 20, 2024
@akondur akondur assigned akondur and unassigned kumarajeet, jryb and vivekr-splunk Feb 20, 2024
@vivekr-splunk
Collaborator

@yaroslav-nakonechnikov just wanted to check: what version of the Splunk Operator are you using?

@yaroslav-nakonechnikov
Author

@vivekr-splunk the CRD hasn't changed much since the beginning, but I'd say 2.4 and 2.5 don't have that feature.

@logsecvuln

@vivekr-splunk @akondur A Splunk support ticket has also been raised for this matter; please refer to case number "CASE [3423864]".

@akondur
Collaborator

akondur commented Feb 21, 2024

Hi @yaroslav-nakonechnikov, is the request here to change the fsGroupChangePolicy to OnRootMismatch?

@yaroslav-nakonechnikov
Author

The request is to add support for it and to inform users about potential issues with large volumes.

As a result, it could even become the default, since from my perspective it doesn't look necessary to change permissions on every mount.

@akondur
Collaborator

akondur commented Feb 23, 2024

@yaroslav-nakonechnikov, have you tried changing the fsGroupChangePolicy to OnRootMismatch and checking whether that fixes the issue in your environment? This could be done by manually disabling the operator (temporarily) and testing it on one of your Splunk instances. We are currently evaluating the option on our end.

@yaroslav-nakonechnikov
Author

@akondur how? Any change to the StatefulSet/pod leads to it being recreated, and the CRD doesn't have that option.

@akondur
Collaborator

akondur commented Feb 23, 2024

@yaroslav-nakonechnikov You could create a simple Splunk StatefulSet that attaches EBS volumes and try reproducing the issue, then change the policy to see whether it makes a difference. Alternatively, before moving the pods to other nodes, you could delete the operator temporarily and edit the StatefulSet.
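
A minimal standalone repro along those lines might look like the following (a sketch only: the image, group ID, and gp3 StorageClass name are assumptions for an AWS EBS CSI setup, not values taken from the operator):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ebs-chown-repro
spec:
  serviceName: ebs-chown-repro
  replicas: 1
  selector:
    matchLabels:
      app: ebs-chown-repro
  template:
    metadata:
      labels:
        app: ebs-chown-repro
    spec:
      securityContext:
        fsGroup: 41812                        # placeholder GID; the recursive chown is what costs time, not the value
        # Toggle between the default ("Always") and "OnRootMismatch" and compare
        # how long the pod sits in ContainerCreating after being rescheduled.
        fsGroupChangePolicy: "OnRootMismatch"
      containers:
        - name: filler
          image: busybox:1.36                 # stand-in workload; the chown happens before any container starts
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3                 # assumes an AWS EBS CSI StorageClass named gp3
        resources:
          requests:
            storage: 100Gi                    # grow this and fill /data with many files to approximate an indexer volume
```

Filling /data with a large number of files, then deleting the pod and letting it be rescheduled, should make the difference between the two policies visible in the time spent in ContainerCreating.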

@yaroslav-nakonechnikov
Author

@akondur in that case, why can't you recheck it yourselves if you already know what to recheck and how?

I reported the problem as a customer; now it is your turn to take it from there and reproduce it. Honestly, I don't understand why I would have to spin up another cluster with another 11 TB of disks and fill it with dummy data. Will you pay for it?

@vivekr-splunk
Collaborator

Hello @yaroslav-nakonechnikov, thank you for investigating this issue and identifying a possible solution. We will replicate the problem on our end and test whether your fix resolves it. We will get back to you soon.

@akondur
Collaborator

akondur commented Feb 28, 2024

Hey @yaroslav-nakonechnikov, we have merged the change to update the fsGroupChangePolicy. Please let us know if the issue still persists and we can revisit it.

@yaroslav-nakonechnikov
Author

@akondur this is good. So now we need to wait until it is released.

For now I don't know how to verify it, given that 2.5.0 and 2.5.1 also don't work as expected.

@akondur
Collaborator

akondur commented Feb 29, 2024

@yaroslav-nakonechnikov We have reverted the change because we are releasing 2.5.2 this week, and will re-introduce it in develop right after. If this change is needed sooner, we will make another minor release. I will update the PR here as soon as it's ready.

@akondur
Collaborator

akondur commented Mar 6, 2024

Hey @yaroslav-nakonechnikov, please find the MR merged into develop here. Please let me know if you're still facing issues with this change.
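
Once a build containing that change is deployed, one way to confirm it took effect is to inspect the StatefulSet the operator renders for the indexers; a rough sketch of the fragment to look for (the user/group IDs shown are placeholders, not guaranteed operator values):

```yaml
# e.g. fragment of the StatefulSet generated by the operator for an indexer cluster
spec:
  template:
    spec:
      securityContext:
        runAsUser: 41812                      # placeholder; whatever UID the operator configures
        fsGroup: 41812                        # placeholder; whatever GID the operator configures
        fsGroupChangePolicy: OnRootMismatch   # presence of this field indicates the fix is in the build
```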

@akondur
Collaborator

akondur commented Apr 16, 2024

Closing this issue per the MR. Please re-open it if the issue still persists.

@akondur akondur closed this as completed Apr 16, 2024
@yaroslav-nakonechnikov
Author

How can it be closed if it is not released yet?

@yaroslav-nakonechnikov
Author

All good, it is there.
