One of our Onyxia users frequently hits an error when creating a new service from the Onyxia UI. After some rough testing, the cause appears to be the large number of services this user has running (24): Onyxia API starts that many threads to handle the listing request coming from the Onyxia UI.
Other users with fewer services could not reproduce the error. On the other hand, using kubectl exec to get a shell on the Onyxia API pod and running many helm get all ... commands in parallel produces the same error.
What's puzzling is that the error mentions a process limit (ulimit -u) on the Golang side and an OutOfMemoryError on the Java side, but we don't appear to be limited on either front (user processes are unlimited as seen from inside the container, and RAM requests/limits are 4/16 GB):
2024-04-29 10:29:29.214 runtime: may need to increase max user processes (ulimit -u)
2024-04-29 10:29:29.214 runtime: failed to create new OS thread (have 8 already; errno=11)
2024-04-29 10:29:29.214 fatal error: newosproc
2024-04-29 10:29:29.214 runtime: may need to increase max user processes (ulimit -u)
2024-04-29 10:29:29.214 runtime: failed to create new OS thread (have 7 already; errno=11)
2024-04-29 10:29:29.214 fatal error: newosproc
2024-04-29 10:29:29.214 runtime: may need to increase max user processes (ulimit -u)
2024-04-29 10:29:29.214 runtime: failed to create new OS thread (have 7 already; errno=11)
2024-04-29 10:29:29.213 fatal error: newosproc
2024-04-29 10:29:29.213 runtime: may need to increase max user processes (ulimit -u)
2024-04-29 10:29:29.213 runtime: failed to create new OS thread (have 5 already; errno=11)
2024-04-29 10:29:29.211 fatal error: newosproc
2024-04-29 10:29:29.211 runtime: may need to increase max user processes (ulimit -u)
2024-04-29 10:29:29.211 runtime: failed to create new OS thread (have 6 already; errno=11)
2024-04-24T07:09:37.528Z WARN 7 --- [ool-worker-8074] i.g.i.h.service.HelmInstallService : Exception occurred
org.zeroturnaround.exec.ProcessInitException: Could not execute [helm, get, all, vscode-python-273990, --namespace, REDACTED]. Error=11, Resource temporarily unavailable
    at org.zeroturnaround.exec.ProcessInitException.newInstance(ProcessInitException.java:80) ~[zt-exec-1.12.jar:na]
    at org.zeroturnaround.exec.ProcessExecutor.invokeStart(ProcessExecutor.java:1002) ~[zt-exec-1.12.jar:na]
    at org.zeroturnaround.exec.ProcessExecutor.startInternal(ProcessExecutor.java:970) ~[zt-exec-1.12.jar:na]
    at org.zeroturnaround.exec.ProcessExecutor.execute(ProcessExecutor.java:906) ~[zt-exec-1.12.jar:na]
    at io.github.inseefrlab.helmwrapper.utils.Command.execute(Command.java:73) ~[java-helm-wrapper-v2.5.0.jar:v2.5.0]
    at io.github.inseefrlab.helmwrapper.service.HelmInstallService.getAll(HelmInstallService.java:139) ~[java-helm-wrapper-v2.5.0.jar:v2.5.0]
    at fr.insee.onyxia.api.services.impl.HelmAppsService.getHelmApp(HelmAppsService.java:370) ~[classes/:v2.5.0]
    at fr.insee.onyxia.api.services.impl.HelmAppsService.lambda$getUserServices$2(HelmAppsService.java:249) ~[classes/:v2.5.0]
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[na:na]
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) ~[na:na]
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source) ~[na:na]
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source) ~[na:na]
    at java.base/java.util.stream.AbstractTask.compute(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) ~[na:na]
Caused by: java.io.IOException: Cannot run program "helm": error=11, Resource temporarily unavailable
    at java.base/java.lang.ProcessBuilder.start(Unknown Source) ~[na:na]
    at java.base/java.lang.ProcessBuilder.start(Unknown Source) ~[na:na]
    at org.zeroturnaround.exec.ProcessExecutor.invokeStart(ProcessExecutor.java:997) ~[zt-exec-1.12.jar:na]
    ... 19 common frames omitted
Caused by: java.io.IOException: error=11, Resource temporarily unavailable
    at java.base/java.lang.ProcessImpl.forkAndExec(Native Method) ~[na:na]
    at java.base/java.lang.ProcessImpl.<init>(Unknown Source) ~[na:na]
    at java.base/java.lang.ProcessImpl.start(Unknown Source) ~[na:na]
    ... 22 common frames omitted
[773707.539s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[773707.540s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Thread-54750"
2024-04-24T07:09:37.905Z ERROR 7 --- [nio-8080-exec-9] o.a.c.c.C.[.[.[.[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [/api] threw exception [Request processing failed: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError] with root cause
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    at java.base/java.lang.Thread.start0(Native Method) ~[na:na]
    at java.base/java.lang.Thread.start(Unknown Source) ~[na:na]
    at org.zeroturnaround.exec.stream.PumpStreamHandler.start(PumpStreamHandler.java:175) ~[zt-exec-1.12.jar:na]
    at org.zeroturnaround.exec.ProcessExecutor.startInternal(ProcessExecutor.java:1050) ~[zt-exec-1.12.jar:na]
    at org.zeroturnaround.exec.ProcessExecutor.startInternal(ProcessExecutor.java:981) ~[zt-exec-1.12.jar:na]
    at org.zeroturnaround.exec.ProcessExecutor.execute(ProcessExecutor.java:906) ~[zt-exec-1.12.jar:na]
    at io.github.inseefrlab.helmwrapper.utils.Command.execute(Command.java:73) ~[java-helm-wrapper-v2.5.0.jar:v2.5.0]
    at io.github.inseefrlab.helmwrapper.service.HelmInstallService.getAll(HelmInstallService.java:139) ~[java-helm-wrapper-v2.5.0.jar:v2.5.0]
    at fr.insee.onyxia.api.services.impl.HelmAppsService.getHelmApp(HelmAppsService.java:370) ~[classes/:v2.5.0]
    at fr.insee.onyxia.api.services.impl.HelmAppsService.lambda$getUserServices$2(HelmAppsService.java:249) ~[classes/:v2.5.0]
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source) ~[na:na]
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source) ~[na:na]
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source) ~[na:na]
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source) ~[na:na]
    at java.base/java.util.stream.AbstractTask.compute(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) ~[na:na]
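Both traces fail with errno 11 (EAGAIN) while creating a thread or a process. Besides ulimit -u, the Linux fork(2) manual lists the cgroup pids controller limit (pids.max) as a cause of that error, and that limit does not show up in ulimit output, so it may be worth ruling out. Below is a minimal Java sketch that reads the counters from inside the container. The class name PidsLimitCheck and its methods are made up for illustration, and the two candidate directories (cgroup v2 first, then the cgroup v1 pids controller) are assumptions about how the node mounts cgroups.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical diagnostic helper, not part of onyxia-api.
class PidsLimitCheck {

    // Returns the trimmed first line of a file, or empty if it is missing or unreadable.
    static Optional<String> readFirstLine(Path file) {
        try {
            return Files.readAllLines(file).stream().findFirst().map(String::trim);
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    // Formats the two counters; the kernel writes the literal "max" when no limit is set.
    static String describe(String current, String max) {
        if ("max".equals(max)) {
            return current + " tasks in use, no pids limit set";
        }
        long remaining = Long.parseLong(max) - Long.parseLong(current);
        return current + " of " + max + " tasks in use, " + remaining + " remaining";
    }

    public static void main(String[] args) {
        // Assumed locations: cgroup v2 first, then the cgroup v1 pids controller.
        List<Path> dirs = List.of(Path.of("/sys/fs/cgroup"), Path.of("/sys/fs/cgroup/pids"));
        for (Path dir : dirs) {
            Optional<String> current = readFirstLine(dir.resolve("pids.current"));
            Optional<String> max = readFirstLine(dir.resolve("pids.max"));
            if (current.isPresent() && max.isPresent()) {
                System.out.println(dir + ": " + describe(current.get(), max.get()));
                return;
            }
        }
        System.out.println("No pids.current/pids.max found under the assumed cgroup paths");
    }
}
```

If pids.current sits close to pids.max while the listing request runs, that limit is a likely suspect. A value of max only rules it out at this cgroup level, since a parent cgroup on the node could still impose its own limit.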
After discussing with @olevitt, a possible way out would be to add a parameter to onyxia-api that limits the maximum number of threads used for parallelization.
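The stack traces show the helm get all calls running from a parallel stream on ForkJoinPool workers, with zt-exec starting an extra pump thread for each process. Here is a minimal sketch of such a limit, assuming a fixed-size thread pool is an acceptable replacement for the parallel stream. The class BoundedParallelism, the method mapWithLimit and the maxParallel parameter are hypothetical names, not existing onyxia-api code, and the lambda in main is only a stand-in for the real helm call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper illustrating a configurable cap on concurrent helm calls.
class BoundedParallelism {

    // Applies `task` to every item using at most `maxParallel` worker threads,
    // returning the results in the same order as the input list.
    static <T, R> List<R> mapWithLimit(List<T> items, int maxParallel, Function<T, R> task) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                Callable<R> call = () -> task.apply(item);
                futures.add(pool.submit(call));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that item is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> releases = List.of("release-a", "release-b", "release-c");
        // Stand-in for the real `helm get all <release>` invocation.
        List<String> details = mapWithLimit(releases, 2, name -> "details of " + name);
        System.out.println(details);
    }
}
```

With maxParallel exposed as a configuration value, a user with 24 services would still get a full listing, only with fewer helm processes (and their associated threads) alive at any one time.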