
OOM - Segmentation fault (not ulimit, not cgroups, not max-space, not exhausted RAM) #4474

Open
2 tasks done
riverego opened this issue Sep 1, 2024 · 5 comments

Comments


riverego commented Sep 1, 2024

Node.js Version

v22.7.0 & previous

NPM Version

v10.8.2 & previous

Operating System

Linux ip-10-8-1-229 6.1.0-23-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux

Subsystem

Other

Description

The code works as expected on my own computer: it crashes when --max-old-space-size is reached, around 32 GB...

But on cloud VMs (Outscale) it always goes OOM around 20 GB.

The problem happens on every image I have tested: Debian 12, Debian 11 and Ubuntu 20 (Outscale out-of-the-box images), with the same result on VMs with 64 GB and 128 GB of RAM, and on every Node version tested (22, 20 and 16).

$ cat /proc/<pid>/limits
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             257180               257180               processes
Max open files            1048576              1048576              files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       257180               257180               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I checked ulimits and cgroups (and even when cgroups kills a process via the OOM reaper, it doesn't throw a segfault); I found nothing...
I also set a fixed 50 GB ulimit, in case "unlimited" was hiding a low default value, and it's the same.

I tried /proc/sys/vm/overcommit_memory with values 0, 1 and 2, and it's the same.
I tried recompiling Node.js on the VM... Same...
I exhausted ChatGPT's ideas...

I thought maybe a host-side limit was being applied to processes, so I tried this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
        size_t oneG = 1024 * 1048576;   /* 1 GiB */
        size_t maxMem = 17 * oneG;      /* start just below the suspected limit */
        void *memPointer = NULL;
        /* Grow the allocation 1 GiB at a time until malloc fails. */
        do {
                if (memPointer != NULL) {
                        printf("Max Tested Memory = %zu\n", maxMem);
                        memset(memPointer, 0, maxMem);  /* touch every page */
                        free(memPointer);
                }
                maxMem += oneG;
                memPointer = malloc(maxMem);
        } while (memPointer != NULL);
        maxMem -= oneG;  /* last size that succeeded */
        printf("Max Usable Memory approx = %zu\n", maxMem);

        /* Re-allocate the largest working size and hold it for a while. */
        memPointer = malloc(maxMem);
        if (memPointer != NULL) {
                memset(memPointer, 1, maxMem);
                sleep(30);
                free(memPointer);
        }
        return 0;
}

But this program can fill RAM up to the VM's limit (64 GB or 128 GB) without any problem.
Same for the stress command:

stress -m 1 --vm-bytes 32G --vm-keep

So I'm running out of ideas... I can't figure out what makes Node.js go OOM around 20 GB on these VMs...

I hope someone here has a clue about what is happening....

Thank you.

Minimal Reproduction

const fill = new Array(1000).fill('o').join('')
const bufs = []
let i = 0
while (true) {
  ++i
  bufs.push(Array.from({ length: 10*1024 * 1024 }, (_,i) => i+fill))
  // console.log(i)
}

The code just has to run until the OOM point is reached.

Output

$ node --max-old-space-size=32000 --trace-gc index.js
[...traces]
[12808:0x6f27120]   146468 ms: Scavenge 19279.2 (19571.3) -> 19263.9 (19571.3) MB, 50.10 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
[12808:0x6f27120]   146787 ms: Scavenge 19317.6 (19610.3) -> 19302.1 (19610.5) MB, 35.85 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
Segmentation fault

Before You Submit

  • I have looked for issues that already exist before submitting this
  • My issue follows the guidelines in the README file, and follows the 'How to ask a good question' guide at https://stackoverflow.com/help/how-to-ask
@gireeshpunathil
Member

I guess this is applicable here: v8 limits array sizes, see https://stackoverflow.com/questions/70746898/why-cannot-v8-nodejs-allocate-a-max-size-array-if-sufficient-memory . Can you please examine the stack trace from a core file generated with `ulimit -c unlimited`?
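For reference, the per-array length cap the linked answer describes is easy to see in isolation (a minimal editor's sketch, unrelated to available RAM): V8 rejects any length of 2^32 or more up front:

```javascript
// Sketch: V8 caps a single Array's length at 2**32 - 1.
// Asking for one more element throws immediately, regardless of free memory.
try {
  new Array(2 ** 32); // one past the maximum length
} catch (err) {
  console.log(`${err.name}: ${err.message}`);
}
```

On current V8 builds this prints `RangeError: Invalid array length`, which is an exception, not the segfault seen in this issue.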


riverego commented Sep 2, 2024

Hello. Thank you for your answer.
No, it's not the array limitation. Each iteration creates an array of 10 485 760 entries, and each one succeeds.
And it takes fewer than 100 iterations to reach the OOM.

Moreover, this test does reach the old-space limit on my computer.
The crash point is random; it really looks like a ulimit being hit. But no limit is set, and the C program can malloc up to the RAM limit... I just can't see what could be capping Node.js's memory :/
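One more way to narrow it down (an untried editor's suggestion, not something from the thread): Buffer.alloc takes memory outside the V8 heap, so if Buffers can grow well past ~20 GB on the affected VMs while Arrays cannot, the cap would be specific to the V8 heap rather than to process memory:

```javascript
// Sketch: Buffers are allocated outside the V8 heap, so rss grows while
// heapUsed stays roughly flat. Scaled down here; raise ITERATIONS and
// CHUNK_MB on the VM under test.
const ITERATIONS = 4;
const CHUNK_MB = 64;

const bufs = [];
for (let i = 0; i < ITERATIONS; i++) {
  bufs.push(Buffer.alloc(CHUNK_MB * 1024 * 1024)); // zero-filled native memory
  const { heapUsed, rss } = process.memoryUsage();
  console.log(`buffers=${bufs.length} (${bufs.length * CHUNK_MB} MB), heapUsed=${(heapUsed / 1e6).toFixed(0)} MB, rss=${(rss / 1e6).toFixed(0)} MB`);
}
```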

@gireeshpunathil (Member)

@riverego - I was referring to bufs in your code, which grows without bound.

But you say it holds fewer than 100 entries when the OOM is hit, so apparently that is not the cause.

I believe v8 also limits the number of maps (object shape descriptions), but I am not sure of it, and that cannot explain why it works on one system and not on another.

For these reasons, I would still recommend turning on ulimit -c and looking at the stack trace to see why the allocation failed (and of course the console output too).

@gireeshpunathil (Member)

@riverego - any updates?


riverego commented Nov 15, 2024

Hello,

Sadly, no. I opened a ticket with Outscale; they haven't gotten back to me yet.
