
RADAR-Docker allows VM to run out of memory and hangs VM #215

Open
rocketsciencenerd opened this issue Feb 17, 2020 · 7 comments

@rocketsciencenerd commented Feb 17, 2020

I have a server running the latest radar-docker that hangs periodically since updating to managementportal 0.5.8 and radar-output:0.6.0. The VM runs out of memory and hangs; after I restart it everything runs again for a while, then it runs out of memory and I have to restart it once more. This is confirmed by the entries in /var/log/kern.log shown below:
[Screenshots of /var/log/kern.log showing out-of-memory kills]

My docker container setup is below:

[Screenshot of the running Docker containers, Feb 17 2020]

My VM specs match the recommended specs from https://radar-base.org/index.php/documentation/introduction/:
4-core CPU
16 GB memory
An SSD for the operating system and docker (at least 50 GB)
1 x 1 TB spinning disks for redundancy

Per @nivemaham's recommendation I am going to try changing this line: https://github.com/RADAR-base/RADAR-Docker/blob/master/dcompose-stack/radar-cp-hadoop-stack/docker-compose.yml#L827

to RADAR_HDFS_RESTRUCTURE_OPTS: -Xms250m -Xmx2g
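
For anyone applying the same change, the edit inside the restructure service's environment block looks roughly like this (surrounding keys omitted; only the RADAR_HDFS_RESTRUCTURE_OPTS value changes):

    environment:
      RADAR_HDFS_RESTRUCTURE_OPTS: -Xms250m -Xmx2g   # was -Xmx4g; caps the JVM heap at 2 GB

After editing, docker-compose up -d should recreate the container with the new setting.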

Hope this helps others - may also be worth looking into on the master branch.

@nivemaham (Member)

Thanks for reporting this issue @rocketsciencenerd. I think it could be related to running radar-output as part of the stack; the recommended specifications were created based on an older version of RADAR-Docker, where we ran radar-output as a systemctl service on an interval.
The current configuration of radar-output

RADAR_HDFS_RESTRUCTURE_OPTS: -Xms250m -Xmx4g

seems to consume (not necessarily continuously) up to 4 GB for this container alone, which may have caused the OOM issue on the VM, since the rest of the platform also requires memory.

That is why lowering -Xmx to 2g seems like a good solution to me.
Would it be possible for your system admin to check which container caused the OOM?
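For example, the kernel usually logs which process the OOM killer targeted, so something along these lines (adjust paths to your distribution) should narrow it down to a container:

    # find OOM-killer activity and the process it killed
    grep -iE "out of memory|killed process" /var/log/kern.log
    # or read the kernel ring buffer with human-readable timestamps
    dmesg -T | grep -iE "out of memory|killed process"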

@yatharthranjan (Member)

Hi, Nivethika's suggestion sounds good.
Not sure if you already know this, but you can use docker stats to see how many resources each container is consuming.
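For a one-off snapshot instead of the live stream, something like:

    docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"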

@rocketsciencenerd (Author) commented Feb 17, 2020

@yatharthranjan Is there a way to get a history of Docker container usage? It looks like the stats command only displays current usage.

@yatharthranjan (Member) commented Feb 17, 2020

Hi, I am not aware of a straightforward way to do that (I use netdata for monitoring the cgroups and the VM as a whole; it can be deployed in Docker itself). But there are other tools that you can use for this, and a quick Google search should reveal some.
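For reference, running netdata in Docker looks roughly like the following (check the netdata docs for the currently recommended mounts; these are illustrative):

    docker run -d --name=netdata \
      -p 19999:19999 \
      -v /proc:/host/proc:ro \
      -v /sys:/host/sys:ro \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      --cap-add SYS_PTRACE \
      netdata/netdata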

@afolarin (Member)

cAdvisor is also relatively easy to set up (you can also use Prometheus):
https://github.com/google/cadvisor
https://prometheus.io/docs/guides/cadvisor/
but there are a bunch of other options too
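
The cAdvisor quick start is roughly the following (see the links above for the up-to-date image and mounts); it then serves per-container metrics on port 8080:

    docker run -d --name=cadvisor \
      --publish=8080:8080 \
      --volume=/:/rootfs:ro \
      --volume=/var/run:/var/run:ro \
      --volume=/sys:/sys:ro \
      --volume=/var/lib/docker/:/var/lib/docker:ro \
      gcr.io/cadvisor/cadvisor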

@blootsvoets (Member)

Recently we've had to increase our memory requirements to 24 GB on the base system, and I think it would be wise to make that the base requirement. To avoid OOM while keeping the system running (with degraded performance), you can consider enabling swap (see e.g. https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-18-04).
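
The linked tutorial boils down to roughly this (the 8G size is just an example; pick what fits your disk):

    sudo fallocate -l 8G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    # make the swap file persistent across reboots
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab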

@iDmple commented Mar 24, 2021

Just to let you know that I have a similar issue on the same machine configuration (16 GB RAM); some containers are hanging (Kafka and HDFS).
I resized to 32 GB to test whether that solves the issue.
