Attempt to fix problem with Nexus getting stuck while running with too little memory #4189

Open
wants to merge 6 commits into
base: main

TonyWildish-BH

Attempts to resolve #4074

What is being addressed

The current Nexus VM is too small, so Nexus doesn't get enough memory to run properly. This causes it to wedge frequently.

How is this addressed

  • Nexus VM changed from Standard_B2s to Standard_B8ms
  • Nexus deployment script updated to check how much memory the system has in total and explicitly configure the Java VM memory based on that
  • Version bumped to 3.1.2
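
A minimal sketch of the memory detection described above, assuming a Linux host with a standard /proc/meminfo. The function names are hypothetical and not taken from the PR; the sizing rule mirrors the lines quoted in the review below (2703 MiB default, three quarters of total above 4096 MiB). How the PR actually passes the result to Nexus is not shown in this conversation, so the sketch only formats the standard -Xms/-Xmx JVM flags.

```shell
#!/bin/sh
# Hypothetical helper: total memory in MiB from a meminfo-style file.
# The optional path argument exists only to make the function testable.
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024); exit }' "${1:-/proc/meminfo}"
}

# Sizing rule as quoted in the review: keep the 2703 MiB default unless
# the machine has more than 4096 MiB, then use three quarters of the total.
java_mem_mb() {
    total_mb=$1
    if [ "$total_mb" -gt 4096 ]; then
        echo $(( total_mb * 3 / 4 ))
    else
        echo 2703
    fi
}

# Format the heap size as standard JVM flags (initial and maximum heap).
jvm_heap_flags() {
    printf '%s\n' "-Xms${1}m -Xmx${1}m"
}
```

Usage would look like `jvm_heap_flags "$(java_mem_mb "$(mem_total_mb)")"`. Matching on the MemTotal label rather than taking the first line of the file is a design choice of this sketch, so it does not depend on line order.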

github-actions bot added the "external" label (PR from an external contributor) on Dec 11, 2024
@TonyWildish-BH
Author

TonyWildish-BH commented Dec 11, 2024 via email

@tim-allen-ck
Collaborator

Thanks @TonyWildish will have a look. Any chance you can add a line to the CHANGELOG.md?

@TonyWildish-BH
Author

> Thanks @TonyWildish will have a look. Any chance you can add a line to the CHANGELOG.md?

done...

@marrobi
Member

marrobi commented Dec 12, 2024

Just noting this before review: I had an issue with a Nexus instance due to:

2024-12-05 02:03:43,913+0000 WARN  [nexus housekeeper] *SYSTEM com.zaxxer.hikari.pool.HikariPool - nexus - Thread starvation or clock leap detected (housekeeper delta=9m1s359ms111µs287ns).

I wonder if we should avoid the B series.

@marrobi
Member

marrobi commented Dec 12, 2024

I think a D2v3 is likely to resolve the issue.

Also worth comparing price of Standard_B8ms and D8v3.

Will try to have a look later.

Member

marrobi left a comment

Thanks. If we can address the comments, this looks good from my perspective. Thank you for the contribution, and congrats on the go-live.

mem_total_mb=$(( $(cat /proc/meminfo | head -1 | awk '{ print $2 }') / 1024 ))
java_mem=2703
if [ $mem_total_mb -gt 4096 ]; then
java_mem=$(( mem_total_mb * 3 / 4 ))

Member

Is there a reason we use 3/4, rather than total, maybe minus 2GB for the OS?

Member

Suggested change
- java_mem=$(( mem_total_mb * 3 / 4 ))
+ java_mem=$(( mem_total_mb - 2048 ))

Author

I just thought that as the machine gets bigger it may be advisable to leave more for the host OS. I don't like running machines at the limit, as it may be detrimental to overall performance.

E.g., I don't know if Nexus does any in-memory caching of files. If not, it may help performance to leave some memory for the OS, so the OS filesystem cache can help.

I don't feel strongly about it, so I don't mind which way we go with this commit.
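
To make the trade-off between the two rules concrete, here is a small illustrative sketch (function names are hypothetical, not from the PR) that prints the heap size and the memory left to the OS under each rule. The totals used in the example, 8192 and 32768 MiB, are nominal figures picked for illustration; the MemTotal a real VM reports is somewhat lower.

```shell
#!/bin/sh
# Illustrative only: the two heap-sizing rules discussed in this thread.
# All inputs and outputs are in MiB.
heap_three_quarters() { echo $(( $1 * 3 / 4 )); }
heap_minus_2g() { echo $(( $1 - 2048 )); }

# For each total given, print the heap and the remainder left to the OS
# under both rules.
compare_policies() {
    for total_mb in "$@"; do
        frac=$(heap_three_quarters "$total_mb")
        fixed=$(heap_minus_2g "$total_mb")
        printf '%s total: 3/4 -> heap %s, os %s | minus-2048 -> heap %s, os %s\n' \
            "$total_mb" "$frac" "$(( total_mb - frac ))" \
            "$fixed" "$(( total_mb - fixed ))"
    done
}
```

Running `compare_policies 8192 32768` shows that the two rules coincide at 8192 MiB (heap 6144, 2048 left for the OS) and diverge above it: at 32768 MiB the 3/4 rule leaves 8192 MiB for the OS and its filesystem cache, while the fixed rule leaves 2048 MiB.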

@@ -98,7 +98,7 @@ resource "azurerm_linux_virtual_machine" "nexus" {
  resource_group_name = local.core_resource_group_name
  location = data.azurerm_resource_group.rg.location
  network_interface_ids = [azurerm_network_interface.nexus.id]
- size = "Standard_B2s"
+ size = "Standard_B8ms"

Member

Can we go for D2v3 for now? I've seen a Nexus DB corruption due to lack of CPU during upgrade.

We should have an additional issue to enable this to be customised at deploy time - so if users want to choose B series, or a bigger size then they can.

Collaborator

Do we really need such a big machine?
If we go for a more general-purpose one, as @marrobi suggested, maybe a more recent one like a v5?

Member

Suggested change
- size = "Standard_B8ms"
+ size = "Standard_D2s_v3"

We can do another PR to enable it to be selected when deployed.

@marrobi
Member

marrobi commented Dec 13, 2024

I've had another one of these die today; not sure if the Nexus container is behaving differently due to an update. @TonyWildish-BH, let me know your thoughts on the changes. I'm keen to get this in, then follow up with an option to select VM size. Thanks.

Successfully merging this pull request may close these issues.

Nexus VM runs out of RAM
5 participants