Attempt to fix problem with Nexus getting stuck while running with too little memory #4189

TonyWildish-BH · 2024-12-11T17:05:27Z

Attempts to resolve #4074

What is being addressed

The current Nexus VM is too small, so Nexus doesn't get enough memory to run properly. This causes it to wedge frequently.

How is this addressed

Nexus VM size changed from Standard_B2s to Standard_B8ms
Nexus deployment script updated to check how much memory the system has in total and explicitly configure the Java VM memory based on that
Bump version to 3.1.2
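
The memory-based configuration described above can be sketched roughly as follows. This is an illustration, not the PR's exact script: the function name `compute_java_mem_mb` and its optional file argument are hypothetical, while the 2703 MB fallback, the 4096 MB threshold and the 3/4 factor are taken from the diff hunk quoted in the review thread further down. How the value reaches the Nexus container is not shown in this thread, so the sketch only prints standard JVM heap flags.

```shell
#!/bin/bash
# Sketch only: derive a JVM memory size (in MB) from total system memory.
# compute_java_mem_mb and its optional argument are hypothetical names.
compute_java_mem_mb() {
  local meminfo="${1:-/proc/meminfo}"
  local mem_total_kb mem_total_mb
  local java_mem=2703   # fallback for small machines, as in the reviewed diff

  # MemTotal is reported in kB; match the field by name instead of relying
  # on it being the first line of the file.
  mem_total_kb=$(awk '/^MemTotal:/ { print $2; exit }' "$meminfo")
  mem_total_mb=$(( mem_total_kb / 1024 ))

  # Only scale up when the machine has more than 4 GB in total.
  if [ "$mem_total_mb" -gt 4096 ]; then
    java_mem=$(( mem_total_mb * 3 / 4 ))
  fi
  echo "$java_mem"
}

# Print the resulting heap flags when running on a Linux host.
if [ -r /proc/meminfo ]; then
  java_mem=$(compute_java_mem_mb)
  echo "-Xms${java_mem}m -Xmx${java_mem}m"
fi
```

Taking the meminfo path as an argument is only there so the calculation can be exercised against a sample file without needing a VM of a particular size.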


TonyWildish-BH · 2024-12-11T17:09:33Z

@microsoft-github-policy-service agree [company="Barts Health NHS Trust"]

tim-allen-ck · 2024-12-11T17:16:26Z

Thanks @TonyWildish will have a look. Any chance you can add a line to the CHANGELOG.md?

TonyWildish-BH · 2024-12-11T17:29:04Z

> Thanks @TonyWildish will have a look. Any chance you can add a line to the CHANGELOG.md?

done...

marrobi · 2024-12-12T09:51:29Z

Just noting this before review, had an issue with a nexus instance due to:

2024-12-05 02:03:43,913+0000 WARN  [nexus housekeeper] *SYSTEM com.zaxxer.hikari.pool.HikariPool - nexus - Thread starvation or clock leap detected (housekeeper delta=9m1s359ms111µs287ns).

I wonder if we should avoid the B series.

marrobi · 2024-12-12T09:57:38Z

I think a D2v3 is likely to resolve the issue.

Also worth comparing price of Standard_B8ms and D8v3.

Will try to have a look later.

marrobi

Thanks. If we can address the comments, this looks good from my perspective. Thank you for the contribution, and congrats on the go-live.

marrobi · 2024-12-12T10:24:49Z

templates/shared_services/sonatype-nexus-vm/scripts/deploy_nexus_container.sh

+mem_total_mb=$(( $(cat /proc/meminfo | head -1 | awk '{ print $2 }') / 1024 ))
+java_mem=2703
+if [ $mem_total_mb -gt 4096 ]; then
+  java_mem=$(( mem_total_mb * 3 / 4 ))


Is there a reason we use 3/4, rather than total, maybe minus 2GB for the OS?

marrobi · 2024-12-13

Suggested change

-  java_mem=$(( mem_total_mb * 3 / 4 ))
+  java_mem=$(( mem_total_mb - 2048 ))

TonyWildish-BH · 2024-12-13

I just thought that as the machine gets bigger it may be advisable to leave more for the host OS. I don't like running machines at the limit, as it may be detrimental to overall performance.

E.g., I don't know if Nexus does any in-memory caching of files. If not, it may help performance to leave some memory for the OS, so the OS filesystem cache can help.

I don't feel strongly about it, so I don't mind which way we go with this commit.
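
To make the trade-off in this thread concrete, the sketch below compares the two sizing rules for a few nominal memory sizes. The function names are hypothetical, and the totals are round figures rather than measured `MemTotal` values, which come out somewhat lower on a real VM. The two rules coincide at 8192 MB; above that, the 3/4 rule leaves a growing share for the OS and filesystem cache, while the fixed rule always leaves 2048 MB. At a nominal 32768 MB, for example, the 3/4 rule gives the JVM 24576 MB and the fixed rule gives it 30720 MB.

```shell
#!/bin/bash
# Sketch only: compare the two JVM sizing rules discussed in this thread.
# Both function names are hypothetical; inputs and outputs are in MB.
java_mem_three_quarters() { echo $(( $1 * 3 / 4 )); }
java_mem_minus_2gb()      { echo $(( $1 - 2048 )); }

# Nominal totals, not measured MemTotal values.
for total_mb in 8192 16384 32768; do
  a=$(java_mem_three_quarters "$total_mb")
  b=$(java_mem_minus_2gb "$total_mb")
  printf 'total=%6d  3/4 rule: jvm=%6d os=%6d  minus-2GB rule: jvm=%6d os=%6d\n' \
    "$total_mb" "$a" "$(( total_mb - a ))" "$b" "$(( total_mb - b ))"
done
```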

marrobi · 2024-12-12T10:28:03Z

templates/shared_services/sonatype-nexus-vm/terraform/vm.tf

@@ -98,7 +98,7 @@ resource "azurerm_linux_virtual_machine" "nexus" {
  resource_group_name             = local.core_resource_group_name
  location                        = data.azurerm_resource_group.rg.location
  network_interface_ids           = [azurerm_network_interface.nexus.id]
-  size                            = "Standard_B2s"
+  size                            = "Standard_B8ms"


Can we go for D2v3 for now? I've seen a Nexus DB corruption due to lack of CPU during upgrade.

We should have an additional issue to enable this to be customised at deploy time - so if users want to choose B series, or a bigger size then they can.

tamirkamara · 2024-12-12

Do we really need such a big machine?
If we go for a more general-purpose one like @marrobi suggested, maybe pick a more recent one like a v5?

marrobi · 2024-12-13

Suggested change

-  size = "Standard_B8ms"
+  size = "Standard_D2s_v3"

We can do another PR to enable it to be selected when deployed.

marrobi · 2024-12-13T16:15:36Z

I've had another one of these die today, not sure if the nexus container is behaving differently due to an update. @TonyWildish-BH let me know your thoughts on the changes, but keen to get this in, then follow with an option to select VM size. Thanks.

TonyWildish added 5 commits December 11, 2024 16:43

Attempt to fix Nexus problem of running with too little memory

fca7cc2

Attempt to fix Nexus problem of running with too little memory

295ffc6

Merge branch 'increase-nexus-java-memory' of bls.github.com:Barts-Lif…
…e-Science/AzureTRE into increase-nexus-java-memory

ef93c0d

Attempt to fix Nexus problem of running with too little memory

49eb51a

Merge branch 'increase-nexus-java-memory' of bls.github.com:Barts-Lif…
…e-Science/AzureTRE into increase-nexus-java-memory

84c8acc

github-actions bot added the "external" label (PR from an external contributor) Dec 11, 2024

Update CHANGELOG

e3dd18f

marrobi approved these changes Dec 12, 2024


marrobi left a comment

marrobi Dec 12, 2024

marrobi Dec 13, 2024

TonyWildish-BH Dec 13, 2024

marrobi Dec 12, 2024

tamirkamara Dec 12, 2024

marrobi Dec 13, 2024
