Attempt to fix problem with Nexus getting stuck while running with too little memory #4189
…e-Science/AzureTRE into increase-nexus-java-memory
@microsoft-github-policy-service agree [company="Barts Health NHS Trust"]
Thanks @TonyWildish will have a look. Any chance you can add a line to the |
done...
Just noting this before review, had an issue with a nexus instance due to:
I wonder if we should avoid the B series.
I think a D2v3 is likely to resolve the issue. It's also worth comparing the price of Standard_B8ms and D8v3. Will try to have a look later.
Thanks. If we can address the comments, this is good from my perspective. Thank you for the contribution, and congrats on the go-live.
mem_total_mb=$(( $(cat /proc/meminfo | head -1 | awk '{ print $2 }') / 1024 ))
java_mem=2703
if [ $mem_total_mb -gt 4096 ]; then
  java_mem=$(( mem_total_mb * 3 / 4 ))
Is there a reason we use 3/4 of the total, rather than the total minus maybe 2GB for the OS?
Suggested change:
- java_mem=$(( mem_total_mb * 3 / 4 ))
+ java_mem=$(( mem_total_mb - 2048 ))
I just thought that as the machine gets bigger it may be advisable to leave more for the host OS. I don't like running machines at the limit, as it may be detrimental to overall performance.
E.g., I don't know if Nexus does any in-memory caching of files. If not, it may help performance to leave some memory for the OS, so the OS filesystem cache can help.
I don't feel strongly about it, so I don't mind which way we go with this commit.
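To make the trade-off in this thread concrete, here is a minimal sketch comparing the two sizing rules side by side. The 2703 default, the 4096 threshold, and both formulas come from the diff and the suggestion above; the function names and the sample memory totals are hypothetical and only for illustration. All values are in MiB.

```shell
#!/bin/sh
# Illustrative only: compares the two Java memory rules discussed in this thread.
# Function names and sample totals are assumptions, not part of the PR.

java_mem_fraction() {
  # Rule in the PR: keep the 2703 default unless total memory exceeds 4096,
  # then give Java three quarters of the total.
  total_mb=$1
  if [ "$total_mb" -gt 4096 ]; then
    echo $(( total_mb * 3 / 4 ))
  else
    echo 2703
  fi
}

java_mem_reserve() {
  # Suggested alternative: same threshold, but give Java the total
  # minus a fixed 2048 reserved for the OS.
  total_mb=$1
  if [ "$total_mb" -gt 4096 ]; then
    echo $(( total_mb - 2048 ))
  else
    echo 2703
  fi
}

# Print both results for a few nominal totals.
for total_mb in 4096 8192 16384 32768; do
  printf '%6s total: 3/4 rule = %6s, minus-2048 rule = %6s\n' \
    "$total_mb" \
    "$(java_mem_fraction "$total_mb")" \
    "$(java_mem_reserve "$total_mb")"
done
```

Under these assumptions the two rules agree at a total of 8192 (both give 6144) and diverge above it: at 16384 the 3/4 rule leaves 4096 for the OS while the fixed reserve still leaves only 2048. On a real VM the total read from /proc/meminfo is somewhat below the nominal size, so the actual figures will be slightly lower.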
@@ -98,7 +98,7 @@ resource "azurerm_linux_virtual_machine" "nexus" {
   resource_group_name = local.core_resource_group_name
   location = data.azurerm_resource_group.rg.location
   network_interface_ids = [azurerm_network_interface.nexus.id]
-  size = "Standard_B2s"
+  size = "Standard_B8ms"
Can we go for D2v3 for now? I've seen a Nexus DB corruption due to lack of CPU during upgrade.
We should have an additional issue to enable this to be customised at deploy time - so if users want to choose B series, or a bigger size then they can.
Do we really need such a big machine?
If we go for a more general-purpose one, as @marrobi suggested, maybe pick a more recent series such as a v5?
Suggested change:
- size = "Standard_B8ms"
+ size = "Standard_D2s_v3"
We can do another PR to enable it to be selected when deployed.
I've had another one of these die today; not sure if the Nexus container is behaving differently due to an update. @TonyWildish-BH, let me know your thoughts on the changes. I'm keen to get this in, then follow up with an option to select the VM size. Thanks.
Attempts to resolve #4074
What is being addressed
The current Nexus VM is too small, and nexus doesn't get enough memory to run properly. This causes it to wedge frequently.
How is this addressed
Standard_B2s to Standard_B8ms