Model saving fails on AWS instances with OOM kill #868

Open
Arseny-N opened this issue Oct 25, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@Arseny-N

Describe the bug

When I try to quantize llama3.1-8B-Instruct on a g5.2xlarge AWS instance, model saving fails with the OOM killer terminating the process.

Expected behavior
A successful execution.

Environment
Include all relevant environment information:

  1. OS: nvcr.io/nvidia/pytorch:23.10-py3
  2. Python version: 3.10.12
  3. LLM Compressor version or commit hash: 0.2.0
  4. ML framework version(s): torch 2.4.0+cu121
  5. Other Python package versions: compressed-tensors (any version)
  6. Other relevant environment information [e.g. hardware, CUDA version]: g5.2xlarge, g5.4xlarge, g5.8xlarge AWS instances.

To Reproduce
Exact steps to reproduce the behavior:

Start a g5.2xlarge instance with the nvcr.io/nvidia/pytorch:23.10-py3 container and run this script (a sketch of the kind of script involved is given below).

Note: g5.4xlarge, g5.8xlarge are also affected.
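
For context, this is roughly the shape of the script in question. The sketch below follows the standard llm-compressor 0.2.0 W4A16 GPTQ example, so the model ID, calibration dataset, sequence length, and output directory are illustrative assumptions rather than the exact values from the attached script:

```python
# Sketch of a typical llm-compressor 0.2.0 quantize-and-save flow.
# Model ID, calibration dataset, and output directory are assumptions
# for illustration; this is not the exact script attached to the issue.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"
SAVE_DIR = "Meta-Llama-3.1-8B-Instruct-W4A16"

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

# 4-bit weight, 16-bit activation GPTQ recipe, skipping the lm_head
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Saving is the step where the host-memory spike and OOM kill are observed.
model.save_pretrained(SAVE_DIR, save_compressed=True)
```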

Additional context

The spike in memory consumption is somewhat mysterious, since it appears to be caused by the kernel.

This is an example output of the free command prior to saving the model

total        used        free      shared  buff/cache   available
Mem:            30Gi        12Gi       635Mi        20Mi        18Gi        18Gi
Swap:             0B          0B          0B

The cause does not seem to be the saving process per se, but rather some kernel caching behaviour, since the crash can also be triggered by running the quantization several consecutive times in a single process.

[Screenshot 2024-10-25 at 13:31:47]

This is how the amount of free memory changes during the script execution.

[Screenshot 2024-10-25 at 13:32:19]

This is an example readout of /proc/meminfo prior to compressing the model and causing the OOM kill.
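
For anyone trying to reproduce the memory trace, something like the following can be used to sample MemAvailable from /proc/meminfo while the script runs (this helper is illustrative and was not part of the failing script):

```python
# Illustrative helper (not part of the original script): sample MemAvailable
# from /proc/meminfo in a background thread while quantization/saving runs,
# to see where host memory drops.
import threading
import time


def log_mem_available(stop_event: threading.Event, interval_s: float = 1.0) -> None:
    while not stop_event.is_set():
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    print(time.strftime("%H:%M:%S"), line.strip())
                    break
        time.sleep(interval_s)


stop = threading.Event()
sampler = threading.Thread(target=log_mem_available, args=(stop,), daemon=True)
sampler.start()

# ... run the quantization and model.save_pretrained(...) here ...

stop.set()
sampler.join()
```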

I successfully quantized the model on a separate system.

PS: I might update this issue if I investigate this behaviour further.
