CUPY memory allocation error when running MOFA #145

wlason · 2024-08-21T20:08:56Z

I am trying to run mu.tl.mofa() as part of panpipes on a dataset of ~100k cells with two modalities (RNA and ATAC). I am subsetting to highly variable features (HVFs), so I have about 8k features in RNA and 25k in ATAC.

The model starts running, but it fails before the first iteration completes:

######################################
## Training the model with seed 1 ##
######################################


ELBO before training: -18558515111.51

CuPy then fails with a memory allocation error:

Traceback (most recent call last):
  File "/batch_correct_mofa.py", line 172, in <module>
    mu.tl.mofa(tmp, **mofa_kwargs)
  File "[dir]/lib/python3.10/site-packages/muon/_core/tools.py", line 588, in mofa
    ent.run()
  File "[dir]/lib/python3.10/site-packages/mofapy2/run/entry_point.py", line 57, in saver
    func(self, *args, **kwargs)
  File "[dir]/lib/python3.10/site-packages/mofapy2/run/entry_point.py", line 1434, in run
    train_model(self.model)
  File "[dir]/lib/python3.10/site-packages/mofapy2/build_model/train_model.py", line 27, in train_model
    model.iterate()
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/BayesNet.py", line 291, in iterate
    self.nodes[node].update()
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/nodes/multiview_nodes.py", line 136, in update
    self.nodes[m].updateParameters(ix, ro)
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/nodes/W_nodes.py", line 204, in updateParameters
    self._updateParameters(
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/nodes/W_nodes.py", line 249, in _updateParameters
    tauY_gpu = (tau_gpu * gpu_utils.array(Y)).T
  File "cupy/_core/core.pyx", line 1285, in cupy._core.core._ndarray_base.__mul__
  File "cupy/_core/_kernel.pyx", line 1350, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 645, in cupy._core._kernel._get_out_args_from_optionals
  File "cupy/_core/core.pyx", line 2811, in cupy._core.core._ndarray_init
  File "cupy/_core/core.pyx", line 241, in cupy._core.core._ndarray_base._init_fast
  File "cupy/cuda/memory.pyx", line 738, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1424, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1445, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1116, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1137, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1382, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
  File "cupy/cuda/memory.pyx", line 1385, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 16,604,200,448 bytes (allocated so far: 33,221,685,248 bytes).
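
For scale, the numbers in the error message can be checked with some quick arithmetic. The sketch below is only an estimate: it assumes dense float64 arrays and a 512-byte rounding unit in the CuPy memory pool, and the helper names are made up for illustration. Under those assumptions, the failed request matches one dense array of 83,021 x 25,000 float64 values, and the pool already held roughly two arrays of that size. The shape is inferred from the byte count and is not confirmed anywhere in this thread.

```python
ALLOCATION_UNIT = 512  # assumed rounding unit of the CuPy memory pool, in bytes


def dense_nbytes(n_rows, n_cols, itemsize=8):
    """Bytes for a dense n_rows x n_cols array (itemsize=8 means float64)."""
    return n_rows * n_cols * itemsize


def round_up(nbytes, unit=ALLOCATION_UNIT):
    """Round a byte count up to the next multiple of `unit`."""
    return -(-nbytes // unit) * unit


requested = 16_604_200_448  # "Out of memory allocating ..." in the error
allocated = 33_221_685_248  # "allocated so far ..." in the error

# Rows implied by the request if the array has 25,000 float64 columns (assumption).
implied_rows = requested // (25_000 * 8)

# One candidate array of that shape, rounded the way the pool is assumed to round.
candidate = round_up(dense_nbytes(implied_rows, 25_000))

# The full matrix at the approximate sizes quoted in the report.
full_matrix = dense_nbytes(100_000, 8_000 + 25_000)

print(f"implied rows: {implied_rows}")
print(f"candidate matches request: {candidate == requested}")
print(f"already allocated / requested: {allocated / requested:.2f}")
print(f"full 100k x 33k float64 matrix: {full_matrix / 1e9:.1f} GB")
```

If the assumptions hold, the multiplication in W_nodes.py was asking for a third ATAC-sized array on top of two that were already resident, which is about 50 GB in total.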

OS: HPC
Python version: 3.10
Versions of libraries involved: muon 0.1.5

gtca · 2024-10-21T22:28:55Z

Let's discuss it in bioFAM/mofapy2#32, but if there's not enough space on the GPU to fit the whole 100k x (8k + 25k) Y matrix, you should try stochastic variational inference (svi_mode).

That being said, there are also definitely places in the training code where more care could be taken when allocating GPU memory.
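
As a rough sketch of the svi_mode suggestion, the options could be collected into the `mofa_kwargs` dict that the script in the traceback already passes to mu.tl.mofa(). The keyword names `gpu_mode`, `svi_mode`, and `svi_batch_size` are written here as I understand the muon signature, so check them against the installed version. The helper `svi_mofa_kwargs` and the 0.25 batch fraction are hypothetical choices for illustration, not recommended settings.

```python
def svi_mofa_kwargs(batch_fraction=0.25, use_gpu=True):
    """Build keyword arguments that switch MOFA training to SVI (hypothetical helper).

    batch_fraction is the share of cells used per stochastic update; the
    parameter names below are assumed to match muon's mu.tl.mofa().
    """
    if not 0 < batch_fraction <= 1:
        raise ValueError("batch_fraction must be in (0, 1]")
    return {
        "gpu_mode": use_gpu,               # keep training on the GPU
        "svi_mode": True,                  # stochastic variational inference
        "svi_batch_size": batch_fraction,  # fraction of samples per batch (assumed)
    }


mofa_kwargs = svi_mofa_kwargs()
print(mofa_kwargs)

# In the reporter's script this would then be passed through unchanged:
# mu.tl.mofa(tmp, **mofa_kwargs)
```

Whether a given batch fraction actually fits depends on the GPU and on how mofapy2 stages the data, so it may take a few attempts with smaller fractions.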

wlason added the bug label Aug 21, 2024

gtca closed this as completed Oct 21, 2024

gtca removed the bug label Oct 21, 2024
