CUPY memory allocation error when running MOFA #145

Closed
wlason opened this issue Aug 21, 2024 · 1 comment

wlason commented Aug 21, 2024

I am trying to run mu.tl.mofa() as part of panpipes on a dataset of ~100k cells with two modalities (RNA and ATAC). I subset to highly variable features (HVFs), which leaves about 8k features in RNA and 25k in ATAC.
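For a sense of scale, here is a back-of-the-envelope estimate of how large these views are as dense matrices. It assumes each view is held densely on the GPU (the traceback below shows `gpu_utils.array(Y)` copying the data to the device) at 8 bytes per value (float64) or 4 bytes (float32). The helper `dense_bytes` is hypothetical and not part of muon or mofapy2.

```python
GIB = 2**30


def dense_bytes(n_cells: int, n_features: int, bytes_per_value: int = 8) -> int:
    """Size in bytes of a dense n_cells x n_features matrix."""
    return n_cells * n_features * bytes_per_value


n_cells = 100_000
views = {"rna": 8_000, "atac": 25_000}  # approximate HVF counts from this report

for name, n_feat in views.items():
    b64 = dense_bytes(n_cells, n_feat, 8)  # float64
    b32 = dense_bytes(n_cells, n_feat, 4)  # float32
    print(f"{name}: {b64 / GIB:.1f} GiB (float64), {b32 / GIB:.1f} GiB (float32)")

total64 = dense_bytes(n_cells, sum(views.values()), 8)
print(f"total: {total64 / GIB:.1f} GiB (float64)")
```

With these numbers the ATAC view alone is roughly 18.6 GiB in float64, and both views together roughly 24.6 GiB, before counting any same-sized temporaries created during training.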

The model starts running and reports the ELBO before training:

######################################
## Training the model with seed 1 ##
######################################

ELBO before training: -18558515111.51

Before the first iteration completes, however, CuPy fails with a memory allocation error:

Traceback (most recent call last):
  File "/batch_correct_mofa.py", line 172, in <module>
    mu.tl.mofa(tmp, **mofa_kwargs)
  File "[dir]/lib/python3.10/site-packages/muon/_core/tools.py", line 588, in mofa
    ent.run()
  File "[dir]/lib/python3.10/site-packages/mofapy2/run/entry_point.py", line 57, in saver
    func(self, *args, **kwargs)
  File "[dir]/lib/python3.10/site-packages/mofapy2/run/entry_point.py", line 1434, in run
    train_model(self.model)
  File "[dir]/lib/python3.10/site-packages/mofapy2/build_model/train_model.py", line 27, in train_model
    model.iterate()
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/BayesNet.py", line 291, in iterate
    self.nodes[node].update()
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/nodes/multiview_nodes.py", line 136, in update
    self.nodes[m].updateParameters(ix, ro)
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/nodes/W_nodes.py", line 204, in updateParameters
    self._updateParameters(
  File "[dir]/lib/python3.10/site-packages/mofapy2/core/nodes/W_nodes.py", line 249, in _updateParameters
    tauY_gpu = (tau_gpu * gpu_utils.array(Y)).T
  File "cupy/_core/core.pyx", line 1285, in cupy._core.core._ndarray_base.__mul__
  File "cupy/_core/_kernel.pyx", line 1350, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 645, in cupy._core._kernel._get_out_args_from_optionals
  File "cupy/_core/core.pyx", line 2811, in cupy._core.core._ndarray_init
  File "cupy/_core/core.pyx", line 241, in cupy._core.core._ndarray_base._init_fast
  File "cupy/cuda/memory.pyx", line 738, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1424, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1445, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1116, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1137, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 1382, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
  File "cupy/cuda/memory.pyx", line 1385, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 16,604,200,448 bytes (allocated so far: 33,221,685,248 bytes).

  • OS: HPC
  • Python version: 3.10
  • Versions of libraries involved: muon 0.1.5
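The two byte counts in the `OutOfMemoryError` message are easier to compare against a GPU's capacity in GiB. A minimal sketch; the regular expression assumes the exact message format shown in the traceback above, and `parse_cupy_oom` is a hypothetical helper.

```python
import re

GIB = 2**30


def parse_cupy_oom(message: str) -> tuple[int, int]:
    """Extract (requested, allocated_so_far) byte counts from a CuPy OOM message."""
    m = re.search(
        r"allocating ([\d,]+) bytes \(allocated so far: ([\d,]+) bytes\)", message
    )
    if m is None:
        raise ValueError("unrecognised OutOfMemoryError message")
    # Strip thousands separators before converting to int
    requested, allocated = (int(g.replace(",", "")) for g in m.groups())
    return requested, allocated


msg = ("Out of memory allocating 16,604,200,448 bytes "
       "(allocated so far: 33,221,685,248 bytes).")
requested, allocated = parse_cupy_oom(msg)
print(f"requested: {requested / GIB:.2f} GiB")
print(f"already allocated: {allocated / GIB:.2f} GiB")
print(f"needed in total: {(requested + allocated) / GIB:.2f} GiB")
```

For this report that is about 15.5 GiB requested on top of about 30.9 GiB already allocated by CuPy, roughly 46.4 GiB in total on a single device.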
wlason added the bug (Something isn't working) label on Aug 21, 2024

gtca (Collaborator) commented Oct 21, 2024

Let's discuss it in bioFAM/mofapy2#32, but if there is not enough GPU memory to fit the whole 100k × (8k + 25k) Y matrix, you should try stochastic variational inference (svi_mode).
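A sketch of deciding when to switch to SVI, building the `mofa_kwargs` dict that the reporter's script passes to `mu.tl.mofa(tmp, **mofa_kwargs)`. The keyword names `gpu_mode`, `use_float32`, `svi_mode` and `svi_batch_size` are assumed to match the `mu.tl.mofa()` signature (verify against the installed muon version), and `svi_batch_size` is assumed to be the fraction of cells used per iteration. `build_mofa_kwargs`, the 40 GB card, and the `data_share` heuristic are hypothetical.

```python
def build_mofa_kwargs(n_cells: int, n_features: int, gpu_bytes: int,
                      use_float32: bool = True, data_share: float = 0.25) -> dict:
    """Suggest keyword arguments for mu.tl.mofa (names assumed, not verified here).

    If the dense data would take more than `data_share` of GPU memory,
    enable SVI and choose the fraction of cells per iteration so that one
    batch roughly fits in that share. `data_share` is an arbitrary heuristic
    that leaves room for the model's own same-sized temporaries.
    """
    bytes_per_value = 4 if use_float32 else 8
    full_bytes = n_cells * n_features * bytes_per_value
    budget = data_share * gpu_bytes
    kwargs = {"gpu_mode": True, "use_float32": use_float32}
    if full_bytes > budget:
        kwargs["svi_mode"] = True
        # Fraction of cells per SVI iteration, clipped to [0.01, 1.0]
        kwargs["svi_batch_size"] = max(0.01, min(1.0, budget / full_bytes))
    return kwargs


# Hypothetical 40 GB card, dataset sizes taken from this issue
kwargs = build_mofa_kwargs(100_000, 8_000 + 25_000, gpu_bytes=40 * 10**9)
print(kwargs)
# mu.tl.mofa(mdata, **kwargs)  # requires muon, mofapy2 and a CUDA GPU
```

Here the sketch enables SVI with roughly 76% of cells per iteration; if the allocation still fails, lowering `svi_batch_size` further is the obvious next step.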

That being said, there are also definitely places in the training code where more care could be taken when allocating GPU memory.
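One general way to bound peak memory in an update like `tauY_gpu = (tau_gpu * gpu_utils.array(Y)).T` from the traceback is to process cells in row chunks and accumulate the result, so only a chunk-sized temporary exists at any time. The sketch uses NumPy (CuPy mirrors this array API); it assumes the quantity needed is `(tau * Y).T @ Z` for a factor matrix `Z`, which is an illustrative stand-in rather than mofapy2's actual update, and `chunked_weighted_crossprod` is hypothetical.

```python
import numpy as np


def chunked_weighted_crossprod(tau, Y, Z, chunk_rows=1024):
    """Compute (tau * Y).T @ Z without materialising the full tau * Y.

    tau, Y: (n_cells, n_features); Z: (n_cells, n_factors).
    The largest temporary is chunk_rows x n_features, not n_cells x n_features.
    """
    n_cells = Y.shape[0]
    out = np.zeros((Y.shape[1], Z.shape[1]), dtype=np.result_type(tau, Y, Z))
    for start in range(0, n_cells, chunk_rows):
        stop = min(start + chunk_rows, n_cells)
        block = tau[start:stop] * Y[start:stop]  # chunk-sized temporary
        out += block.T @ Z[start:stop]           # accumulate over chunks
    return out


rng = np.random.default_rng(0)
tau = rng.random((5000, 300))
Y = rng.standard_normal((5000, 300))
Z = rng.standard_normal((5000, 10))

full = (tau * Y).T @ Z
chunked = chunked_weighted_crossprod(tau, Y, Z, chunk_rows=700)
print(np.allclose(full, chunked))  # True
```

With CuPy, moving each `Y[start:stop]` slice to the device inside the loop would also avoid keeping the whole of `Y` on the GPU, at the cost of more host-to-device transfers.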

gtca closed this as completed on Oct 21, 2024
gtca removed the bug (Something isn't working) label on Oct 21, 2024