create_covariance needs refactor to reduce memory consumption #1454

RickKessler · 2024-11-30T16:45:04Z

I recently ran this script on a sample with 19k events and 30 systematics and had to request 128 GB of memory, which limits the number of simultaneous jobs allowed by RCC. Most of the memory appears to be wasted on storing a separate 19k x 19k sub-matrix for each of the 30 systematics:
30 x 19k^2 x 8 bytes = 86.6 GB.
To reduce memory consumption, a refactor is needed that keeps a running sum of COVSYS_TOT and releases the memory for each systematic once it has been added.
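
A minimal sketch of the running-sum idea, assuming NumPy and 8-byte floats. The names `bytes_for_all_submatrices`, `accumulate_covsys`, and `build_cov_for_syst` are hypothetical and are not taken from create_covariance; the sketch only illustrates why peak memory drops from 30 sub-matrices to roughly two.

```python
import numpy as np

def bytes_for_all_submatrices(n_events, n_syst, itemsize=8):
    """Memory needed if every per-systematic sub-matrix is kept at once."""
    return n_syst * n_events**2 * itemsize

def accumulate_covsys(n_events, syst_names, build_cov_for_syst):
    """Sum per-systematic covariance matrices without storing them all.

    build_cov_for_syst is a hypothetical callable that returns the
    (n_events x n_events) contribution for one systematic.
    """
    covsys_tot = np.zeros((n_events, n_events), dtype=np.float64)
    for name in syst_names:
        cov = build_cov_for_syst(name)  # one sub-matrix in memory at a time
        covsys_tot += cov               # in-place add, no extra temporary
        del cov                         # drop this reference before the next one
    return covsys_tot

# 30 systematics x 19k^2 elements x 8 bytes, in GB (10^9 bytes)
print(bytes_for_all_submatrices(19_000, 30) / 1e9)
```

With this pattern the peak is about two n x n matrices (the running total plus the current contribution), i.e. roughly 5.8 GB for 19k events instead of 86.6 GB.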

As a side note, recall that create_cov also inverts COVTOT to save time at the cosmology-fitting stage. Inverting this 19k^2 matrix took about 5 hr, consistent with the O(n^3) operations needed for inversion. If we someday want to invert a 100k^2 matrix, n grows by about x5, so the CPU time is expected to grow by 5^3 = 125, i.e., 5 hr x 125 = 625 hr. Even with a newer CPU running x10 faster, the CPU time would still be a few days, and I am not aware of a cluster that allows a few-day run time in the slurm queue.
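
The O(n^3) extrapolation can be written as a small helper. This is only a rough scaling estimate under the assumption that run time is dominated by a single O(n^3) step; `scaled_cubic_hours` is a hypothetical name, and the unrounded size ratio (100k/19k, about 5.26) gives a somewhat larger number than the rounded x5 used above.

```python
def scaled_cubic_hours(t_ref_hr, n_ref, n_new, speedup=1.0):
    """Extrapolate the run time of an O(n^3) step from a reference run.

    t_ref_hr : measured time for the reference matrix size n_ref
    speedup  : assumed CPU speed-up factor relative to the reference run
    """
    return t_ref_hr * (n_new / n_ref) ** 3 / speedup

# Rounded ratio of 5, as in the estimate above: 5 hr x 5^3
print(scaled_cubic_hours(5.0, 1, 5))
# Unrounded ratio, same CPU and a CPU assumed to be x10 faster
print(scaled_cubic_hours(5.0, 19_000, 100_000))
print(scaled_cubic_hours(5.0, 19_000, 100_000, speedup=10.0))
```

With the unrounded ratio the estimate is roughly 730 hr, or about 3 days on a x10 faster CPU, which does not change the conclusion.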

RickKessler · 2025-01-01T22:49:53Z

Refactored to save memory by computing and deleting the cov matrix contributions on the fly; same for the cov inverse. See the internal global FLAG_REDUCE_MEMORY. I also noticed that performing the "positive definite" test takes much longer than the matrix inversion, so this test now runs only for size < 2000. The long run time that the comments above attributed to matrix inversion was really due to the pos-def and "condition" tests.
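
A sketch of gating the expensive check on matrix size, assuming NumPy. The eigenvalue-based test, the function names, and the `max_size_posdef_test` parameter are assumptions made for illustration; only the size < 2000 threshold comes from the comment above, and the actual test in create_covariance may be implemented differently.

```python
import numpy as np

def is_positive_definite(cov):
    """Assumed test: a symmetric matrix is pos-def if all eigenvalues are > 0."""
    return bool(np.linalg.eigvalsh(cov).min() > 0.0)

def invert_with_optional_check(cov, max_size_posdef_test=2000):
    """Invert cov; run the pos-def test only for small matrices.

    Returns (covinv, posdef), where posdef is None if the test was skipped.
    """
    n = cov.shape[0]
    posdef = is_positive_definite(cov) if n < max_size_posdef_test else None
    covinv = np.linalg.inv(cov)
    return covinv, posdef

cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
covinv, posdef = invert_with_optional_check(cov)
print(posdef, np.allclose(covinv @ cov, np.eye(2)))
```

Returning `None` rather than `True` when the test is skipped keeps "not checked" distinguishable from "checked and passed".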
