I recently ran this script on a sample with 19k events and 30 systematics and had to request 128 GB of memory, which limits the number of simultaneous jobs allowed by RCC. Most of the memory seems to be wasted on storing the individual sub-matrix for each of the 30 systematics:
30 x 19k^2 x 8 bytes = 86.6 GB.
To reduce memory consumption, a refactor is needed that keeps a running sum of COVSYS_TOT and releases the memory for each systematic once it has been added.
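A minimal sketch of the running-sum idea, assuming NumPy and assuming each systematic's contribution can be produced one at a time. The function names `accumulate_covsys` and `toy_contributions` are hypothetical and are not taken from create_cov; the toy outer-product matrices only stand in for the real per-systematic sub-matrices.

```python
import numpy as np


def accumulate_covsys(n_events, contributions):
    """Sum per-systematic covariance matrices without storing them all.

    `contributions` is any iterable (ideally a generator) that yields one
    (n_events x n_events) matrix at a time.
    """
    covsys_tot = np.zeros((n_events, n_events), dtype=np.float64)
    for cov_k in contributions:
        covsys_tot += cov_k  # in-place add into the running total
        del cov_k            # drop our reference so the matrix can be freed
    return covsys_tot


def toy_contributions(n_events, n_syst, seed=0):
    """Hypothetical stand-in: yield one outer-product matrix per systematic."""
    rng = np.random.default_rng(seed)
    for _ in range(n_syst):
        delta = rng.normal(size=n_events)
        yield np.outer(delta, delta)


covsys_tot = accumulate_covsys(200, toy_contributions(200, 30))
```

With this pattern the peak footprint is roughly two full matrices (the running total plus the current contribution) rather than one matrix per systematic.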
As a side note, recall that create_cov also inverts COVTOT to save time at the cosmology-fitting stage. Inverting this 19k^2 matrix took about 5 hr, consistent with the O(n^3) operations needed for inversion. If we want to invert a 100k^2 matrix someday, the size ratio of about 5 implies a CPU-time increase of 5^3 = 125, i.e. 5 x 125 = 625 hr. Even on a newer CPU running 10x faster, the CPU time would be a few days, and I am not aware of a cluster that allows a few-day run time in the slurm queue.
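The scaling estimate above can be reproduced with a short helper. This is only a sketch of the O(n^3) extrapolation; the function name `scaled_inversion_hours` is hypothetical. Using the unrounded size ratio 100k/19k (about 5.26) instead of 5 gives a somewhat larger number, roughly 730 hr rather than 625 hr, which does not change the conclusion.

```python
def scaled_inversion_hours(t_ref_hr, n_ref, n_new, speedup=1.0):
    """Extrapolate a measured run time assuming O(n^3) cost.

    t_ref_hr : measured time (hours) for an n_ref x n_ref matrix
    speedup  : assumed factor by which the newer CPU is faster
    """
    return t_ref_hr * (n_new / n_ref) ** 3 / speedup


# Numbers quoted in the issue: 5 hr for 19k events, target of 100k events.
same_cpu_hr = scaled_inversion_hours(5.0, 19_000, 100_000)
fast_cpu_hr = scaled_inversion_hours(5.0, 19_000, 100_000, speedup=10.0)
print(f"same CPU: {same_cpu_hr:.0f} hr, 10x faster CPU: {fast_cpu_hr / 24:.1f} days")
```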
Refactored to save memory by computing and deleting cov matrix contributions on the fly; same for the cov inverse. See the internal global FLAG_REDUCE_MEMORY. Also noticed that the "positive definite" test takes much longer than the matrix inversion, so this test now runs only for size < 2000. The comments above about long matrix inversion time really apply to the pos-def and "condition" tests.
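A sketch of gating the positive-definite check on matrix size, assuming NumPy. The names `MAX_SIZE_POSDEF_TEST`, `is_positive_definite` and `check_posdef_if_small` are hypothetical, and the Cholesky-based check is a swapped-in technique for illustration: the test actually used in create_cov is not shown here and may differ (for example, it may inspect eigenvalues).

```python
import numpy as np

# Threshold quoted in the comment above; the constant name is hypothetical.
MAX_SIZE_POSDEF_TEST = 2000


def is_positive_definite(cov):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    try:
        np.linalg.cholesky(cov)
        return True
    except np.linalg.LinAlgError:
        return False


def check_posdef_if_small(cov, max_size=MAX_SIZE_POSDEF_TEST):
    """Run the pos-def test only for size < max_size; return None if skipped."""
    if cov.shape[0] >= max_size:
        return None
    return is_positive_definite(cov)
```

Returning `None` for a skipped test keeps "not checked" distinct from "checked and failed", so callers can log the two cases differently.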