Event source cannot handle a 200+ file run. #16
I recently rewrote the […]. The only downside is that you can no longer know […].
Thank you @maxnoe. You are just suggesting we copy this to ctapipe_io_nectarcam, is that right?
If that would solve your issue, sure, go ahead; you probably only need to adapt the regexes / patterns for the filenames. A more general version of this could live in the common event source, but at least in the next one or two weeks I probably don't have time to start that.
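For reference, a minimal sketch of the kind of filename-pattern adaptation described above. The regex and the `NectarCAM.Run1234.0000.fits.fz` naming scheme are assumptions for illustration, not the actual patterns used by the existing code:

```python
import re
from pathlib import Path

# Hypothetical NectarCAM filename scheme: run number, then a zero-padded
# file/stream index. The real naming convention may differ.
FILENAME_RE = re.compile(r"NectarCAM\.Run(?P<run>\d+)\.(?P<subrun>\d+)\.fits\.fz")

def get_run_files(directory, run):
    """Collect all files belonging to one run, sorted by their subrun index."""
    matches = []
    for path in Path(directory).iterdir():
        m = FILENAME_RE.match(path.name)
        if m and int(m.group("run")) == run:
            matches.append((int(m.group("subrun")), path))
    return [path for _, path in sorted(matches)]
```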
Question: did this get implemented? On the CC-IN2P3 Jupyter platform, which is limited to 2 GB of memory unless you request more, the memory explodes if I pass a full wildcard regex. So I'm doing my own glob, looping over the files 4 at a time and passing them with input_filelist. [Subsidiary question: how do we know how many files to open at a time? It was 2 files with the previous EVB, I think, and it's now 4, I see, with EVBv6.] If this can instead be fixed by using Max's code, I could implement it... unless Luigi prefers to.
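A minimal sketch of that chunked-glob workaround, following the comment above. The path and naming scheme are hypothetical, and the `NectarCAMEventSource(input_filelist=...)` call in the comment reflects the usage described in this thread:

```python
from glob import glob

# EVBv6 spreads a run over 4 files at a time (2 before EVBv6, per this thread)
N_STREAMS = 4

# Hypothetical path and naming scheme; adapt to the real run layout
files = sorted(glob("/data/NectarCAM.Run1234.*.fits.fz"))

# Build chunks of N_STREAMS files so only a few are open at once
chunks = [files[i:i + N_STREAMS] for i in range(0, len(files), N_STREAMS)]
for chunk in chunks:
    print(chunk)  # replace with e.g. NectarCAMEventSource(input_filelist=chunk)
```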
Hi @mdpunch, I will not be able to work on this in the coming weeks. Please go ahead and implement it if you want.
2 GB is quite low. You won't do much with it... :-)
Hi Vincent, I can ask the CC for more, I guess. I have lots more on my PC, but not enough disk space for the other set of runs I'm analyzing, the "Throttler" runs, where there are a few tens of files. Even with my trick (and probably also with Max's code) I still hit the 2 GB limit after 20 files or so (even when deleting objects along the way), so there is maybe some kind of memory leak (though I thought Python was better about that). So I'll do both things: look into implementing Max's code in NectarCAM, and ask the CC for more memory. BTW, do you know where we can find how many files the data stream is spread over at a time for a given EVB?
I don't think this is possible. To me the number of data streams is "arbitrary". Nevertheless, as far as I know we had 2 for data before EVBv6 and 4 after (@sizun: is that correct?), so you can use that. Have you tried reading the data one file at a time and calling the garbage collector at the end of each file?
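A sketch of that suggestion, assuming a generic per-file workflow; `read_file` and `analyze` are placeholders passed in as parameters, not real ctapipe_io_nectarcam APIs:

```python
import gc

def process_run(file_list, read_file, analyze):
    """Process one file at a time, forcing a garbage collection between files.

    read_file and analyze are whatever per-file reader and analysis step
    apply; they are parameters here only to keep the sketch generic.
    """
    for path in file_list:
        events = read_file(path)
        analyze(events)
        del events    # drop the only reference...
        gc.collect()  # ...and reclaim cycles before opening the next file
```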
You can certainly fix that; we don't really need it at the moment, so I didn't bother. It would increase the startup time by quite a bit, but you can build a list of files that match the given options in […]. You could also do that lazily, only if someone actually asks for the list.
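A sketch of that lazy variant, using a cached property so the (slow) directory scan only happens the first time someone asks for the file list. The class and attribute names are illustrative, not the actual event-source API:

```python
from functools import cached_property
from pathlib import Path

class MultiFileSource:
    """Illustrative event-source skeleton, not the real class."""

    def __init__(self, directory, pattern):
        self.directory = Path(directory)
        self.pattern = pattern  # glob pattern matching this run's files

    @cached_property
    def input_files(self):
        # The directory scan only runs on first access, so constructing
        # the source stays fast and the cost is paid only when needed.
        return sorted(self.directory.glob(self.pattern))
```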
Dirk tells me that the number of streams is indeed arbitrary, and he won't bother to add an element to the header saying which number to use, because the whole thing will be deprecated when the ADH replaces it (which I guess is "soon" but on CTAO time-scales).
I'll look into that.
Indeed, I could read them, get the times, and then do a big ol' sort. But I was trying to do things properly (though, as Voltaire said, "Il meglio è l'inimico del bene", i.e. the best is the enemy of the good). Anyway, I now have 16 GB of memory on the CC Jupyter hub, so it's no longer a problem, but I will still look at making it a little more automatic.
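For the "read, get the times, sort" approach, a streaming merge avoids holding everything in memory at once, since events within each file are already time-ordered. A minimal sketch with `heapq.merge`; the `event.timestamp` attribute is an assumption for illustration:

```python
import heapq

def merge_event_streams(streams, key=lambda event: event.timestamp):
    """Merge several time-sorted per-file event iterators into one stream.

    Each input iterator must already yield events in time order; heapq.merge
    then interleaves them in global time order while keeping only one
    pending event per stream in memory, instead of sorting everything.
    """
    yield from heapq.merge(*streams, key=key)
```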
Crash caused by memory consumption.