Event source cannot handle a 200+ file run. #16
I recently rewrote the […]. The only downside is that you can no longer know […].
Thank you @maxnoe. You are just suggesting we copy this to ctapipe_io_nectarcam, is that right?
If that would solve your issue, sure, go ahead; you probably only need to adapt the regexes / patterns for the filenames. A more general version of this could live in the common event source, but at least in the next one or two weeks I probably don't have time to start that.
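For reference, a minimal sketch of the kind of filename-pattern adaptation described above. The regex and the `NectarCAM.Run1234.0000.fits.fz` naming scheme are assumptions for illustration, not the actual patterns used by the existing code:

```python
import re
from pathlib import Path

# Hypothetical NectarCAM filename scheme: run number, then a zero-padded
# file/stream index. The real naming convention may differ.
FILENAME_RE = re.compile(r"NectarCAM\.Run(?P<run>\d+)\.(?P<subrun>\d+)\.fits\.fz")

def get_run_files(directory, run):
    """Collect all files belonging to one run, sorted by their subrun index."""
    matches = []
    for path in Path(directory).iterdir():
        m = FILENAME_RE.match(path.name)
        if m and int(m.group("run")) == run:
            matches.append((int(m.group("subrun")), path))
    return [path for _, path in sorted(matches)]
```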
Question: did this get implemented? On the CC-IN2P3 Jupyter platform, which is limited to 2 GB of memory unless you request more, the memory explodes if I pass a full wildcard regex. So I'm doing my own glob, looping over the files 4 at a time and passing them with input_filelist. [Subsidiary question: how do we know how many files to open at a time? It was 2 files with the previous EVB, I think, and it's now 4, I see, with EVBv6.] If this can instead be fixed by using Max's code, I could implement it... unless Luigi prefers to.
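A minimal sketch of that chunked-glob workaround, following the comment above. The path and naming scheme are hypothetical, and the `NectarCAMEventSource(input_filelist=...)` call in the comment reflects the usage described in this thread:

```python
from glob import glob

# EVBv6 spreads a run over 4 files at a time (2 before EVBv6, per this thread)
N_STREAMS = 4

# Hypothetical path and naming scheme; adapt to the real run layout
files = sorted(glob("/data/NectarCAM.Run1234.*.fits.fz"))

# Build chunks of N_STREAMS files so only a few are open at once
chunks = [files[i:i + N_STREAMS] for i in range(0, len(files), N_STREAMS)]
for chunk in chunks:
    print(chunk)  # replace with e.g. NectarCAMEventSource(input_filelist=chunk)
```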
Hi @mdpunch, I will not be able to work on this in the coming weeks. Please go ahead and implement it if you want.
2 GB is quite low. You won't do much with it... :-)
Hi Vincent, I can ask the CC for more, I guess. I have lots more on my PC, but not enough disk space for the other set of runs I'm analyzing, the "Throttler" runs, where there are a few tens of files. Even with my trick (and probably also with Max's code) I still hit the 2 GB limit after 20 files or so (even when deleting objects along the way), so there is maybe some kind of memory leak (though I thought Python was better about that). So I'll do both things: look into implementing Max's code in NectarCAM, and ask the CC for more memory. BTW, do you know where we can find how many files the data stream is spread over at a time for a given EVB?
I don't think this is possible. To me the number of data streams is "arbitrary". Nevertheless, as far as I know we had 2 for data before EVBv6 and 4 after (@sizun: is that correct?), so you can use that. Have you tried reading the data one file at a time and calling the garbage collector at the end of each file?
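A sketch of that suggestion, assuming a generic per-file workflow; `read_file` and `analyze` are placeholders passed in as parameters, not real ctapipe_io_nectarcam APIs:

```python
import gc

def process_run(file_list, read_file, analyze):
    """Process one file at a time, forcing a garbage collection between files.

    read_file and analyze are whatever per-file reader and analysis step
    apply; they are parameters here only to keep the sketch generic.
    """
    for path in file_list:
        events = read_file(path)
        analyze(events)
        del events    # drop the only reference...
        gc.collect()  # ...and reclaim cycles before opening the next file
```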
You can certainly fix that; we don't really need it at the moment, so I didn't bother. It would increase the startup time by quite a bit, but you can build a list of files that match the given options in […]. You could also do that lazily, only if someone actually asks for the list.
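A sketch of that lazy variant, using a cached property so the (slow) directory scan only happens the first time someone asks for the file list. The class and attribute names are illustrative, not the actual event-source API:

```python
from functools import cached_property
from pathlib import Path

class MultiFileSource:
    """Illustrative event-source skeleton, not the real class."""

    def __init__(self, directory, pattern):
        self.directory = Path(directory)
        self.pattern = pattern  # glob pattern matching this run's files

    @cached_property
    def input_files(self):
        # The directory scan only runs on first access, so constructing
        # the source stays fast and the cost is paid only when needed.
        return sorted(self.directory.glob(self.pattern))
```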
Dirk tells me that the number of streams is indeed arbitrary, and he won't bother to add an element to the header saying which number to use, because the whole thing will be deprecated when the ADH replaces it (which I guess is "soon" but on CTAO time-scales).
I'll look into that.
Indeed, I could read them, get the times, and then do a big ol' sort. But I was trying to do things properly (though, as Voltaire said, "Il meglio è l'inimico del bene", i.e. the best is the enemy of the good). Anyway, I now have 16 GB of memory on the CC Jupyter hub, so it's no longer a problem, but I will still look at making it a little more automatic.
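For the "read, get the times, sort" approach, a streaming merge avoids holding everything in memory at once, since events within each file are already time-ordered. A minimal sketch with `heapq.merge`; the `event.timestamp` attribute is an assumption for illustration:

```python
import heapq

def merge_event_streams(streams, key=lambda event: event.timestamp):
    """Merge several time-sorted per-file event iterators into one stream.

    Each input iterator must already yield events in time order; heapq.merge
    then interleaves them in global time order while keeping only one
    pending event per stream in memory, instead of sorting everything.
    """
    yield from heapq.merge(*streams, key=key)
```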
Crash caused by memory consumption.