InvalidStateError while estimating sparsity #3486
Comments
I don't think any of us are using 3.9 at this point, but this is good to know. Our test suite runs on 3.9 and doesn't have a problem, so I'm not sure. I think we need to ping @alejoe91 and @samuelgarcia to take a look at this. What n_jobs are you using? Could you try with n_jobs = 1?
Hello! Thanks for the response! I am using n_jobs = -1, which should be 10 cores on the cluster I'm using. I'll try with n_jobs = 1 and get back to you!
Thanks, let us know how it goes with n_jobs = 1.
Hello! Setting n_jobs = 1 indeed let me get through the sparsity estimation without error! Naturally this takes much longer, though. Since you have implied I might be able to solve my problem by updating from Python 3.9, I'll maybe give that a shot next. It's been a while since I chose my version, but I think having 3.9 isn't critical at this stage of the pipeline. Thanks for your help! I'll let you know if changing versions doesn't solve it for me; let me know if you need any more information from me. Jeffrey Boucher
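For anyone hitting the same error, a minimal sketch of the single-process workaround (assuming spikeinterface's set_global_job_kwargs is available in your version; rec, sorting and outDir are the objects from the original post):

```python
import spikeinterface.full as si

# Assumption: the global job kwargs control the parallelism used during
# sparsity estimation; n_jobs=1 avoids the ProcessPoolExecutor entirely.
si.set_global_job_kwargs(n_jobs=1)

analyzer = si.create_sorting_analyzer(
    recording=rec,
    sorting=sorting,
    folder=outDir / "sortings_folder",
    format="binary_folder",
    sparse=True,
)
```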
Yeah, it would be great if you could test Python 3.10 or 3.11. There have been some improvements in multiprocessing at the Python level. If updating Python works, it tells us that 3.9 might not be as well supported as we thought for our multiprocessing. If 3.10/3.11/3.12 doesn't work, then it might be a problem in our multiprocessing itself.
Hello! Unfortunately, I still got the same error with Python 3.11!

estimate_sparsity: 70%|███████ | 7983/11378 [3:11:25<36:45, 1.54it/s]
Traceback (most recent call last):

Any advice on what to try next? Anything you might also want to look at? Thanks! Jeff Boucher
Thanks for that info! A few more background questions then:

- What OS are you using (looks like Linux maybe; which flavor)?
- Is this on a server or locally? If on a server, what is the local OS you are using to communicate with the server?
- Could you do a conda list or pip list of version numbers for the packages in the environment?
- Could you give us the stats on your recording object? If you just type recording into your terminal, the repr should tell us file size, dtype, and number of samples.
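For reference, a small sketch of pulling those stats programmatically (method names are from the public RecordingExtractor API; availability may vary slightly by version):

```python
# Assumes `rec` is the RecordingExtractor from the original post.
print(rec)  # the repr summarizes channels, samples, duration, dtype and size

print("channels:", rec.get_num_channels())
print("sampling rate (Hz):", rec.get_sampling_frequency())
print("dtype:", rec.get_dtype())
print("total duration (s):", rec.get_total_duration())
```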
Hello! I am indeed using Linux. Here is the (partial) output of cat /etc/os-release:

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"

This is on a server; it's a cluster organized by the university I work at (Myriad at UCL). Because of this, when I run the spike sorter I am interfacing with a job submission scheduler. My local OS is Linux as well, Ubuntu 22.04. Here is the output of conda list:

[conda list output not captured]

I'll get you the recording stats momentarily...
The recording I am currently working with outputs:

[recording repr not captured]

It's a set of concatenated recordings taken over a period of about a week and a half. Thanks for your help! Jeff Boucher
Could you try running just one of the recordings and see if that works? @h-mayorquin do you remember this too? That giant concatenations were causing problems with multiprocessing? The issue is that the best way for us to fix this is to have the data to reproduce it with, but sharing ~250 GB is a non-trivial thing :) Maybe @samuelgarcia or @alejoe91 also have opinions about why multiprocessing is failing with concatenation (and they both use Linux!).
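A sketch of what that single-session test could look like, assuming the sessions are loaded individually and then joined with concatenate_recordings (the loader and folder names below are hypothetical placeholders for the user's actual pipeline):

```python
import spikeinterface.full as si

# Hypothetical: one extractor per session folder, as in the user's pipeline.
session_folders = ["session_01", "session_02"]  # placeholder paths
recordings = [si.read_spikeglx(folder) for folder in session_folders]

# Full concatenation (the case that currently fails during sparsity estimation).
rec_concat = si.concatenate_recordings(recordings)

# Single-session test: run the same analyzer step on just the first recording
# to see whether the failure is specific to the giant concatenation.
rec_single = recordings[0]
```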
Hi.
Hello! I'll run a single-session dataset overnight tonight. We are not using Slurm; the cluster seems to be using SGE 8.1.9, which stands for Sun Grid Engine. I don't know if there would be a similar problem with this; I'll try the single-session dataset first.
Hello! In fact I ran into a bug which I think is on my end, so I'm going to de-prioritize this for a bit. Since I was able to get it working by turning off parallel processing, I want to get that started on my real dataset, but afterward I'll get back to this (within a week). Thanks for your help! Jeff Boucher
Maybe SGE is killing your job because it is using too much RAM. Could you increase the memory when submitting the job?
Hello! Parallel processing worked fine for a single session; for that and other reasons, I think the suggestion to request more RAM for my jobs is a good one. I'll try it! Thanks, Jeff
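If requesting more RAM is not an option, a sketch of reducing the memory footprint on the spikeinterface side instead (assuming chunk_duration and n_jobs are accepted job kwargs in the installed version; the values are illustrative, not tuned):

```python
import spikeinterface.full as si

# Smaller chunks and fewer workers mean fewer samples held in memory at once;
# the trade-off is a longer runtime.
si.set_global_job_kwargs(n_jobs=4, chunk_duration="500ms")
```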
Hello all!
I've been trying to run Kilosort 3 on a concatenated Neuropixels 2 dataset. Lately I've been running into an issue with create_sorting_analyzer while it is estimating sparsity. Basically, the code is able to run about 70-80% of the way through (not any specific number), then I get an InvalidStateError exception. I guess this means that some aspect of my data doesn't work well with the sparsity-estimating algorithm, but I have no guess what that would be.
Here is an example of the exception:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/sjjgjbo/.conda/envs/neurovis_try2/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/sjjgjbo/.conda/envs/neurovis_try2/lib/python3.9/concurrent/futures/process.py", line 323, in run
self.terminate_broken(cause)
File "/home/sjjgjbo/.conda/envs/neurovis_try2/lib/python3.9/concurrent/futures/process.py", line 458, in terminate_broken
work_item.future.set_exception(bpe)
File "/home/sjjgjbo/.conda/envs/neurovis_try2/lib/python3.9/concurrent/futures/_base.py", line 549, in set_exception
raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x2aaf7896ea00 state=cancelled>
Additionally, here are the inputs into create_sorting_analyzer:
we = si.create_sorting_analyzer(
    recording=rec,
    sorting=sorting,
    folder=outDir / 'sortings_folder',
    format="binary_folder",
    sparse=True,
)
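To help isolate the failing step, a sketch of running the sparsity estimation on its own before building the analyzer (assuming si.estimate_sparsity is exposed in the installed version and that a precomputed sparsity can be passed to create_sorting_analyzer; parameter names may differ slightly across releases):

```python
# Run the parallel sparsity step by itself, so any InvalidStateError can be
# tied to this step rather than to the analyzer build as a whole.
sparsity = si.estimate_sparsity(
    sorting=sorting,
    recording=rec,
    method="radius",   # assumed default-style parameters; adjust as needed
    radius_um=100.0,
    n_jobs=1,          # single process avoids the ProcessPoolExecutor path
)

we = si.create_sorting_analyzer(
    recording=rec,
    sorting=sorting,
    folder=outDir / 'sortings_folder',
    format="binary_folder",
    sparsity=sparsity,  # reuse the precomputed sparsity
)
```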
Please help me if you can; I would be very grateful, as it's been confounding. Let me know if I can offer any additional information that would help!
Thanks,
Jeff Boucher