What is the time complexity of SCENIC+? #321
-
Hello everyone, I am currently running the SCENIC+ data processing pipeline on a small dataset of 3k cells, and it took several hours. I am curious about the time complexity of SCENIC+ and whether it scales linearly with the number of cells. How long would it take to run on a dataset with 5M cells? Thank you for your help!
-
Hi @WhenMelancholy
The version of SCENIC+ in the development branch is a lot faster; please read #202 (comment) for more information.
I'm not sure how long it would take on a dataset of 5M cells, as we have not tested this.
We did include some timing analysis in our publication: https://www.nature.com/articles/s41592-023-01938-4 (see Extended Data Fig. 3).
I hope this helps,
All the best
Seppe
-
Hi @WhenMelancholy @SeppeDeWinter, I also have a very large dataset. In testing the whole pipeline, I found that it's not just the Run_SCENIC+ step that consumes time: the pycisTopic processing of the ATAC data is also slow during topic modeling. For roughly 100k cells, it takes six or seven hours to fit a model for a single number of topics (Mallet modeling and loading each take about half of that time, which I think is slow), and several different numbers of topics have to be tried. I think 5M cells would also be a challenge for the pycisTopic step of SCENIC+, so for now I'm just going to downsample to speed up the computation. It looks like the Snakemake pipeline changes how the cisTopic results are used; will the new pipeline also speed up the topic calculations?
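Since nobody in this thread has timings at 5M cells, one rough way to probe whether a step scales linearly is to time it on a few random subsamples and fit a power law to the results. The sketch below is only an illustration under stated assumptions: the helper names (`subsample_indices`, `fit_scaling_exponent`, `extrapolate_seconds`) are hypothetical and not part of SCENIC+ or pycisTopic, and the timings must come from your own runs. It assumes runtime follows roughly `seconds ≈ c * n_cells**k`; an exponent `k` near 1 would suggest linear scaling.

```python
import numpy as np


def subsample_indices(n_cells, n_keep, seed=0):
    """Pick n_keep distinct cell indices uniformly at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_cells, size=n_keep, replace=False))


def fit_scaling_exponent(n_cells, seconds):
    """Fit seconds ~ c * n_cells**k by least squares in log-log space.

    n_cells and seconds are measurements you collected yourself,
    one pair per subsample size. Returns (k, c).
    """
    log_n = np.log(np.asarray(n_cells, dtype=float))
    log_t = np.log(np.asarray(seconds, dtype=float))
    k, log_c = np.polyfit(log_n, log_t, deg=1)
    return float(k), float(np.exp(log_c))


def extrapolate_seconds(n_target, k, c):
    """Predict runtime at n_target cells from the fitted power law."""
    return c * float(n_target) ** k


if __name__ == "__main__":
    # Synthetic timings generated from an exact linear law, NOT real measurements.
    sizes = np.array([1_000, 2_000, 4_000, 8_000])
    timings = 0.5 * sizes
    k, c = fit_scaling_exponent(sizes, timings)
    print(f"fitted exponent k = {k:.2f}")
    print(f"predicted seconds at 100k cells = {extrapolate_seconds(100_000, k, c):.0f}")
```

An extrapolation from a few thousand cells to millions should be treated with caution: memory limits, I/O, and steps that depend on the number of regions or genes rather than cells can all change the scaling at larger sizes.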