Hello,
Thanks for developing this great software. I have a question about memory requirements. I have unpaired snATAC-seq and snRNA-seq data. Peaks from the snATAC-seq data were called with ArchR. I successfully generated a cisTopic object as well as a SCENIC+ object with meta-cells from the two modalities. I ended up with a SCENIC+ object with 2000 cells x 35K genes and 2000 cells x 500K peaks, and a 32-topic model. Since Ray parallel computation does not seem to work well on our cluster server, I used only 1 CPU for the run. But I got stuck at the GSEA step:
"2023-05-08 12:19:37,017 GSEA INFO Thresholding region to gene relationships
2023-05-08 20:27:34,733 GSEA INFO Subsetting TF2G adjacencies for TF with motif.
2023-05-08 20:29:01,421 GSEA INFO Running GSEA..."
The job cannot finish and seems to require hundreds of GB of memory, which brought the server node down.
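For scale, here is a rough back-of-the-envelope estimate of the size of the two matrices alone, assuming they are held dense with 8-byte floats. The helper name `dense_matrix_gib` is only for illustration and is not part of SCENIC+, and I do not know how SCENIC+ actually stores these matrices internally, so this is a reference point rather than a model of what the GSEA step allocates.

```python
def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Return the size in GiB of a dense n_rows x n_cols matrix.

    Assumption: every entry is stored explicitly (no sparsity) using
    bytes_per_value bytes, e.g. 8 for float64 or 4 for float32.
    """
    return n_rows * n_cols * bytes_per_value / 1024**3


# Dimensions taken from the object described above.
genes_gib = dense_matrix_gib(2_000, 35_000)
peaks_gib = dense_matrix_gib(2_000, 500_000)

print(f"genes: {genes_gib:.2f} GiB, peaks: {peaks_gib:.2f} GiB")
```

By this estimate the input matrices themselves are only a few GiB, so I assume the hundreds of GB come from intermediate results rather than from the matrices.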
I saw in a previous discussion that you suggest filtering genes to reduce their number. Should I also reduce the number of peaks? If so, what is the best practice for doing that?
Also, do you know whether installing the development branch of SCENIC+ with joblib and increasing the number of CPUs would make it possible to process a SCENIC+ object with 2000 cells x 25K genes and 2000 cells x 650K peaks? Would the memory requirement be spread across multiple CPUs, so that each CPU carries a relatively small memory load?
Thank you very much!