Why psuedobulk then QC #116
Replies: 4 comments 7 replies
-
Please find my answers to your questions below. Q1: We do the pseudobulking before QC because we make use of the consensus peaks generated after the pseudobulking step. For example to be able to calculate the fraction of reads in peaks (FRIP). Q2: That's correct! Q3: I'm not too familiar with that specific function. But from a quick glance it looks fairly similar. I hope this helps. All the best, Seppe |
Beta Was this translation helpful? Give feedback.
-
Hi, Many thanks for your reply. I am actually a bit confused when integrating results (anndata) from snapATAC2 to pycisTopic. I was gonna to use pycisTopic and then to do the downstream analysis with SCENICplus, but with Rbased cisTopic, the input files are:
with python based cisTopic, I first need to have pseudobulk profiles from cell annotations. However, for now I only have the snapATAC-seq fragments.tsv.gz files without the cell annotations profiles. I can not get my hands on snATAC analysis with cistopic from what I understood. So I preprocessed the data on snapATAC2, and intended to perform the downstream analysis with SCENICplus. Now I have the QC-processed snATAC-seq anndata, with the concensus peaks, and the leiden clustering, the bigwig files and coverage files for each cluster can also be exported. However, I don't know how to integrate with the SCENICplus or pyCistopic. If I integrate with the pyCistopic, the annotation to generate the pseudobulk profiles will come from the leiden clustering, and I don't see the reason to perform LDA. As from what I understood, cisTopic is to use LDA probabilistic model to conduct the dimension reduction(topics) and snapATAC2 is to use Laplacian eigenmap to perform dimension reduciton. I don't see the rationale to first DR with the snapATAC2 and then perform LDA with cisTopic. AnnDataSet object with n_obs x n_vars = 39252 x 6062095 backed at '/lustre1/project/stg_00079/students/tingting/data/sun/snap2_allfragments/02/microglia_1.h5ads'
contains 293 AnnData objects with keys:
obs: 'sample', 'region', 'subject', 'ad', 'leiden_0.5', 'leiden_mnc_0.5', 'leiden_harmony_0.5'
var: 'count', 'selected'
uns: 'AnnDataSet', 'num_eigen', 'macs3', 'reference_sequences', 'spectral_eigenvalue'
obsm: 'X_umap', 'X_spectral', 'X_spectral_harmony_sample', 'X_spectral_mnc_sample', 'X_umap_harmony_sample', 'X_umap_mnc_sample'
obsp: 'distances' Now my main questions are
Thanks |
Beta Was this translation helpful? Give feedback.
-
Thanks very much for your reply. Quick try with your solution: I tried the function dat=anndata.read_h5ad("microglia.h5ad")
dat
AnnData object with n_obs × n_vars = 39252 × 6062095
obs: 'sample', 'tsse'
var: 'count', 'selected'
uns: 'AnnDataSet', 'num_eigen', 'reference_sequences', 'spectral_eigenvalue'
obsm: 'X_spectral', 'X_spectral_harmony', 'X_spectral_mnc', 'X_umap', 'X_umap_harmony', 'X_umap_mnc'
dat.X
<39252x6062095 sparse matrix of type '<class 'numpy.uint32'>'
with 349387777 stored elements in Compressed Sparse Row format> error: cistopic_obj = create_cistopic_object(
fragment_matrix = dat.X,
cell_names = dat.obs_names,
region_names = dat.var_names
)
error: 39252 columns passed, passed data had 6062095 columns
cistopic_obj = create_cistopic_object(
fragment_matrix = dat.X.T,
cell_names = dat.obs_names,
region_names = dat.var_names
)
error: ValueError: setting an array element with a sequence.
cistopic_obj = create_cistopic_object(
fragment_matrix = dat.X[ :, dat.var['selected']].T,
cell_names = dat.obs_names.tolist(),
region_names = dat.var['selected'][dat.var['selected'] == True].index.tolist()
)
ValueError: setting an array element with a sequence. Any idea on the reason? Thanks |
Beta Was this translation helpful? Give feedback.
-
I'm gonna have separate snRNA-seq data but not multiomics for these microglia cells, it is gonna be challenging I suppose, but I'm have a try afterwards. If that doesn't work, I also have a small set of cells coming from multiomics, so that I can analyze with SCENICplus. The thing is for now I only have the fragments.tsv.gz files for snATAC-seq data. I analyzed the data with the snapATAC2, but since the data are the same cell type microglia cells from participants with different disease status, I am afraid that the subtle change was not captured in snapATAC2 analysis. So I want to analyze these with cisTopic which can also help me with the later SCENICplus analysis. Thanks |
Beta Was this translation helpful? Give feedback.
-
Dear,
1.Q
Could I please ask why the pseudobulk ATAC-seq profiles per cell type are first generated then conduct the QC? it is kinda more understandable to me that QC is first conducted on the sample level then pseudobulk ATAC-seq.
2.Q
the cell annotation can also be obtained from alternative methods, such as a preliminary clustering analysis using a predefined set of genome-wide regions/peaks (e.g. SCREEN) as input to identify cell populations.
--from introduction of the tutorialDoes it mean that the cell annotation can be the ATAC seq clusters defined with clustering methods.
I have the snATAC-seq QC processed with snapATAC2 and post-processed with Leiden clustering. If I am correct, I can convert processed annData to loom -> pseudobulk peak calling -> regions&bigwig -> fragments matrix -> LDA.....
3.Q
To derive a set of consensus peaks, we use the iterative overlap peak merging procedure describe in Corces et al. (2018). First, each summit is extended a peak_half_width (by default, 250bp) in each direction and then we iteratively filter out less significant peaks that overlap with a more significant one.
If I understand it correctly, is this procedure similar to the one in snapATAC2 function: merge peaks?
Looking forward to your reply!
Thanks!
tingting
Beta Was this translation helpful? Give feedback.
All reactions