Tutorial for scATAC, when you also have scRNA-seq data but not in multiomics #87

MariaRosariaNucera · 2023-07-21T01:29:55Z

Hi, thank you for the wonderful tool.
I am using it with scATAC-seq and scRNA-seq data that were not generated from the same cells. I have pre-processed the different samples with Cell Ranger ATAC and Cell Ranger, respectively.

I am not sure how to modify the first part of the tutorial to accommodate the lack of cell metadata annotation.
I report what is explained in the documentation:

In case of independent scATAC-seq data, the cell annotation can also be obtained from alternative methods, such as unnanotated/preliminary clustering analysis (using predefined regions, for example SCREEN for mouse and human). In the later case, you can skip this section and use bulk regions as input to the QC step.

However, it is not totally clear to me how to proceed. Would it be possible to explain it further in the tutorial?

Do I just substitute the "consensus_regions.bed" in this part:

path_to_regions = {'10x_no_perm': outDir + 'consensus_peak_calling/consensus_regions.bed'}

with the "peaks.bed" file obtained as output of Cell Ranger ATAC?

Would it make sense to analyse the scATAC data with some other standard pipeline (e.g. the Signac one), integrate it with the scRNA-seq data, get the barcode-to-cell-type metadata, then run the pycisTopic pipeline on top of that as in the standard multiome tutorial, and finally transfer the labels again from scRNA-seq?
I am not sure whether building pseudobulks from cell types obtained in this way would make sense, though, as it seems to do the analysis twice.

I am happy to just use the peaks.bed as consensus_regions.bed instead. I am just not sure whether it would be better to integrate the different samples somehow, but maybe this can be done afterwards.

Thank you very much in advance.

Maria

SeppeDeWinter · 2023-08-18T07:07:49Z

Maintainer

Hi @MariaRosariaNucera

Sorry for the late reply.
I moved your question to the discussion section.

We usually run pycisTopic twice in this case.

The first time we use a predefined set of regions (usually the SCREEN regions from ENCODE, https://screen.encodeproject.org/, but you can also use the regions obtained from Cell Ranger). Using this run you can cluster your scATAC-seq cells. Based on these clusters we call consensus peaks.

To increase the resolution for rare cell populations (which may not be well represented by the SCREEN regions or the regions from Cell Ranger), we rerun pycisTopic, now using the consensus peaks called in the previous step.

You can also preprocess the data using another tool, like you suggested, and use this to call consensus peaks.
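As a rough illustration of the "based on these clusters we call consensus peaks" step, the sketch below groups fragments into one pseudobulk per cluster, which is the input a peak caller would then work on. It is plain Python, not pycisTopic code: the function name `split_fragments_by_cluster`, the toy fragments, and the cluster labels are all hypothetical, and the only assumption about the data is the usual fragments column order (chrom, start, end, barcode, count).

```python
from collections import defaultdict

# Toy fragments in the column order chrom, start, end, barcode, count (made-up values).
fragments = [
    ("chr1", 100, 300, "AAAC-1", 1),
    ("chr1", 150, 380, "TTTG-1", 2),
    ("chr1", 900, 1100, "AAAC-1", 1),
    ("chr2", 40, 250, "GGGA-1", 1),
    ("chr2", 60, 270, "CCCT-1", 1),
]

# Hypothetical cluster labels from the first pass; CCCT-1 has no label
# (for example because it did not pass QC) and is therefore dropped.
cluster_of = {"AAAC-1": "cluster_0", "TTTG-1": "cluster_0", "GGGA-1": "cluster_1"}


def split_fragments_by_cluster(fragments, cluster_of):
    """Return {cluster: sorted list of fragments from cells in that cluster}."""
    pseudobulks = defaultdict(list)
    for fragment in fragments:
        cluster = cluster_of.get(fragment[3])  # fragment[3] is the barcode
        if cluster is not None:
            pseudobulks[cluster].append(fragment)
    # Sort each pseudobulk by chromosome and start position.
    return {cluster: sorted(frags) for cluster, frags in pseudobulks.items()}


pseudobulks = split_fragments_by_cluster(fragments, cluster_of)
for cluster, frags in pseudobulks.items():
    print(cluster, len(frags), "fragments")
```

In the second pass, the cluster labels take the place of the cell-type annotation used in the standard tutorial.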

Best,

Seppe

2 replies

MariaRosariaNucera Aug 25, 2023
Author

Thank you very much for your answer!
To be honest, I am not sure I have understood the first part (generating the consensus peaks starting from the Cell Ranger output). The second part should be the "standard" workflow.
As I understand it, you run the analysis the first time skipping the pseudobulk step, obtain the clusters from that analysis, and then use that information to generate the pseudobulks in the second analysis.
This was also how I was planning to proceed.
My doubt, however, is how to generate the consensus peaks for the first analysis.
I am using the following code:

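```python
import pandas as pd
import pyranges as pr
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

# One peaks.bed per Cell Ranger ATAC run (one per sample)
gr1 = pr.read_bed("/CellRangerATACdir/outs/filtered_peak_bc_matrix/peaks.bed")
# ....
grN = pr.read_bed("/CellRangerATACdir/outs/filtered_peak_bc_matrix/peaks.bed")

# Concatenate the per-sample peaks and merge overlapping intervals
consensus = pr.PyRanges(pd.concat([gr1.df, ... , grN.df])).merge()
consensus.to_bed('consensus.bed')

# fragments_dict, metadata_bc, bc_passing_filters and path_to_blacklist
# are defined earlier, as in the tutorial
cistopic_obj_list = [
    create_cistopic_object_from_fragments(
        path_to_fragments=fragments_dict[key],
        path_to_regions="consensus.bed",  # same regions for every sample, otherwise cells cluster by sample
        path_to_blacklist=path_to_blacklist,
        metrics=metadata_bc[key],
        valid_bc=bc_passing_filters[key],
        n_cpu=1,
        project=key,
    )
    for key in fragments_dict.keys()
]
```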
Then I proceed to the clustering step and start the second analysis.
I am not sure whether this is correct. Is this how you proceed for the first analysis?
Or would you suggest starting with this option, as in the tutorial:

```python
# count_matrix is the matrix output of cellranger aggregate
cistopic_obj = create_cistopic_object(fragment_matrix=count_matrix, path_to_blacklist=path_to_blacklist)
```
and then running it a second time?
Thank you again

Maria

SeppeDeWinter Aug 28, 2023
Maintainer

Hi @MariaRosariaNucera

The two solutions you are proposing are both valid. The first solution will re-generate the count matrix that was already generated by cellranger, so this step is a bit redundant if you use the peaks provided by the cellranger output.
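For readers unsure what the concatenate-and-merge step in the first solution produces, here is a minimal plain-Python sketch of the idea: pool the peaks from every sample and collapse overlapping intervals into one region. The function `merge_regions` and the toy peaks are hypothetical and are not part of pyranges or pycisTopic. Merging intervals that merely touch is a choice made in this sketch, not a statement about how pyranges behaves.

```python
def merge_regions(regions):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(regions):
        same_chrom = merged and merged[-1][0] == chrom
        if same_chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous region: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


# Toy peaks from two samples (made-up coordinates).
sample_a = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 300)]
sample_b = [("chr1", 400, 700), ("chr1", 1500, 1800), ("chr2", 250, 600)]

consensus = merge_regions(sample_a + sample_b)
for chrom, start, end in consensus:
    print(chrom, start, end, sep="\t")
```

Because every sample is then counted over the same merged regions, the per-sample objects share one feature space, which is the point of the "same for all of them" comment in the question.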

Best,

Seppe
