-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subset regions from cistromes params and fix dotplot repressors #337
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Danie
Thank you for the PR.
Could you have a look at the comments I made.
Best,
Seppe
src/scenicplus/cli/commands.py
Outdated
multiome_mudata_fname: pathlib.Path, | ||
out_file_direct_annotation: pathlib.Path, | ||
out_file_extended_annotation: pathlib.Path, | ||
out_file_tf_names: pathlib.Path, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any particular reason why you added blank lines between the parameters? Otherwise I would delete them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No no reason. I will delete them
src/scenicplus/cli/commands.py
Outdated
adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes( | ||
paths_to_motif_enrichment_results=paths_to_motif_enrichment_results, | ||
scplus_regions=set(mdata['scATAC'].var_names), | ||
scplus_regions=regions, | ||
subset_regions = regions_to_overlap, #could be the set of all ATAC regions or a subset if subset is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this function call within the if statement block.
if path_to_regions_to_subset is not None:
[...]
adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes(
[...],
subset_regions = regions_to_overlap
)
else:
adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes(
[...],
)
src/scenicplus/cli/commands.py
Outdated
log.info("Reading multiome MuData.") | ||
mdata = mudata.read(multiome_mudata_fname.__str__()) | ||
log.info("Getting cistromes.") | ||
regions = set(mdata['scATAC'].var_names) | ||
regions_to_overlap = set(mdata['scATAC'].var_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this within the if statement, see next comment as well.
src/scenicplus/cli/scenicplus.py
Outdated
parser.add_argument( | ||
"--path_to_regions_to_subset", dest="path_to_regions_to_subset", | ||
action="store", type=str, required=False,default ="", | ||
help="Path to bed file for regions to subset when merging cistromes (MACS called peaks).") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can be any bed file right, don't necessarily need to be peaks called by MACS? If this is the case I would remove "(MACS called peaks)
" from the documentation to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True could be removed. or could add for example. Will fix it
extended = True)) | ||
return cistromes | ||
|
||
def _overlap_if_necessary(s_query_regions, test_regions, regions_to_overlap): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken this function returns either a set (else block) or a pyranges object (if block), while the Cistrome class expects a set for the target_regions
parameter.
Also I would run this function not while providing the arguments to clearly seperate these two things (i.e. 1. initiating the class and 2. putting the subset region in the same coordinate system as SCENIC+).
@@ -124,7 +124,7 @@ def heatmap_dotplot( | |||
# Plotting | |||
plotnine.options.figure_size = figsize | |||
plotting_df["repressor_activator"] = [ | |||
"activator" if "+" in n.split("_")[2] else "repressor" for n in plotting_df[feature_name_key]] | |||
"repressor" if "-" in n.split("_")[2] else "activator" for n in plotting_df[feature_name_key]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, that was an obvious mistake from my part. Thanks!
… reading it as None
Added the "path to regions to subset" as a parameter in the snakemake config file. This allows giving the path to the bed file containing the MACS called peaks that one would like to subset when creating cistromes (hence, cell type-specific cistromes). It can also be left empty (empty string) if no regions should be subsetted.
Also, one small change in the regulons dotplot: considered all regulons with "-" as repressors.