Decoupler Pathway Inference #308

Merged
merged 60 commits into from
Mar 15, 2024
Changes from 6 commits
Commits
6ce05c3
first commit for decoupler_pathway_inference
anilthanki Nov 3, 2023
1db7706
code refactoring
anilthanki Nov 3, 2023
1b88053
uses progeny network
anilthanki Nov 10, 2023
9022bc9
adds function to download anndata for decoupler_pathway_inference
anilthanki Nov 10, 2023
9adf85e
adds network file and option to use network file
anilthanki Nov 10, 2023
8f68f11
replaces python with wget and zenodo link
anilthanki Nov 10, 2023
118b327
adds use_raw and min_n options
anilthanki Nov 10, 2023
5afc617
removes unnecessary args
anilthanki Nov 17, 2023
b99d9de
adds Galaxy wrapper
anilthanki Nov 17, 2023
7f25d64
updates anndata key
anilthanki Nov 24, 2023
7944106
fixes arg type for min_n and enables use_raw arg
anilthanki Nov 24, 2023
f699633
first commit for decoupler_pathway_inference
anilthanki Nov 3, 2023
02659ce
code refactoring
anilthanki Nov 3, 2023
965753d
uses progeny network
anilthanki Nov 10, 2023
4efaff7
adds function to download anndata for decoupler_pathway_inference
anilthanki Nov 10, 2023
47073ce
adds network file and option to use network file
anilthanki Nov 10, 2023
012cff5
replaces python with wget and zenodo link
anilthanki Nov 10, 2023
a0a0116
adds use_raw and min_n options
anilthanki Nov 10, 2023
3fa15df
removes unnecessary args
anilthanki Nov 17, 2023
a37d403
adds Galaxy wrapper
anilthanki Nov 17, 2023
827cbe8
updates anndata key
anilthanki Nov 24, 2023
fd8f001
fixes arg type for min_n and enables use_raw arg
anilthanki Nov 24, 2023
388b89b
Merge branch 'feature/decoupler_pathway_inference' of https://github.…
anilthanki Nov 24, 2023
8efaba9
fixes params and output name
anilthanki Nov 24, 2023
e34e93d
updates key in anndata
anilthanki Nov 24, 2023
ec2de3f
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Nov 24, 2023
0751eb5
removes hdf5plugin
anilthanki Nov 24, 2023
df5616a
renames arg
anilthanki Nov 24, 2023
e24a71e
shortens args
anilthanki Nov 24, 2023
40fe7c7
adds --output and removes quotes around
anilthanki Nov 29, 2023
8c38a2b
updates test parameter
anilthanki Nov 29, 2023
d8fca4a
updates test parameter
anilthanki Nov 29, 2023
3258a1a
testing raw input
anilthanki Dec 12, 2023
182d056
testing raw input
anilthanki Dec 12, 2023
a1ea2ab
makes output anndata with activities path optional
anilthanki Dec 12, 2023
37d5afa
makes output anndata with activities path optional adds filter in out…
anilthanki Dec 12, 2023
23c67d6
removes param
anilthanki Dec 12, 2023
4891666
removes progeny.tsv
anilthanki Dec 12, 2023
ecb4867
fixes activities output file
anilthanki Dec 12, 2023
6428552
merges mlm files into one
anilthanki Dec 12, 2023
38a657f
merges mlm files into one
anilthanki Dec 13, 2023
ba18fbd
updates number of output in test
anilthanki Dec 13, 2023
f72770c
concat pd before writing to a file
anilthanki Dec 13, 2023
e14317f
updates test verification in wrapper
anilthanki Dec 13, 2023
0f0a4e2
removed test.h5ad and downloads it from zenodo
anilthanki Dec 18, 2023
7b34514
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Feb 13, 2024
b869140
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 7, 2024
d833bdf
Update decoupler_pathway_inference.xml
anilthanki Mar 7, 2024
86a9763
adds ULM method
anilthanki Mar 7, 2024
3f4efd6
adds alternative sample network file
anilthanki Mar 8, 2024
2d76c11
adds unindexed network file and test
anilthanki Mar 8, 2024
b662f08
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
45c6b86
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
3b9ef4a
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
420dd03
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
b0a7cc3
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
9eac4a1
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
9d91561
adds network file parameters
anilthanki Mar 14, 2024
aa07b02
Removes unused argument and comments
anilthanki Mar 14, 2024
9326fae
to re-run test
anilthanki Mar 15, 2024
69 changes: 50 additions & 19 deletions tools/tertiary-analysis/decoupler/decoupler_pathway_inference.py
@@ -59,6 +59,11 @@
parser.add_argument(
"--min_n", help="Minimum of targets per source. If less, sources are removed.", default=5, type=int
)

# add activity inference method option
parser.add_argument(
"-m", "--method", help="Activity inference method", default="mlm", required=True
)
args = parser.parse_args()

# check that either -o or --output is specified
@@ -82,25 +87,51 @@


print(type(args.min_n))
dc.run_mlm(
mat=adata,
net=network,
source=args.source,
target=args.target,
weight=args.weight,
verbose=True,
min_n=args.min_n,
use_raw=args.use_raw #Failing at the moment
)

if args.output is not None:
# write adata.obsm[mlm_key] and adata.obsm[mlm_pvals_key] to the output network files
combined_df = pd.concat([adata.obsm["mlm_estimate"], adata.obsm["mlm_pvals"]], axis=1)
if args.method == "mlm":
dc.run_mlm(
mat=adata,
net=network,
source=args.source,
target=args.target,
weight=args.weight,
verbose=True,
min_n=args.min_n,
use_raw=args.use_raw #Failing at the moment
)

if args.output is not None:
# write adata.obsm[mlm_key] and adata.obsm[mlm_pvals_key] to the output network files
combined_df = pd.concat([adata.obsm["mlm_estimate"], adata.obsm["mlm_pvals"]], axis=1)

# Save the combined dataframe to a file
combined_df.to_csv(args.output + ".tsv", sep="\t")

# if args.activities_path is specified, generate the activities AnnData and save the AnnData object to the specified path
if args.activities_path is not None:
acts = dc.get_acts(adata, obsm_key="mlm_estimate")
acts.write_h5ad(args.activities_path)

elif args.method == "ulm":
dc.run_ulm(
mat=adata,
net=network,
source=args.source,
target=args.target,
weight=args.weight,
verbose=True,
min_n=args.min_n,
use_raw=args.use_raw #Failing at the moment
)

if args.output is not None:
# write adata.obsm["ulm_estimate"] and adata.obsm["ulm_pvals"] to the output table
combined_df = pd.concat([adata.obsm["ulm_estimate"], adata.obsm["ulm_pvals"]], axis=1)

# Save the combined dataframe to a file
combined_df.to_csv(args.output + "_mlm.tsv", sep="\t")
# Save the combined dataframe to a file
combined_df.to_csv(args.output + ".tsv", sep="\t")

# if args.activities_path is specified, generate the activities AnnData and save the AnnData object to the specified path
if args.activities_path is not None:
acts = dc.get_acts(adata, obsm_key="mlm_estimate")
acts.write_h5ad(args.activities_path)
# if args.activities_path is specified, generate the activities AnnData and save the AnnData object to the specified path
if args.activities_path is not None:
acts = dc.get_acts(adata, obsm_key="ulm_estimate")
acts.write_h5ad(args.activities_path)
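
The MLM and ULM branches above are identical apart from the method name, which appears in the `dc.run_*` call and in the `obsm` keys. A minimal sketch of how the branching could be collapsed follows, assuming the convention seen in this diff that results land in `<method>_estimate` and `<method>_pvals`. The helper names `build_parser` and `obsm_keys` are hypothetical and not part of this PR, and the decoupler calls are left as comments so the sketch runs with the standard library alone. It also uses `choices` rather than `required=True`, since argparse ignores a `default` on a required argument.

```python
import argparse

SUPPORTED_METHODS = ("mlm", "ulm")  # the two methods handled in this diff


def build_parser():
    """Hypothetical parser showing only the options relevant to method selection."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--min_n",
        help="Minimum of targets per source. If less, sources are removed.",
        default=5,
        type=int,
    )
    # `choices` rejects unknown methods up front; without `required=True`
    # the default of "mlm" takes effect when -m is omitted.
    parser.add_argument(
        "-m", "--method",
        help="Activity inference method",
        default="mlm",
        choices=SUPPORTED_METHODS,
    )
    return parser


def obsm_keys(method):
    """Return the (estimate, p-value) obsm keys for a method, e.g. 'mlm_estimate'."""
    return f"{method}_estimate", f"{method}_pvals"


if __name__ == "__main__":
    args = build_parser().parse_args(["--method", "ulm", "--min_n", "0"])
    estimate_key, pvals_key = obsm_keys(args.method)
    # With decoupler imported as `dc`, a branch-free version could look like:
    #   getattr(dc, f"run_{args.method}")(mat=adata, net=network, ...)
    #   acts = dc.get_acts(adata, obsm_key=estimate_key)
    print(estimate_key, pvals_key)
```

Deriving the keys from the method name keeps the table-writing and `get_acts` steps in one place, so adding a further method would only mean extending `SUPPORTED_METHODS`.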
49 changes: 41 additions & 8 deletions tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
@@ -1,6 +1,6 @@
<tool id="decoupler_pathway_inference" name="Decoupler Pathway Inference" version="1.4.0+galaxy0" profile="20.05" license="MIT">
<description>
Pathway inference using the Decoupler.
of functional genesets/pathways based on MLM with scRNA-seq data.
</description>
<requirements>
<requirement type="package" version="1.4.0">decoupler</requirement>
@@ -10,22 +10,27 @@
-i '$input_anndata'
-n '$input_network_file'
--min_n "$min_n"
--method '$method'
$use_raw
--output "inference"
$write_activities_path
</command>
<inputs>
<param name="input_anndata" type="data" format="h5ad" label="Input AnnData file" />
<param name="input_network_file" type="data" format="tabular" label="Input Network file" />
<param name="min_n" type="integer" min="0" value="5" label="Minimum of targets per source. If less, sources are removed" />
<param name="use_raw" type="boolean" truevalue="--use_raw" falsevalue="" checked="false" label="Whether to use the raw part of the AnnData object" />
<param name="input_network_file" type="data" format="tabular" label="Input Network file" help="The network file is a tabular file with Source, Target and Weight columns. Each source is a transcription factor that regulates its target genes, either positively or negatively." />
<param name="min_n" type="integer" min="0" value="5" label="Minimum number of targets per source" help="Sources with fewer targets than this minimum are removed." />
<param name="method" type="select" label="Activity inference method">
<option value="mlm" selected="true">MLM</option>
<option value="ulm">ULM</option>
</param>
<param name="use_raw" type="boolean" truevalue="--use_raw" falsevalue="" checked="false" label="Use the raw part of the AnnData object" />
<param name="write_activities_path" type="boolean" truevalue="--activities_path anndata_activities_path.h5ad" falsevalue="" checked="true" label="Write the modified AnnData object" />
</inputs>
<outputs>
<data name="output_ad" format="h5ad" from_work_dir="anndata_activities_path.h5ad" label="${tool.name} on ${on_string}: Output AnnData file">
<filter>write_activities_path</filter>
</data>
<data name="output_table_mlm" format="tabular" from_work_dir="inference_mlm.tsv" label="${tool.name} on ${on_string}: Output MLM estimate table" />
<data name="output_table" format="tabular" from_work_dir="inference.tsv" label="${tool.name} on ${on_string}: Output estimate table" />
</outputs>
<tests>
<!-- Hint: You can use [ctrl+alt+t] after defining the inputs/outputs to auto-scaffold some basic test cases. -->
@@ -34,14 +39,33 @@
<param name="input_anndata" value="pbmc3k_processed.h5ad"/>
<param name="input_network_file" value="progeny_test.tsv"/>
<param name="min_n" value="0"/>
<param name="method" value="mlm"/>
<param name="use_raw" value="false"/>
<param name="write_activities_path" value="true"/>
<output name="output_ad">
<assert_contents>
<has_h5_keys keys="obsm/mlm_estimate"/>
</assert_contents>
</output>
<output name="output_table_mlm">
<output name="output_table">
<assert_contents>
<has_n_columns n="5"/>
</assert_contents>
</output>
</test>
<test>
<param name="input_anndata" value="pbmc3k_processed.h5ad"/>
<param name="input_network_file" value="progeny_test_2.tsv"/>
<param name="min_n" value="0"/>
<param name="method" value="ulm"/>
<param name="use_raw" value="false"/>
<param name="write_activities_path" value="true"/>
<output name="output_ad">
<assert_contents>
<has_h5_keys keys="obsm/ulm_estimate"/>
</assert_contents>
</output>
<output name="output_table">
<assert_contents>
<has_n_columns n="5"/>
</assert_contents>
@@ -57,13 +81,22 @@ Usage

**Description**

This tool scores cells using the AUCell method for gene sets.
This tool infers pathway activities using decoupler.

**Input**

The input file should be an AnnData object in H5AD format. The tool accepts an H5AD file containing raw or normalized data.

The tool also takes network file containing a collection of pathways and their target genes, with weights for each interaction
The tool also takes a network file containing a collection of pathways and their target genes, with a weight for each interaction.
Example:
```
source target weight
0 T1 G01 1.0
1 T1 G02 1.0
2 T1 G03 0.7
3 T2 G04 1.0
4 T2 G06 -0.5
```

You can also choose to use the raw data in the AnnData object instead of the X matrix with the "use_raw" parameter, and set the minimum number of targets per source with "min_n".

71 changes: 71 additions & 0 deletions tools/tertiary-analysis/decoupler/test-data/progeny_test_2.tsv
@@ -0,0 +1,71 @@
source target weight p_value
Androgen TMPRSS2 11.490631 0.0
Androgen NKX3-1 10.622551 2.2e-44
Androgen MBOAT2 10.472733 4.6e-44
Androgen KLK2 10.176186 1.94441e-40
Androgen SARG 11.386852 2.79021e-40
EGFR LZTFL1 -1.8738769 2.0809955e-18
EGFR PHLDA2 3.5051384 2.0530624e-17
EGFR DUSP6 12.6293125 6.537324e-17
EGFR DUSP5 7.9430394 6.86669e-17
EGFR PHLDA1 6.619626 3.4106933e-16
Estrogen GREB1 17.240173 0.0
Estrogen RET 10.718027 0.0
Estrogen TFF1 14.430255 0.0
Estrogen HEY2 11.482369 3.1e-44
Estrogen RAPGEFL1 10.544896 5.2e-43
Hypoxia FAM162A 8.335551 0.0
Hypoxia NDRG1 22.08712 0.0
Hypoxia ENO2 14.32694 0.0
Hypoxia PDK1 13.120449 0.0
Hypoxia ANKRD37 8.484976 0.0
JAK-STAT OAS1 15.028714 1.058e-41
JAK-STAT HERC6 8.769676 1.3450407e-38
JAK-STAT OAS3 10.618842 1.2143582e-37
JAK-STAT PLSCR1 8.481604 8.955206e-37
JAK-STAT DDX60 12.198234 9.150971e-36
MAPK DUSP6 16.859016 0.0
MAPK SPRED2 3.5018346 0.0
MAPK SPRY2 9.481585 9.19e-43
MAPK ETV5 5.9887094 6.7425e-41
MAPK EPHA2 6.3140125 3.7492e-40
NFkB NFKB1 9.513637 0.0
NFkB CXCL3 22.946114 0.0
NFkB NFKB2 5.5155754 0.0
NFkB NFKBIA 11.444533 0.0
NFkB BCL2A1 14.416924 0.0
PI3K MLANA -9.985743 1.84e-43
PI3K PMEL -6.5903482 6.8747866e-36
PI3K FAXDC2 -12.421274 3.297515e-34
PI3K HSD17B8 -8.601571 9.948224e-34
PI3K CTSF -9.172143 1.0235212e-31
TGFb LINC00312 4.428987 2.0074443e-17
TGFb TSPAN2 5.502326 3.1451768e-16
TGFb SMAD7 7.6311436 7.3087106e-16
TGFb NOX4 5.913813 3.8292238e-15
TGFb COL4A1 6.3374896 9.052501e-15
TNFa CSF2 8.35548 0.0
TNFa CXCL5 10.0813675 0.0
TNFa NFKBIE 10.356205 0.0
TNFa TNFAIP3 35.40072 0.0
TNFa EFNA1 18.63111 0.0
Trail FRMPD1 -2.2346141 9.378505e-07
Trail WT1-AS 2.2251053 2.0316747e-06
Trail WNT8A -1.8469616 3.795469e-05
Trail GPR18 3.240805 6.1090715e-05
Trail TEC 2.0513217 6.32898e-05
VEGF CRACD -4.87119 6.7185365e-25
VEGF VWA8 -3.6068044 1.4495265e-18
VEGF NLGN1 -5.618075 2.6587072e-18
VEGF NRG3 -5.823747 1.0848074e-16
VEGF KCNK10 2.8833063 1.8129868e-16
WNT BMP4 5.936831 2.511717e-10
WNT SIGLEC6 2.0207362 2.347858e-09
WNT NPY2R 1.3872339 8.666917e-09
WNT CSF3R 1.9323153 3.0219417e-07
WNT KRT23 4.1216116 5.463989e-07
p53 GLS2 6.452465 7.444302e-37
p53 MDM2 8.193488 2.1194304e-35
p53 ZNF79 4.020263 4.5987433e-34
p53 FDXR 11.994496 5.589482e-32
p53 LCE1B 11.813737 7.8095406e-30
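
The test network above carries a `p_value` column in addition to `source`, `target` and `weight`. A minimal sketch of reading such a file and dropping sources with fewer than `min_n` targets follows, using only the standard library and assuming the file is tab-separated. The function name `load_network` is hypothetical and not part of this PR. The filter is a simplification: it counts the targets listed in the file, whereas decoupler's own `min_n` handling may count only the targets that are also present in the expression matrix.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ("source", "target", "weight")


def load_network(handle, min_n=5):
    """Read a tab-separated network and keep sources with at least min_n targets.

    Returns {source: {target: weight}}. Extra columns such as p_value are ignored.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"network file is missing columns: {missing}")

    network = defaultdict(dict)
    for row in reader:
        network[row["source"]][row["target"]] = float(row["weight"])

    # Simplified min_n filter: counts targets in the file only (assumption).
    return {src: tgts for src, tgts in network.items() if len(tgts) >= min_n}


if __name__ == "__main__":
    sample = io.StringIO(
        "source\ttarget\tweight\tp_value\n"
        "Androgen\tTMPRSS2\t11.490631\t0.0\n"
        "Androgen\tNKX3-1\t10.622551\t2.2e-44\n"
        "EGFR\tDUSP6\t12.6293125\t6.537324e-17\n"
    )
    net = load_network(sample, min_n=2)
    print(sorted(net))  # ['Androgen']
```

With `min_n=0`, as in the tests in this PR, every source in the file is kept.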