Decoupler Pathway Inference #308

Merged
merged 60 commits into from
Mar 15, 2024
Changes from 25 commits
Commits
6ce05c3
first commit for decoupler_pathway_inference
anilthanki Nov 3, 2023
1db7706
code refactoring
anilthanki Nov 3, 2023
1b88053
uses progeny network
anilthanki Nov 10, 2023
9022bc9
adds function to download anndata for decoupler_pathway_inference
anilthanki Nov 10, 2023
9adf85e
adds network file and option to use network file
anilthanki Nov 10, 2023
8f68f11
replaces python with wget and zenodo link
anilthanki Nov 10, 2023
118b327
adds use_raw and min_n options
anilthanki Nov 10, 2023
5afc617
removes unnecessary args
anilthanki Nov 17, 2023
b99d9de
adds Galaxy wrapper
anilthanki Nov 17, 2023
7f25d64
updates anndata key
anilthanki Nov 24, 2023
7944106
fixes arg type for min_n and enables use_raw arg
anilthanki Nov 24, 2023
f699633
first commit for decoupler_pathway_inference
anilthanki Nov 3, 2023
02659ce
code refactoring
anilthanki Nov 3, 2023
965753d
uses progeny network
anilthanki Nov 10, 2023
4efaff7
adds function to download anndata for decoupler_pathway_inference
anilthanki Nov 10, 2023
47073ce
adds network file and option to use network file
anilthanki Nov 10, 2023
012cff5
replaces python with wget and zenodo link
anilthanki Nov 10, 2023
a0a0116
adds use_raw and min_n options
anilthanki Nov 10, 2023
3fa15df
removes unnecessary args
anilthanki Nov 17, 2023
a37d403
adds Galaxy wrapper
anilthanki Nov 17, 2023
827cbe8
updates anndata key
anilthanki Nov 24, 2023
fd8f001
fixes arg type for min_n and enables use_raw arg
anilthanki Nov 24, 2023
388b89b
Merge branch 'feature/decoupler_pathway_inference' of https://github.…
anilthanki Nov 24, 2023
8efaba9
fixes params and output name
anilthanki Nov 24, 2023
e34e93d
updates key in anndata
anilthanki Nov 24, 2023
ec2de3f
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Nov 24, 2023
0751eb5
removes hdf5plugin
anilthanki Nov 24, 2023
df5616a
renames arg
anilthanki Nov 24, 2023
e24a71e
shortens args
anilthanki Nov 24, 2023
40fe7c7
adds --output and removes quotes around
anilthanki Nov 29, 2023
8c38a2b
updates test parameter
anilthanki Nov 29, 2023
d8fca4a
updates test parameter
anilthanki Nov 29, 2023
3258a1a
testing raw input
anilthanki Dec 12, 2023
182d056
testing raw input
anilthanki Dec 12, 2023
a1ea2ab
makes output anndata with activities path optional
anilthanki Dec 12, 2023
37d5afa
makes output anndata with activities path optional adds filter in out…
anilthanki Dec 12, 2023
23c67d6
removes param
anilthanki Dec 12, 2023
4891666
removes progeny.tsv
anilthanki Dec 12, 2023
ecb4867
fixes activities output file
anilthanki Dec 12, 2023
6428552
merges mlm files into one
anilthanki Dec 12, 2023
38a657f
merges mlm files into one
anilthanki Dec 13, 2023
ba18fbd
updates number of output in test
anilthanki Dec 13, 2023
f72770c
concat pd before writing to a file
anilthanki Dec 13, 2023
e14317f
updates test verification in wrapper
anilthanki Dec 13, 2023
0f0a4e2
removed test.h5ad and downloads it from zenodo
anilthanki Dec 18, 2023
7b34514
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Feb 13, 2024
b869140
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 7, 2024
d833bdf
Update decoupler_pathway_inference.xml
anilthanki Mar 7, 2024
86a9763
adds ULM method
anilthanki Mar 7, 2024
3f4efd6
adds alternative sample network file
anilthanki Mar 8, 2024
2d76c11
adds unindexed network file and test
anilthanki Mar 8, 2024
b662f08
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
45c6b86
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
3b9ef4a
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
420dd03
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
b0a7cc3
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
9eac4a1
Update tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
anilthanki Mar 12, 2024
9d91561
adds network file parameters
anilthanki Mar 14, 2024
aa07b02
Removes unused argument and comments
anilthanki Mar 14, 2024
9326fae
to re-run test
anilthanki Mar 15, 2024
105 changes: 105 additions & 0 deletions tools/tertiary-analysis/decoupler/decoupler_pathway_inference.py
@@ -0,0 +1,105 @@
# import the necessary packages
import argparse

import anndata as ad
import decoupler as dc
import pandas as pd
import hdf5plugin

# define arguments for the script
parser = argparse.ArgumentParser()

# add AnnData input file option
parser.add_argument(
    "-i", "--input_anndata", help="AnnData input file", required=True
)

# add network input file option
parser.add_argument(
    "-n", "--input_network", help="Network input file", required=True
)

# output file prefix
parser.add_argument(
    "--output",
    help="output files prefix",
    default=None,
)

# figure size option
parser.add_argument(
    "-f", "--figure_size", help="figure size", default="10,10"
)

# path to save Activities AnnData file
parser.add_argument(
    "-a", "--activities_path", help="Path to save Activities AnnData file", default=None
)

# Column name in net with source nodes
parser.add_argument(
    "-s", "--source", help="Column name in net with source nodes.", default="source"
)

# Column name in net with target nodes
parser.add_argument(
    "-t", "--target", help="Column name in net with target nodes.", default="target"
)

# Column name in net with weights.
parser.add_argument(
    "-w", "--weight", help="Column name in net with weights.", default="weight"
)

# add boolean argument for use_raw
parser.add_argument(
    "--use_raw", action="store_true", default=False, help="Whether to use the raw part of the AnnData object"
)

# add argument for min_n
parser.add_argument(
    "--min_n", help="Minimum of targets per source. If less, sources are removed.", default=5, type=int
)
args = parser.parse_args()

# check that --output is specified (there is no -o short option)
if args.output is None:
    raise ValueError("Please specify --output")

# read in the AnnData input file
adata = ad.read_h5ad(args.input_anndata)

# read in the network input file (tab-separated)
network = pd.read_csv(args.input_network, sep='\t')

if (
    args.source not in network.columns
    or args.target not in network.columns
    or args.weight not in network.columns
):
    raise ValueError(
        "Source, target, and weight columns are not present in the network"
    )


# run the multivariate linear model (MLM); with an AnnData input,
# decoupler stores the results in adata.obsm
dc.run_mlm(
    mat=adata,
    net=network,
    source=args.source,
    target=args.target,
    weight=args.weight,
    verbose=True,
    min_n=args.min_n,
    use_raw=args.use_raw,  # Failing at the moment
)

if args.output is not None:
    # write adata.obsm["mlm_estimate"] and adata.obsm["mlm_pvals"] to the output TSV files
    adata.obsm["mlm_estimate"].to_csv(args.output + "_mlm.tsv", sep="\t")
    adata.obsm["mlm_pvals"].to_csv(args.output + "_mlm_pvals.tsv", sep="\t")

# if args.activities_path is specified, generate the activities AnnData and save the AnnData object to the specified path
if args.activities_path is not None:
    acts = dc.get_acts(adata, obsm_key="mlm_estimate")
    # note: `acts` is not used below; the input AnnData (with results in .obsm) is what gets written
    adata.write_h5ad(args.activities_path)
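The column check and the `min_n` option in the script above can be illustrated without decoupler. The following is a minimal standard-library sketch, not the script's or decoupler's implementation: `missing_columns` and `sources_below_min_n` are hypothetical helpers, and the three-edge network is made-up data. It reports which required column is absent (rather than one combined message) and lists sources with fewer than `min_n` targets. As far as I understand, decoupler applies its `min_n` filter after matching targets to the genes present in the data, so the counts here are only an approximation of what would be dropped.

```python
import csv
import io
from collections import Counter


def missing_columns(header, required):
    """Return the required column names that are absent from the header."""
    return [name for name in required if name not in header]


def sources_below_min_n(rows, source_col, min_n):
    """Return, sorted, the sources that have fewer than min_n target rows."""
    counts = Counter(row[source_col] for row in rows)
    return sorted(src for src, n in counts.items() if n < min_n)


# Hypothetical network in the TSV layout the script expects
# (source, target, weight); the gene names are placeholders.
network_tsv = (
    "source\ttarget\tweight\n"
    "EGFR\tG1\t1.0\n"
    "EGFR\tG2\t0.5\n"
    "TNFa\tG3\t2.0\n"
)
reader = csv.DictReader(io.StringIO(network_tsv), delimiter="\t")
rows = list(reader)

missing = missing_columns(reader.fieldnames, ["source", "target", "weight"])
if missing:
    raise ValueError(f"Network file is missing column(s): {', '.join(missing)}")

# With min_n=2, TNFa has a single target and would be flagged.
print(sources_below_min_n(rows, "source", min_n=2))
```

Naming the missing column in the error message makes it easier to spot a mismatch between the `--source`, `--target`, `--weight` options and the header of the network file.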
90 changes: 90 additions & 0 deletions tools/tertiary-analysis/decoupler/decoupler_pathway_inference.xml
@@ -0,0 +1,90 @@
<tool id="decoupler_pathway_inference" name="Decoupler Pathway Inference" version="1.4.0+galaxy0" profile="20.05" license="MIT">
<description>
Pathway activity inference using decoupler.
</description>
<requirements>
<requirement type="package" version="1.4.0">decoupler</requirement>
</requirements>
<command>
python '$__tool_directory__/decoupler_pathway_inference.py'
--input_anndata '$input_anndata'
--input_network '$input_network_file'
--min_n '$min_n'
$use_raw
--output "inference"
--activities_path anndata_activities_path.h5ad
</command>
<inputs>
<param name="input_anndata" type="data" format="h5ad" label="Input AnnData file" />
<param name="input_network_file" type="data" format="tabular" label="Input Network file" />
<param name="min_n" type="integer" min="0" value="5" label="Minimum of targets per source. If less, sources are removed" />
<param name="use_raw" type="boolean" truevalue="--use_raw" falsevalue="" checked="false" label="Whether to use the raw part of the AnnData object" />
<param name="write_inference" type="boolean" truevalue="--write_inference" falsevalue="" checked="true" label="Write the inference TSV files" />
<param name="write_activities_path" type="boolean" truevalue="--write_activities_path" falsevalue="" checked="true" label="Write the modified AnnData object" />
</inputs>
<outputs>
<data name="output_ad" format="h5ad" from_work_dir="anndata_activities_path.h5ad" label="${tool.name} on ${on_string}: Output AnnData file" />
<data name="output_table_mlm_estimate" format="tabular" from_work_dir="inference_mlm.tsv" label="${tool.name} on ${on_string}: Output MLM estimate table" />
<data name="output_table_mlm_pvalue" format="tabular" from_work_dir="inference_mlm_pvals.tsv" label="${tool.name} on ${on_string}: Output MLM p-values table" />
</outputs>
<tests>
<!-- Hint: You can use [ctrl+alt+t] after defining the inputs/outputs to auto-scaffold some basic test cases. -->

<test expect_num_outputs="3">
<param name="input_anndata" value="pbmc3k_processed.h5ad"/>
<param name="input_network_file" value="progeny_test.tsv"/>
<param name="min_n" value="5"/>
<param name="use_raw" value="false"/>
<param name="write_inference" value="true"/>
<param name="write_activities_path" value="true"/>
<output name="output_ad">
<assert_contents>
<has_h5_keys keys="obsm/mlm_estimate"/>
</assert_contents>
</output>
<output name="output_table_mlm_estimate">
<assert_contents>
<has_n_columns n="5"/>
</assert_contents>
</output>
<output name="output_table_mlm_pvalue">
<assert_contents>
<has_n_columns n="5"/>
</assert_contents>
</output>
</test>
</tests>
<help>
**What it does**

Usage
.....


**Description**

This tool infers pathway activities for each cell from a weighted network of pathways and their target genes, using decoupler's multivariate linear model (MLM) method.

**Input**

The input file should be an AnnData object in H5AD format. The tool accepts an H5AD file containing raw or normalized data.

The tool also takes a tab-separated network file containing a collection of pathways and their target genes, with a weight for each interaction.

Use the "use_raw" parameter to run on the raw data in the AnnData object instead of the X matrix, and the "min_n" parameter to set the minimum number of targets per source; sources with fewer targets are removed.


**Output**

The tool outputs an AnnData object containing the scores in the "obsm" field (keys "mlm_estimate" and "mlm_pvals"), and tab-separated text files containing the scores and p-values for each cell.

If the "write_activities_path" parameter is set to "true", the tool will write the modified AnnData object to an H5AD file.
If the "write_inference" parameter is set to "true", the tool will output a tab-separated text file containing the scores for each cell.



</help>
<citations>
<citation type="doi">10.1093/bioadv/vbac016</citation>
</citations>
</tool>
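A boolean param with an empty `falsevalue`, such as `use_raw` in the wrapper above, behaves differently depending on whether it is wrapped in quotes in the command. The sketch below illustrates this under an assumption: the rendered command line is split the way a POSIX shell would split it, which is modelled here with `shlex.split`. `build_parser` and `split_known` are hypothetical helpers that mirror only two of the script's options.

```python
import argparse
import shlex


def build_parser():
    """Parser with only the two options relevant to this illustration."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--min_n", type=int, default=5)
    parser.add_argument("--use_raw", action="store_true", default=False)
    return parser


def split_known(command_line):
    """Split like a shell, then return (parsed args, leftover arguments)."""
    return build_parser().parse_known_args(shlex.split(command_line))


# falsevalue="" rendered inside quotes: the shell passes one empty argument,
# which the parser cannot match to any option.
print(split_known("--min_n 5 ''")[1])

# falsevalue="" rendered without quotes: nothing is passed at all.
print(split_known("--min_n 5 ")[1])

# truevalue="--use_raw" works either way.
print(split_known("--min_n 5 --use_raw")[0].use_raw)
```

With `parse_args` instead of `parse_known_args`, the leftover empty string in the first case would be reported as an unrecognized argument, so leaving the boolean unquoted in the command is the safer choice.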
7 changes: 7 additions & 0 deletions tools/tertiary-analysis/decoupler/get_test_data.sh
@@ -19,3 +19,10 @@ function get_data {
mkdir -p test-data
pushd test-data
get_data $MTX_LINK $BASENAME_FILE


BASENAME_FILE='pbmc3k_processed.h5ad'

MTX_LINK='https://zenodo.org/records/3752813/files/pbmc3k_processed.h5ad'

get_data $MTX_LINK $BASENAME_FILE