Adds PySCENIC AUCell Binarize (#329)
* Binarize passing tests

* Missing test data

* Better naming, description and docs
pcm32 authored Sep 15, 2024
1 parent 54818da commit 6f7bc53
Showing 3 changed files with 5,132 additions and 0 deletions.
50 changes: 50 additions & 0 deletions tools/tertiary-analysis/pyscenic/pyscenic_binarize_aucell.py
@@ -0,0 +1,50 @@
import argparse

import pandas as pd
from pyscenic.binarization import binarize

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Binarize AUC matrix")
    parser.add_argument("input_file", help="Input TSV or CSV file")
    parser.add_argument(
        "--threshold-overrides",
        type=str,
        help="Threshold overrides in JSON format",
    )
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument(
        "--num-workers", type=int, default=1, help="Number of workers"
    )
    parser.add_argument(
        "--output-prefix", type=str, default="output", help="Output prefix"
    )

    args = parser.parse_args()

    # Read input file
    if args.input_file.endswith(".tsv"):
        auc_mtx = pd.read_csv(args.input_file, sep="\t", index_col=0)
    elif args.input_file.endswith(".csv"):
        auc_mtx = pd.read_csv(args.input_file, index_col=0)
    else:
        raise ValueError("Input file must be a TSV or CSV file")

    # apply() returns a new DataFrame, so the result must be assigned back
    auc_mtx = auc_mtx.apply(pd.to_numeric)
    # Parse threshold overrides
    threshold_overrides = None
    if args.threshold_overrides:
        import json

        threshold_overrides = json.loads(args.threshold_overrides)

    # Call binarize function
    binarized_mtx, thresholds = binarize(
        auc_mtx, threshold_overrides, args.seed, args.num_workers
    )

    # set column name for thresholds
    thresholds.rename("threshold", inplace=True)

    # Save output files
    binarized_mtx.to_csv(f"{args.output_prefix}/binarized_mtx.tsv", sep="\t")
    thresholds.to_csv(f"{args.output_prefix}/thresholds.tsv", sep="\t")
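The thresholding step that the script delegates to `binarize` can be sketched with plain pandas. This is a minimal illustration, not pySCENIC's implementation: the default thresholds are hand-picked placeholders (pySCENIC derives them from each column's distribution), the names `default_thresholds` and `overrides_json` are hypothetical, and it assumes a cell passes when its AUC value is strictly greater than the threshold. The toy matrix reuses the values from the example in the tool help.

```python
import json

import pandas as pd

# Toy AUC matrix: rows are cells, columns are AUCell scores (illustrative values).
auc_mtx = pd.DataFrame(
    {
        "AUCellA": [0.024, 0.136, 0.001],
        "AUCellB": [0.045, 0.024, 0.347],
        "AUCellC": [0.001, 0.045, 0.136],
    },
    index=["cell1", "cell2", "cellN"],
)

# Placeholder thresholds (assumption); pySCENIC would derive these from the data.
default_thresholds = {"AUCellA": 0.05, "AUCellB": 0.2, "AUCellC": 0.1}

# Assumed shape of --threshold-overrides: a JSON object mapping column name to threshold.
overrides_json = '{"AUCellA": 0.1}'
overrides = json.loads(overrides_json)

# Overrides take precedence over the defaults.
thresholds = pd.Series({**default_thresholds, **overrides}, name="threshold")

# Comparing a DataFrame with a Series aligns the Series index with the columns.
binarized_mtx = (auc_mtx > thresholds).astype(int)

print(binarized_mtx)
print(thresholds)
```

With the override in place, only `cell2` passes for `AUCellA`; without it, the placeholder default of 0.05 would give the same result here, so the override matters only when it moves the threshold across an observed value.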
81 changes: 81 additions & 0 deletions tools/tertiary-analysis/pyscenic/pyscenic_binarize_aucell.xml
@@ -0,0 +1,81 @@
<tool id="pyscenic_binarize" name="PySCENIC Binarize AUCell" profile="21.09" version="@TOOL_VERSION@+galaxy0">
<description>defines AUCell thresholds and tags each cell as passing them or not</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="requirements"/>
<command detect_errors="exit_code">
<![CDATA[
ln -s '$input_file' aucell.tsv &&
python '$__tool_directory__/pyscenic_binarize_aucell.py'
#if $threshold_overrides
--threshold-overrides '$threshold_overrides'
#end if
#if str($seed) != ''
--seed '$seed'
#end if
--num-workers \${GALAXY_SLOTS:-1}
--output-prefix ./
aucell.tsv
]]>
</command>
<inputs>
<param name="input_file" type="data" format="tabular,txt" label="Input AUC matrix"/>
<param name="threshold_overrides" type="text" optional="true" label="Threshold overrides in JSON format" help="Override default threshold values for binarization"/>
<param name="seed" type="integer" optional="true" label="Random seed"/>
</inputs>
<outputs>
<data name="binarized_mtx" format="tsv" label="${tool.name} on ${on_string}: Binarized AUC matrix" from_work_dir="binarized_mtx.tsv"/>
<data name="thresholds" format="tsv" label="${tool.name} on ${on_string}: Binarization thresholds" from_work_dir="thresholds.tsv"/>
</outputs>
<tests>
<test>
<param name="input_file" value="aucell_test_smaller.tsv"/>
<param name="seed" value="10"/>
<output name="binarized_mtx">
<assert_contents>
<has_text_matching expression="D124DE\t0\t1"/>
<has_n_lines n="5001"/>
</assert_contents>
</output>
<output name="thresholds">
<assert_contents>
<has_text text="AUCell_1"/>
<has_n_lines n="3"/>
</assert_contents>
</output>
</test>
</tests>
<help>
<![CDATA[
**What it does**
This tool binarizes an AUC matrix using the `binarize` function from the `pySCENIC` package.
For each AUCell column, a threshold is derived automatically from the distribution of scores
(by binarizing that distribution when possible), and each cell is classified as passing or not passing it.
See the SCENIC paper or the pySCENIC notebooks for more details.
**Input**
- Input AUC matrix (TSV): The AUC matrix to be binarized (first column expected to be cell identifiers).
- Threshold overrides in JSON format (optional): Override default threshold values for binarization.
- Random seed (optional): Seed for random number generation.
**Output**
- Binarized AUC matrix: The binarized AUC matrix.
- Binarization thresholds: The threshold values used for binarization.
**Example**
Input AUC matrix (tsv file, as generated by the PySCENIC AUCell or Decoupler AUCell tools):
```
AUCellA AUCellB AUCellC
cell1 0.024 0.045 0.001
cell2 0.136 0.024 0.045
cellN 0.001 0.347 0.136
```
]]>
</help>
<expand macro="citations"/>
</tool>
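To see how the command template above maps onto the script's arguments, the following sketch rebuilds the same argparse parser and parses an argument list similar to what the template would produce for the test case (seed 10, one worker). The override value is a made-up example, and `build_parser` is a helper introduced here for illustration only.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Mirror of the parser defined in pyscenic_binarize_aucell.py."""
    parser = argparse.ArgumentParser(description="Binarize AUC matrix")
    parser.add_argument("input_file", help="Input TSV or CSV file")
    parser.add_argument(
        "--threshold-overrides",
        type=str,
        help="Threshold overrides in JSON format",
    )
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument(
        "--num-workers", type=int, default=1, help="Number of workers"
    )
    parser.add_argument(
        "--output-prefix", type=str, default="output", help="Output prefix"
    )
    return parser


# Argument list in the order the command template emits it; the override is hypothetical.
argv = [
    "--threshold-overrides", json.dumps({"AUCellA": 0.1}),
    "--seed", "10",
    "--num-workers", "1",
    "--output-prefix", "./",
    "aucell.tsv",
]

args = build_parser().parse_args(argv)

# The script decodes the JSON string only when the option was supplied.
overrides = json.loads(args.threshold_overrides) if args.threshold_overrides else None

print(args.input_file, args.seed, args.num_workers, args.output_prefix, overrides)
```

Because the input is symlinked to `aucell.tsv`, the script's extension check always takes the TSV branch regardless of the original dataset name.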