Commit
Adds PySCENIC AUCell Binarize (#329)
* Binarize passing tests
* Missing test data
* Better naming, description and docs
Showing 3 changed files with 5,132 additions and 0 deletions.
50 changes: 50 additions & 0 deletions
tools/tertiary-analysis/pyscenic/pyscenic_binarize_aucell.py
@@ -0,0 +1,50 @@
import argparse

import pandas as pd
from pyscenic.binarization import binarize

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Binarize AUC matrix")
    parser.add_argument("input_file", help="Input TSV or CSV file")
    parser.add_argument(
        "--threshold-overrides",
        type=str,
        help="Threshold overrides in JSON format",
    )
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument(
        "--num-workers", type=int, default=1, help="Number of workers"
    )
    parser.add_argument(
        "--output-prefix", type=str, default="output", help="Output prefix"
    )

    args = parser.parse_args()

    # Read input file
    if args.input_file.endswith(".tsv"):
        auc_mtx = pd.read_csv(args.input_file, sep="\t", index_col=0)
    elif args.input_file.endswith(".csv"):
        auc_mtx = pd.read_csv(args.input_file, index_col=0)
    else:
        raise ValueError("Input file must be a TSV or CSV file")

    auc_mtx = auc_mtx.apply(pd.to_numeric)  # apply() returns a new frame, so keep the result
    # Parse threshold overrides
    threshold_overrides = None
    if args.threshold_overrides:
        import json

        threshold_overrides = json.loads(args.threshold_overrides)

    # Call binarize function
    binarized_mtx, thresholds = binarize(
        auc_mtx, threshold_overrides, args.seed, args.num_workers
    )

    # Set column name for thresholds
    thresholds.rename("threshold", inplace=True)

    # Save output files
    binarized_mtx.to_csv(f"{args.output_prefix}/binarized_mtx.tsv", sep="\t")
    thresholds.to_csv(f"{args.output_prefix}/thresholds.tsv", sep="\t")
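To illustrate how the script's command line is consumed, the following sketch rebuilds the same argparse interface and parses a sample argument list. It needs only the standard library, so it does not call pySCENIC. The helper name `build_parser`, the regulon name `AUCellA`, and the `0.05` cutoff are assumptions made for illustration. They are not part of the commit.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the command-line interface with the same flags as the script."""
    parser = argparse.ArgumentParser(description="Binarize AUC matrix")
    parser.add_argument("input_file", help="Input TSV or CSV file")
    parser.add_argument(
        "--threshold-overrides",
        type=str,
        help="Threshold overrides in JSON format",
    )
    parser.add_argument("--seed", type=int, default=None, help="Random seed")
    parser.add_argument("--num-workers", type=int, default=1, help="Number of workers")
    parser.add_argument("--output-prefix", type=str, default="output", help="Output prefix")
    return parser


# Hypothetical invocation: the regulon name and cutoff below are made up.
example_argv = [
    "--threshold-overrides", '{"AUCellA": 0.05}',
    "--seed", "10",
    "--output-prefix", "./",
    "aucell.tsv",
]

args = build_parser().parse_args(example_argv)

# The overrides arrive as a JSON string and become a dict of column name -> cutoff.
overrides = json.loads(args.threshold_overrides) if args.threshold_overrides else None

print(args.input_file, args.seed, args.num_workers, overrides)
```

The JSON object maps column names of the AUC matrix to threshold values. Any column left out of the object presumably keeps its automatically derived threshold.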
81 changes: 81 additions & 0 deletions
tools/tertiary-analysis/pyscenic/pyscenic_binarize_aucell.xml
@@ -0,0 +1,81 @@
<tool id="pyscenic_binarize" name="PySCENIC Binarize AUCell" profile="21.09" version="@TOOL_VERSION@+galaxy0">
    <description>defines AUCell thresholds and tags cells as passing them or not</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <command detect_errors="exit_code">
    <![CDATA[
    ln -s '$input_file' aucell.tsv &&
    python '$__tool_directory__/pyscenic_binarize_aucell.py'
    #if $threshold_overrides
        --threshold-overrides '$threshold_overrides'
    #end if
    #if str($seed) != ''
        --seed '$seed'
    #end if
    --num-workers \${GALAXY_SLOTS:-1}
    --output-prefix ./
    aucell.tsv
    ]]>
    </command>
    <inputs>
        <param name="input_file" type="data" format="tabular,txt" label="Input AUC matrix"/>
        <param name="threshold_overrides" type="text" optional="true" label="Threshold overrides in JSON format" help="Override default threshold values for binarization"/>
        <param name="seed" type="integer" optional="true" label="Random seed"/>
    </inputs>
    <outputs>
        <data name="binarized_mtx" format="tsv" label="${tool.name} on ${on_string}: Binarized AUC matrix" from_work_dir="binarized_mtx.tsv"/>
        <data name="thresholds" format="tsv" label="${tool.name} on ${on_string}: Binarization thresholds" from_work_dir="thresholds.tsv"/>
    </outputs>
    <tests>
        <test>
            <param name="input_file" value="aucell_test_smaller.tsv"/>
            <param name="seed" value="10"/>
            <output name="binarized_mtx">
                <assert_contents>
                    <has_text_matching expression="D124DE\t0\t1"/>
                    <has_n_lines n="5001"/>
                </assert_contents>
            </output>
            <output name="thresholds">
                <assert_contents>
                    <has_text text="AUCell_1"/>
                    <has_n_lines n="3"/>
                </assert_contents>
            </output>
        </test>
    </tests>
    <help>
    <![CDATA[
**What it does**

This tool binarizes an AUC matrix using the `binarize` function from the `pySCENIC` package.
For each AUCell column, it classifies every cell as passing or not passing an automatically
defined threshold, obtained by binarizing the AUCell distribution when possible.
See the SCENIC paper or the PySCENIC notebooks for more details.

**Input**

- Input AUC matrix (TSV): The AUC matrix to be binarized (first column expected to be cell identifiers).
- Threshold overrides in JSON format (optional): Override default threshold values for binarization.
- Random seed (optional): Seed for random number generation.

**Output**

- Binarized AUC matrix: The binarized AUC matrix.
- Binarization thresholds: The threshold values used for binarization.

**Example**

Input AUC matrix (TSV file, as generated by the PySCENIC AUCell or Decoupler AUCell tools):

```
        AUCellA  AUCellB  AUCellC
cell1   0.024    0.045    0.001
cell2   0.136    0.024    0.045
cellN   0.001    0.347    0.136
```
    ]]>
    </help>
    <expand macro="citations"/>
</tool>
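The classification described in the help text can be sketched in plain Python, using the example matrix from the help section. This is a minimal illustration, not pySCENIC's implementation. It assumes a cell passes when its score is strictly greater than the column threshold. The automatic threshold derivation is not reproduced, so the `0.1` thresholds and the `0.04` override are invented values, and `binarize_with_thresholds` is a hypothetical helper name.

```python
from typing import Dict, Optional

Matrix = Dict[str, Dict[str, float]]


def binarize_with_thresholds(
    auc: Matrix,
    thresholds: Dict[str, float],
    overrides: Optional[Dict[str, float]] = None,
) -> Dict[str, Dict[str, int]]:
    """Tag each cell as 1 (passes) or 0 (does not pass) per column.

    Assumption: passing means score > threshold (strict comparison).
    """
    effective = dict(thresholds)
    if overrides:
        # User-supplied overrides replace the derived thresholds column by column.
        effective.update(overrides)
    return {
        cell: {column: int(score > effective[column]) for column, score in scores.items()}
        for cell, scores in auc.items()
    }


# Example matrix taken from the tool help; the thresholds are made up.
auc_matrix = {
    "cell1": {"AUCellA": 0.024, "AUCellB": 0.045, "AUCellC": 0.001},
    "cell2": {"AUCellA": 0.136, "AUCellB": 0.024, "AUCellC": 0.045},
    "cellN": {"AUCellA": 0.001, "AUCellB": 0.347, "AUCellC": 0.136},
}
derived = {"AUCellA": 0.1, "AUCellB": 0.1, "AUCellC": 0.1}

default_result = binarize_with_thresholds(auc_matrix, derived)
overridden_result = binarize_with_thresholds(auc_matrix, derived, {"AUCellC": 0.04})

print(default_result["cell2"])
print(overridden_result["cell2"])
```

Lowering the `AUCellC` threshold through an override flips `cell2` from not passing to passing in that column, while the other columns are unaffected.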