Exploratory results for inferCNV on non-ETP samples (SCPCP000003) #838

Merged
merged 21 commits into from
Nov 5, 2024
Changes from 16 commits
Commits
412d8fb  adding inferCNV part (UTSouthwesternDSSR, Oct 23, 2024)
b802413  Merge remote-tracking branch 'origin/main' into UTSouthwesternDSSR/no… (UTSouthwesternDSSR, Oct 23, 2024)
c3c6b8c  Merge branch 'AlexsLemonade:main' into main (UTSouthwesternDSSR, Oct 23, 2024)
9a33d0b  Add jags to system dependencies installation (jaclyn-taroni, Oct 25, 2024)
7f532da  Add Rhtslib installation step separately to Dockerfile (jaclyn-taroni, Oct 25, 2024)
a059dde  update scripts structure (UTSouthwesternDSSR, Oct 29, 2024)
0f075ed  Merge remote-tracking branch 'origin/main' into UTSouthwesternDSSR/no… (UTSouthwesternDSSR, Oct 29, 2024)
e31ab64  added marker genes table in final submission format (UTSouthwesternDSSR, Oct 29, 2024)
692be0d  change directory structure (UTSouthwesternDSSR, Oct 30, 2024)
33b2be0  change name (UTSouthwesternDSSR, Oct 30, 2024)
3588896  add script for rerun copykat (UTSouthwesternDSSR, Oct 30, 2024)
32423db  final submission script (UTSouthwesternDSSR, Oct 30, 2024)
151559d  Add new scripts to CI/CD (jaclyn-taroni, Oct 30, 2024)
514a06d  update final submission (UTSouthwesternDSSR, Oct 30, 2024)
d56c5d6  Merge remote-tracking branch 'origin/main' into UTSouthwesternDSSR/no… (UTSouthwesternDSSR, Oct 30, 2024)
b3478c7  update submission script and output (UTSouthwesternDSSR, Oct 31, 2024)
8d9dd4c  Merge branch 'main' into main (jashapiro, Oct 31, 2024)
7bde7e5  update scripts and readme (UTSouthwesternDSSR, Oct 31, 2024)
9b36d48  Merge remote-tracking branch 'origin/main' into UTSouthwesternDSSR/no… (UTSouthwesternDSSR, Oct 31, 2024)
74d9cad  exploration plots for CopyKat prediction with fine-tuned B cells (UTSouthwesternDSSR, Oct 31, 2024)
c3a7a7e  Merge branch 'main' into main (jaclyn-taroni, Nov 5, 2024)
7 changes: 6 additions & 1 deletion .github/workflows/run_cell-type-nonETP-ALL-03.yml
@@ -59,7 +59,8 @@ jobs:
libfontconfig1-dev \
libharfbuzz-dev \
libfribidi-dev \
- libtiff5-dev
+ libtiff5-dev \
+ jags

- name: Set up renv
uses: r-lib/actions/setup-renv@v2
@@ -91,3 +92,7 @@ jobs:
Rscript scripts/02-03_annotation.R
Rscript scripts/04_multipanel_plot.R
Rscript scripts/05_cluster_evaluation.R
+ Rscript scripts/06_sctype_exploration.R
+ Rscript scripts/07_run_copykat.R
+ Rscript scripts/markerGenes_submission.R
+ Rscript scripts/writeout_submission.R
Member

Hi @UTSouthwesternDSSR,

I was looking at your PR trying to figure out the previous CI failure. It may be due to the small number of cells in the test data, but it may also be a stochastic failure, as I was able to run the whole workflow on a separate machine with the same test data. Nonetheless, it might be helpful for future debugging to add a few info messages like the ones below to help figure out where we are in the CI process.

Since you just added a small change, I will wait to see if that passes before proceeding too far!

Suggested change
- Rscript scripts/02-03_annotation.R
- Rscript scripts/04_multipanel_plot.R
- Rscript scripts/05_cluster_evaluation.R
- Rscript scripts/06_sctype_exploration.R
- Rscript scripts/07_run_copykat.R
- Rscript scripts/markerGenes_submission.R
- Rscript scripts/writeout_submission.R
+ printf "\n\nRunning 02-03_annotation.R\n"
+ Rscript scripts/02-03_annotation.R
+ printf "\n\nRunning 04_multipanel_plot.R\n"
+ Rscript scripts/04_multipanel_plot.R
+ printf "\n\nRunning 05_cluster_evaluation.R\n"
+ Rscript scripts/05_cluster_evaluation.R
+ printf "\n\nRunning 06_sctype_exploration.R\n"
+ Rscript scripts/06_sctype_exploration.R
+ printf "\n\nRunning 07_run_copykat.R\n"
+ Rscript scripts/07_run_copykat.R
+ printf "\n\nRunning markerGenes_submission.R\n"
+ Rscript scripts/markerGenes_submission.R
+ printf "\n\nRunning writeout_submission.R\n"
+ Rscript scripts/writeout_submission.R

Member

It looks like this is working now, so I just merged in the main branch. Whether you want to include the changes suggested above is up to you.

Contributor Author

Sure, thank you! I think it is good to add some comments too. I am still making some minor changes to the scripts and output.
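
The progress messages suggested in this thread could also come from a small wrapper function instead of paired `printf`/`Rscript` lines. The following is only a minimal sketch and is not part of the PR: `run_step` and `run_scripts` are hypothetical helper names, and it assumes the scripts live under `scripts/` as in the workflow above.

```bash
# Hypothetical helper: print a banner, then run the given command.
# A non-zero exit status is reported with the step name and propagated.
run_step() {
  local label="$1"
  shift
  printf '\n\nRunning %s\n' "$label"
  if ! "$@"; then
    printf 'Step failed: %s\n' "$label" >&2
    return 1
  fi
}

# Hypothetical helper: run each named script under scripts/ with the given
# runner (for example Rscript), stopping at the first failure.
run_scripts() {
  local runner="$1"
  shift
  local script
  for script in "$@"; do
    run_step "$script" "$runner" "scripts/$script" || return 1
  done
}
```

In the workflow's `run` block this might be called as `run_scripts Rscript 02-03_annotation.R 04_multipanel_plot.R` followed by the remaining script names. Stopping at the first failure keeps the last banner in the log right next to the error.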

3 changes: 3 additions & 0 deletions analyses/cell-type-nonETP-ALL-03/Dockerfile
@@ -52,6 +52,9 @@ RUN conda-lock install -n ${ENV_NAME} conda-lock.yml \
# Copy the renv.lock file from the host environment to the image
COPY renv.lock renv.lock

+ # Temporarily install Rhtslib separately
+ RUN Rscript -e 'BiocManager::install("Rhtslib")'

# restore from renv.lock file and clean up to reduce image size
RUN Rscript -e 'renv::restore()' \
&& rm -rf ~/.cache/R/renv \
24 changes: 12 additions & 12 deletions analyses/cell-type-nonETP-ALL-03/README.md
@@ -8,16 +8,22 @@ We first aim to annotate the cell types in non-ETP T-ALL, and use the annotated

- We use the cell type markers (`Azimuth_BM_level1.csv`) from the [Azimuth Human Bone Marrow reference](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Bone%20Marrow). In total, there are 14 cell types: B, CD4T, CD8T, Other T, DC, Monocytes, Macrophages, NK, Early Erythrocytes, Late Erythrocytes, Plasma, Platelet, Stromal, and Hematopoietic Stem and Progenitor Cells (HSPC). Based on the exploratory analysis, we believe that most of the cells in these samples do not express enough markers to be distinguished at a finer cell type level (e.g., naive vs. memory, CD14 vs. CD16), and that the majority of the cells should be T cells. In addition, we include the marker genes for blast cells [[Bhasin et al. (2023)](https://www.nature.com/articles/s41598-023-39152-z)] as well as for erythroid precursors and cancer cells in the immune system [[ScType](https://sctype.app/database.php) database].

+ \*\*`Azimuth_BM_level1.csv` is converted to `submission_markerGenes.tsv`, in the final submission format.

- Since ScType annotates cell types at the cluster level using marker genes provided by the user or taken from its built-in database, we employ the [self-assembling manifold](https://github.com/atarashansky/self-assembling-manifold/tree/master) (SAM) algorithm, a soft feature selection strategy, for better separation of homogeneous cell types.

- - After cell type annotation, we provide B cells as the normal cells in the sample, if there is any, to [CopyKat](https://github.com/navinlabcode/copykat), for identification of tumor cells.
+ - After cell type annotation, we fine-tune the annotated B cells by applying a 99th-percentile cutoff of the non-B ScType score to the "B cell clusters". We then use the new B cells (i.e., the cells that passed the cutoff) as the normal cells when running [CopyKat](https://github.com/navinlabcode/copykat) to identify tumor cells. We could not detect a strong B cell signal in `SCPCL000082`.

Here are the steps in the module:

1. Generating a processed rds file for each sample using SAM (`scripts/00-01_processing_rds.R`)

2. Annotating cell type using ScType and identifying tumor cells using CopyKat (`scripts/02-03_annotation.R`)

+ 3. Fine-tuning the B cells (`scripts/06_sctype_exploration.R`)

+ 4. Re-running CopyKat (`scripts/07_run_copykat.R`)
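
The module performs the B cell fine-tuning in R (`scripts/06_sctype_exploration.R`). The shell sketch below only illustrates the thresholding idea under one assumed reading of the description: the cutoff is the 99th percentile (nearest-rank) of the B cell ScType score among cells outside the B cell clusters, and cells inside the B cell clusters are kept only if their score is strictly above it. The function name `select_b_cells` and the input layout (a headerless TSV with columns cell ID, cluster label, B score, with the label `B` marking B cell clusters) are assumptions made for illustration, not the module's actual file format.

```bash
# Hypothetical input: headerless TSV with columns cell_id, cluster_label, b_score.
# Prints the IDs of cells in "B" clusters whose score exceeds the 99th percentile
# (nearest-rank) of the scores of all cells in non-B clusters.
# Assumes scores are plain decimal numbers (no scientific notation).
select_b_cells() {
  local scores="$1"
  local cutoff
  cutoff="$(
    awk -F '\t' '$2 != "B" { print $3 }' "$scores" |
      LC_ALL=C sort -n |
      awk '{ v[NR] = $1 }
           END {
             if (NR == 0) exit 1              # no non-B cells: no cutoff
             rank = int((99 * NR + 99) / 100) # ceil(0.99 * NR)
             print v[rank]
           }'
  )" || return 1
  awk -F '\t' -v cutoff="$cutoff" \
    '$2 == "B" && $3 + 0 > cutoff + 0 { print $1 }' "$scores"
}
```

Whether the comparison should be strict and which score column feeds the cutoff are choices the README does not spell out, so they are worth checking against the R script before reusing this logic.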

## Usage

Before running the R scripts in R or RStudio, we first need to prepare the input files as shown in the next section, and run the following commands in the terminal to install the required libraries:
@@ -44,21 +50,15 @@ The `scripts/00-01_processing_rds.R` requires the processed SingleCellExperiment

As for the annotation, `scripts/02-03_annotation.R` requires the cell type marker gene file, `Azimuth_BM_level1.csv`, as an input for ScType. This file lists the positive marker genes for each cell type as Ensembl IDs under `ensembl_id_positive_marker`; *TMEM56* and *CD235a* are not detected in our dataset, so they have been removed from the markers for Late Eryth and Pre Eryth, respectively. As of now, no negative marker genes are provided under `ensembl_id_negative_marker`.

- ## Output files

- Running `scripts/00-01_processing_rds.R` will generate two types of output:

- - `rds` objects in `scratch/`

- - umap plots showing leiden clustering in `plots/`
+ ## Important output files

- Running `scripts/02-03_annotation.R` will generate several outputs:
+ - `rds` objects in `results/rds`

- - updated `rds` objects in `scratch/`
+ - ScType results of top 10 possible cell types in a cluster (`results/_sctype_top10_celltypes_perCluster.txt`) and ScType score (`results/_sctype_scores.txt`)

- - umap plots showing cell type and CopyKat prediction (if there is any) and dotplots showing the features added with `AddModuleScore()` in `plots/`
+ - location of fine-tuned B cells in umap (`plots/sctype_exploration/_newBcells.png`) and the cell type assignment with added fine-tuned B cells (`results/_newB-normal-annotation.txt`)

- - ScType results of top 10 possible cell types in a cluster (`_sctype_top10_celltypes_perCluster.txt`) and metadata file tabulating leiden cluster, cell type, low confidence cell type, and CopyKat prediction for each cell (`_metadata.txt`) in `results/`
+ - final submission table (`results/submission_table/_metadata.tsv`) and the umap plots showing cell_type_assignment from ScType and tumor_cell_classification from CopyKat using fine-tuned B cells (`results/submission_table/multipanels_.png`)

## Software requirements
