Merge pull request #826 from UTSouthwesternDSSR/UTSouthwesternDSSR/jwl
cell type/tumor annotation for ETP T-ALL (SCPCP000003)
jaclyn-taroni authored Oct 17, 2024
2 parents a5c3623 + 5938976 commit 3362d31
Showing 89 changed files with 11,621 additions and 2 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/docker_all-modules.yml
@@ -39,6 +39,9 @@ jobs:
- cell-type-ewings
- doublet-detection
- cell-type-wilms-tumor-06
- cell-type-wilms-tumor-14
- cell-type-nonETP-ALL-03
- cell-type-ETP-ALL-03
uses: ./.github/workflows/build-push-docker-module.yml
if: github.repository_owner == 'AlexsLemonade'
with:
63 changes: 63 additions & 0 deletions .github/workflows/docker_cell-type-ETP-ALL-03.yml
@@ -0,0 +1,63 @@
# This is a workflow to build the docker image for the cell-type-ETP-ALL-03 module
#
# Docker modules are run on pull requests when code for files that affect the Docker image have changed.
# If other files are used during the Docker build, they should be added to `paths`
#
# At module initialization, this workflow is inactive, and needs to be activated manually

name: Build docker image for cell-type-ETP-ALL-03

concurrency:
  # only one run per branch at a time
  group: "docker_cell-type-ETP-ALL-03_${{ github.ref }}"
  cancel-in-progress: true

on:
  pull_request:
    branches:
      - main
    paths:
      - "analyses/cell-type-ETP-ALL-03/Dockerfile"
      - "analyses/cell-type-ETP-ALL-03/.dockerignore"
      - "analyses/cell-type-ETP-ALL-03/renv.lock"
      - "analyses/cell-type-ETP-ALL-03/conda-lock.yml"
  push:
    branches:
      - main
    paths:
      - "analyses/cell-type-ETP-ALL-03/Dockerfile"
      - "analyses/cell-type-ETP-ALL-03/.dockerignore"
      - "analyses/cell-type-ETP-ALL-03/renv.lock"
      - "analyses/cell-type-ETP-ALL-03/conda-lock.yml"
  workflow_dispatch:
    inputs:
      push-ecr:
        description: "Push to AWS ECR"
        type: boolean
        required: true

jobs:
  test-build:
    name: Test Build Docker Image
    if: github.event_name == 'pull_request' || (contains(github.event_name, 'workflow_') && !inputs.push-ecr)
    runs-on: ubuntu-latest

    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: "{{defaultContext}}:analyses/cell-type-ETP-ALL-03"
          push: false
          cache-from: type=gha
          cache-to: type=gha,mode=max

  build-push:
    name: Build and Push Docker Image
    if: github.repository_owner == 'AlexsLemonade' && (github.event_name == 'push' || inputs.push-ecr)
    uses: ./.github/workflows/build-push-docker-module.yml
    with:
      module: "cell-type-ETP-ALL-03"
      push-ecr: true
12 changes: 11 additions & 1 deletion .github/workflows/run_all-modules.yml
@@ -34,14 +34,24 @@ jobs:
cell-type-wilms-14:
uses: ./.github/workflows/run_cell-type-wilms-tumor-14.yml

cell-type-ETP-ALL-03:
uses: ./.github/workflows/run_cell-type-ETP-ALL-03.yml

cell-type-nonETP-ALL-03:
uses: ./.github/workflows/run_cell-type-nonETP-ALL-03.yml

## Add additional modules above this comment, and to the needs list below
check-jobs:
if: ${{ always() }}
needs:
- hello-R
- hello-python
- doublet-detection
- cell-type-ewings
- cell-type-wilms-06
- cell-type-wilms-14
- cell-type-ETP-ALL-03
- cell-type-nonETP-ALL-03
runs-on: ubuntu-latest
steps:
- name: Checkout template file
92 changes: 92 additions & 0 deletions .github/workflows/run_cell-type-ETP-ALL-03.yml
@@ -0,0 +1,92 @@
# This is a workflow to run the cell-type-ETP-ALL-03 module
#
# Analysis modules are run based on three triggers:
# - Manual trigger
# - On pull requests where code in the module has changed
# - As a reusable workflow called from a separate workflow which periodically runs all modules
#
# At initialization, only the manual trigger is active

name: Run cell-type-ETP-ALL-03 analysis module
env:
  MODULE_PATH: analyses/cell-type-ETP-ALL-03
  AWS_DEFAULT_REGION: us-east-2

concurrency:
  # only one run per branch at a time
  group: "run_cell-type-ETP-ALL-03_${{ github.ref }}"
  cancel-in-progress: true

on:
  workflow_dispatch:
  workflow_call:
  pull_request:
    branches:
      - main
    paths:
      - analyses/cell-type-ETP-ALL-03/**
      - "!analyses/cell-type-ETP-ALL-03/Dockerfile"
      - "!analyses/cell-type-ETP-ALL-03/.dockerignore"
      - .github/workflows/run_cell-type-ETP-ALL-03.yml

jobs:
  run-module:
    if: github.repository_owner == 'AlexsLemonade'
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -el {0}

    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: 4.4.0
          use-public-rspm: true

      - name: Set up pandoc
        uses: r-lib/actions/setup-pandoc@v2

      - name: Install system dependencies
        run: |
          sudo apt-get install -y libcurl4-openssl-dev \
            libhdf5-dev \
            libglpk40 \
            libxml2-dev \
            libfontconfig1-dev \
            libharfbuzz-dev \
            libfribidi-dev \
            libtiff5-dev
      - name: Set up renv
        uses: r-lib/actions/setup-renv@v2
        with:
          working-directory: ${{ env.MODULE_PATH }}

      - name: Set up conda
        # Note that this creates and activates an environment named 'test' by default
        uses: conda-incubator/setup-miniconda@v3
        with:
          miniforge-version: latest

      - name: Install conda-lock and activate locked conda environment
        run: |
          conda install conda-lock
          conda-lock install --name openscpca-cell-type-ETP-ALL-03 ${MODULE_PATH}/conda-lock.yml
      # Update this step as needed to download the desired data
      - name: Download test data
        run: |
          ./download-data.py --projects SCPCP000003 --test-data --format SCE
          ./download-results.py --projects SCPCP000003 --test-data --modules doublet-detection
      - name: Run analysis module
        run: |
          cd ${MODULE_PATH}
          # run module script(s) here
          Rscript scripts/00-01_processing_rds.R
          Rscript scripts/02-03_annotation.R
          Rscript scripts/multipanel_plot.R
4 changes: 4 additions & 0 deletions analyses/cell-type-ETP-ALL-03/.Rprofile
@@ -0,0 +1,4 @@
# Don't activate renv in an OpenScPCA docker image
if (Sys.getenv('OPENSCPCA_DOCKER') != 'TRUE') {
  source('renv/activate.R')
}
8 changes: 8 additions & 0 deletions analyses/cell-type-ETP-ALL-03/.dockerignore
@@ -0,0 +1,8 @@
# Ignore everything by default
*

# Include specific files in the docker environment
!/renv.lock
!/requirements.txt
!/environment.yml
!/conda-lock.yml
7 changes: 7 additions & 0 deletions analyses/cell-type-ETP-ALL-03/.gitignore
@@ -0,0 +1,7 @@
# Results should not be committed
/results/*
!/results/README.md

# Ignore the scratch directory (but keep it present)
/scratch/*
!/scratch/.gitkeep
18 changes: 18 additions & 0 deletions analyses/cell-type-ETP-ALL-03/Azimuth_BM_level1.csv
@@ -0,0 +1,18 @@
tissueType,cellName,ensembl_id_positive_marker,ensembl_id_negative_marker,fullName,ontologyID
Immune system,B,"ENSG00000163534,ENSG00000132704,ENSG00000012124,ENSG00000138639,ENSG00000153064,ENSG00000156738,ENSG00000116191,ENSG00000104894,ENSG00000133789,ENSG00000105369",,B cell,CL_0000945
Immune system,CD4 T,"ENSG00000172005,ENSG00000138795,ENSG00000168685,ENSG00000081059,ENSG00000227507,ENSG00000104660,ENSG00000198851,ENSG00000167286,ENSG00000160654,ENSG00000139193",,CD4 T cell,CL_0000624
Immune system,CD8 T,"ENSG00000172116,ENSG00000153563,ENSG00000184613,ENSG00000167286,ENSG00000198851,ENSG00000160307,ENSG00000100450,ENSG00000160654,ENSG00000227191,ENSG00000271503",,CD8 T cell,CL_0000625
Immune system,DC,"ENSG00000198178,ENSG00000115718,ENSG00000070031,ENSG00000169432,ENSG00000105251,ENSG00000155367,ENSG00000168913,ENSG00000132514,ENSG00000239961,ENSG00000163687",,Dendritic cell,CL_0000451
Immune system,HSPC,"ENSG00000172995,ENSG00000186710,ENSG00000101200,ENSG00000163554,ENSG00000119888,ENSG00000188672,ENSG00000131016,ENSG00000112077,ENSG00000172247,ENSG00000170891",,Hematopoietic stem and progenitor cell,CL_0000037
Immune system,Mono,"ENSG00000197353,ENSG00000110203,ENSG00000166523,ENSG00000104974,ENSG00000158825,ENSG00000162444,ENSG00000186074,ENSG00000171051,ENSG00000125810,ENSG00000014914",,Monocytes,CL_0000576
Immune system,NK,"ENSG00000189430,ENSG00000198574,ENSG00000134545,ENSG00000150687,ENSG00000117281,ENSG00000156966,ENSG00000100385,ENSG00000115607,ENSG00000143184,ENSG00000150045",,Natural killer cell,CL_0000814
Immune system,Other T,"ENSG00000144290,ENSG00000111796,ENSG00000168685,ENSG00000215788,ENSG00000069667,ENSG00000107742,ENSG00000178573,ENSG00000113088,ENSG00000152518,ENSG00000145220",,Other T cell,CL_0000084
Immune system,Macrophage,"ENSG00000166211,ENSG00000121769,ENSG00000073754,ENSG00000275385,ENSG00000159189,ENSG00000173369,ENSG00000170323,ENSG00000173372,ENSG00000130203,ENSG00000250722",,Macrophage,CL_0000235
Immune system,Early Eryth,"ENSG00000119865,ENSG00000179348,ENSG00000005961,ENSG00000106327,ENSG00000102145,ENSG00000105610,ENSG00000170891,ENSG00000135525,ENSG00000075618,ENSG00000130208",,Early Erythrocyte,CL_0000764
Immune system,Late Eryth,"ENSG00000196188,ENSG00000112212,ENSG00000204010,ENSG00000188672,ENSG00000112077,ENSG00000163554,ENSG00000075340,ENSG00000119888,ENSG00000213934",,Late Erythrocyte,CL_0000764
Immune system,Plasma,"ENSG00000115884,ENSG00000222037,ENSG00000211640,ENSG00000048462,ENSG00000211673,ENSG00000240505,ENSG00000211685,ENSG00000167476,ENSG00000143297,ENSG00000243466",,Plasma cell,CL_0000786
Immune system,Platelet,"ENSG00000150681,ENSG00000187699,ENSG00000088726,ENSG00000169704,ENSG00000163737,ENSG00000163736,ENSG00000153071,ENSG00000113140,ENSG00000176783,ENSG00000124491",,Platelet,CL_0000233
Immune system,Stromal,"ENSG00000115461,ENSG00000047457,ENSG00000091513,ENSG00000011465,ENSG00000139329,ENSG00000164692,ENSG00000147571,ENSG00000041982,ENSG00000152583,ENSG00000112175",,Stromal cell,CL_0000499
Immune system,Blast,"ENSG00000002586,ENSG00000173762,ENSG00000124766,ENSG00000177606,ENSG00000117632,ENSG00000123416,ENSG00000167286",,Blast cell,CL_0000055
Immune system,Cancer,"ENSG00000026508,ENSG00000119888,ENSG00000141736,ENSG00000086205,ENSG00000111057,ENSG00000007062",,Cancer cell,CL_0001064
Immune system,Pre Eryth,"ENSG00000081237,ENSG00000170180,ENSG00000175792,ENSG00000072274,ENSG00000110195,ENSG00000135218,ENSG00000115232,ENSG00000244734,ENSG00000223609,ENSG00000133742",,Erythroid-like and erythroid precursor cell,CL_0000038
64 changes: 64 additions & 0 deletions analyses/cell-type-ETP-ALL-03/Dockerfile
@@ -0,0 +1,64 @@
FROM bioconductor/r-ver:3.19

# Labels following the Open Containers Initiative (OCI) recommendations
# For more information, see https://specs.opencontainers.org/image-spec/annotations/?v=v1.0.1
LABEL org.opencontainers.image.title="openscpca/cell-type-ETP-ALL-03"
LABEL org.opencontainers.image.description="Docker image for the OpenScPCA analysis module 'cell-type-ETP-ALL-03'"
LABEL org.opencontainers.image.authors="OpenScPCA [email protected]"
LABEL org.opencontainers.image.source="https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/cell-type-ETP-ALL-03"

# Set an environment variable to allow checking if we are in an OpenScPCA container
ENV OPENSCPCA_DOCKER=TRUE

# set a name for the conda environment
ARG ENV_NAME=openscpca-cell-type-ETP-ALL-03

# set environment variables to install conda
ENV PATH="/opt/conda/bin:${PATH}"

# Install conda via miniforge
# adapted from https://github.com/conda-forge/miniforge-images/blob/master/ubuntu/Dockerfile
RUN curl -L "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" -o /tmp/miniforge.sh \
&& bash /tmp/miniforge.sh -b -p /opt/conda \
&& rm -f /tmp/miniforge.sh \
&& conda clean --tarballs --index-cache --packages --yes \
&& find /opt/conda -follow -type f -name '*.a' -delete \
&& find /opt/conda -follow -type f -name '*.pyc' -delete \
&& conda clean --force-pkgs-dirs --all --yes

# Activate conda environments in bash
RUN ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> /etc/skel/.bashrc \
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc

# Install conda-lock
RUN conda install --channel=conda-forge --name=base conda-lock \
&& conda clean --all --yes

# Install renv
RUN Rscript -e "install.packages('renv')"

# Disable the renv cache to install packages directly into the R library
ENV RENV_CONFIG_CACHE_ENABLED=FALSE

# Copy conda lock file to image
COPY conda-lock.yml conda-lock.yml

# restore from conda-lock.yml file and clean up to reduce image size
RUN conda-lock install -n ${ENV_NAME} conda-lock.yml \
&& conda clean --all --yes

# Copy the renv.lock file from the host environment to the image
COPY renv.lock renv.lock

# restore from renv.lock file and clean up to reduce image size
RUN Rscript -e 'renv::restore()' \
&& rm -rf ~/.cache/R/renv \
&& rm -rf /tmp/downloaded_packages \
&& rm -rf /tmp/Rtmp*

# Activate conda environment on bash launch
RUN echo "conda activate ${ENV_NAME}" >> ~/.bashrc

# Set CMD to bash to activate the environment when launching
CMD ["/bin/bash"]
85 changes: 85 additions & 0 deletions analyses/cell-type-ETP-ALL-03/README.md
@@ -0,0 +1,85 @@
# ETP T-ALL Annotation (SCPCP000003)

This analysis module includes code to annotate cell types and tumor/normal status in ETP T-ALL samples from SCPCP000003 (n=30), which is available on the ScPCA portal.

## Description

We first annotate the cell types in ETP T-ALL, then use the annotated B cells in each sample as the "normal" cells to identify tumor cells, since T-ALL is caused by the clonal proliferation of immature T cells [<https://www.nature.com/articles/s41375-018-0127-8>].

- We use the cell type markers (`Azimuth_BM_level1.csv`) from the [Azimuth Human Bone Marrow reference](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Bone%20Marrow). In total, there are 14 cell types: B, CD4 T, CD8 T, Other T, DC, Monocytes, Macrophages, NK, Early Erythrocytes, Late Erythrocytes, Plasma, Platelet, Stromal, and Hematopoietic Stem and Progenitor Cells (HSPC). Based on exploratory analysis, we believe that most cells in these samples do not express enough markers to be distinguished at a finer cell type level (e.g., naive vs. memory, CD14 vs. CD16), and that the majority of cells should be T cells. In addition, we include marker genes for blast cells [[Bhasin et al. (2023)](https://www.nature.com/articles/s41598-023-39152-z)] as well as for erythroid precursor cells and cancer cells in the immune system [[ScType](https://sctype.app/database.php) database].

- Since ScType annotates cell types at the cluster level using marker genes provided by the user or taken from its built-in database, we employ the [self-assembling manifold](https://github.com/atarashansky/self-assembling-manifold/tree/master) (SAM) algorithm, a soft feature selection strategy, to better separate homogeneous cell types before clustering.

- After cell type annotation, we provide the B cells in the sample, if there are any, to [CopyKat](https://github.com/navinlabcode/copykat) as the normal reference for identifying tumor cells.
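
The B-cell reference described in the last bullet can be illustrated with a minimal Python sketch. It is not the module's code (the module is written in R): the barcodes and the `normal_reference_barcodes` helper are hypothetical, and the only assumption about CopyKat is that known-normal barcodes are handed to it as a list of cell names (its `norm.cell.names` argument, to our understanding).

```python
def normal_reference_barcodes(cell_types, normal_label="B"):
    """Return the barcodes annotated as `normal_label`, sorted for reproducibility."""
    return sorted(
        barcode for barcode, label in cell_types.items() if label == normal_label
    )


# Hypothetical per-cell annotations (barcode -> annotated cell type).
annotations = {
    "AAAC-1": "CD4 T",
    "AAAG-1": "B",
    "AACT-1": "Blast",
    "AAGT-1": "B",
}

b_cells = normal_reference_barcodes(annotations)
if b_cells:
    # These barcodes would be supplied to CopyKat as the known-normal cells.
    print(f"{len(b_cells)} B cells available as the normal reference: {b_cells}")
else:
    # With no B cells in the sample, there is no normal reference to supply.
    print("No B cells found; no normal reference can be supplied")
```

Handling the empty case explicitly matters because some samples may contain no annotated B cells at all.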

Here are the steps in the module:

1. Generating a processed rds file for each sample using SAM (`scripts/00-01_processing_rds.R`)

2. Annotating cell type using ScType and identifying tumor cells using CopyKat (`scripts/02-03_annotation.R`)
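
To make the idea of cluster-level annotation in step 2 concrete, here is a simplified Python sketch of marker-based scoring. It is an illustration under stated assumptions, not ScType's implementation: it omits marker-specificity weighting and negative markers, the gene symbols and expression values are made up (the module itself uses Ensembl IDs), and the `min_fraction` cutoff of a quarter of the cluster size is borrowed from ScType's example usage as an assumption about how low-confidence clusters are flagged.

```python
from collections import defaultdict
from math import sqrt


def score_clusters(expr, clusters, markers):
    """Sum per-cell marker scores within each cluster.

    expr:     barcode -> {gene: scaled expression}
    clusters: barcode -> cluster id
    markers:  cell type -> list of positive marker genes
    """
    totals = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for barcode, cluster in clusters.items():
        sizes[cluster] += 1
        for cell_type, genes in markers.items():
            present = [g for g in genes if g in expr[barcode]]
            if present:
                # Divide by sqrt(number of markers) so long marker lists do not dominate.
                cell_score = sum(expr[barcode][g] for g in present) / sqrt(len(present))
                totals[cluster][cell_type] += cell_score
    return totals, sizes


def annotate(totals, sizes, min_fraction=0.25):
    """Pick the top-scoring cell type per cluster, or "Unknown" if the score is weak."""
    labels = {}
    for cluster, n_cells in sizes.items():
        scores = totals.get(cluster)
        if not scores:
            labels[cluster] = "Unknown"
            continue
        best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
        labels[cluster] = best_type if best_score >= n_cells * min_fraction else "Unknown"
    return labels


# Hypothetical scaled expression for three cells in two clusters.
expr = {
    "c1": {"CD79A": 2.0, "MS4A1": 1.5, "CD3E": -0.5},
    "c2": {"CD79A": 1.0, "MS4A1": 2.0, "CD3E": -1.0},
    "c3": {"CD79A": -0.5, "MS4A1": -0.2, "CD3E": 0.1},
}
clusters = {"c1": 0, "c2": 0, "c3": 1}
markers = {"B": ["CD79A", "MS4A1"], "T": ["CD3E"]}

totals, sizes = score_clusters(expr, clusters, markers)
labels = annotate(totals, sizes)
print(labels)
```

Because labels are assigned per cluster rather than per cell, the quality of the clustering directly limits the quality of the annotation, which is why the feature selection in step 1 comes first.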

## Usage

Before running the R scripts in R or RStudio, prepare the input files as shown in the next section, then run the following commands in the terminal to install the required libraries:

```
#system packages installation
sudo apt install libglpk40
sudo apt install libcurl4-openssl-dev #for Seurat
sudo apt-get install libxml2-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev libtiff5-dev #for devtools
conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml
Rscript -e "renv::restore()"
```

## Input files

`scripts/00-01_processing_rds.R` requires the processed SingleCellExperiment objects (`_processed.rds`) and doublet-detection results (`_processed_scdblfinder.tsv`) from SCPCP000003. These files can be obtained by running the following commands:

```
#run in terminal
../../download-data.py --projects SCPCP000003
../../download-results.py --projects SCPCP000003 --modules doublet-detection
```

For the annotation, `scripts/02-03_annotation.R` requires the cell type marker gene file, `Azimuth_BM_level1.csv`, as an input for ScType. This CSV file lists the positive marker genes for each cell type as Ensembl IDs under `ensembl_id_positive_marker`; *TMEM56* and *CD235a* are not detected in our dataset, so they have been removed from the markers for Late Eryth and Pre Eryth, respectively. At present, no negative marker genes are provided under `ensembl_id_negative_marker`.
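
As a minimal sketch of how the marker file is laid out, the Python snippet below parses two rows in the same column layout as `Azimuth_BM_level1.csv` (marker lists shortened for brevity) and drops any marker that is absent from a dataset. The `read_markers` and `keep_detected` helpers and the `detected` gene set are hypothetical; the module's R scripts may handle this differently.

```python
import csv
import io

# Two rows in the layout of Azimuth_BM_level1.csv, with shortened marker lists.
MARKER_CSV = '''tissueType,cellName,ensembl_id_positive_marker,ensembl_id_negative_marker,fullName,ontologyID
Immune system,B,"ENSG00000163534,ENSG00000132704,ENSG00000012124",,B cell,CL_0000945
Immune system,Blast,"ENSG00000002586,ENSG00000173762",,Blast cell,CL_0000055
'''


def read_markers(handle):
    """Map each cell type name to its list of positive marker Ensembl IDs."""
    markers = {}
    for row in csv.DictReader(handle):
        ids = [i for i in row["ensembl_id_positive_marker"].split(",") if i]
        markers[row["cellName"]] = ids
    return markers


def keep_detected(markers, detected_genes):
    """Drop marker IDs that are not present in the dataset."""
    return {
        cell_type: [g for g in ids if g in detected_genes]
        for cell_type, ids in markers.items()
    }


markers = read_markers(io.StringIO(MARKER_CSV))

# Hypothetical set of genes detected in a sample; one B marker is missing.
detected = {
    "ENSG00000163534",
    "ENSG00000012124",
    "ENSG00000002586",
    "ENSG00000173762",
}
filtered = keep_detected(markers, detected)
print(filtered)
```

Quoting the comma-separated ID list, as the real file does, is what lets a standard CSV reader keep all markers for a cell type in a single field.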

## Output files

Running `scripts/00-01_processing_rds.R` will generate two types of output:

- `rds` objects in `scratch/`

- UMAP plots showing Leiden clustering in `plots/`

Running `scripts/02-03_annotation.R` will generate several outputs:

- updated `rds` objects in `scratch/`

- UMAP plots showing the cell type and CopyKat prediction (if there is one), and dot plots showing the features added with `AddModuleScore()`, in `plots/`

- ScType results listing the top 10 candidate cell types per cluster (`_sctype_top10_celltypes_perCluster.txt`) and a metadata file tabulating the Leiden cluster, cell type, low-confidence cell type, and CopyKat prediction for each cell (`_metadata.txt`) in `results/`

## Software requirements

To run the analysis, run the R scripts in R or RStudio (R version 4.4.0). The main libraries used are:

- Seurat (version 5.1.0)

- reticulate (version 1.39.0)

- sam-algorithm (in python)

- ScType

- CopyKat

The `renv.lock` file records all R packages and their versions. All Python libraries are installed in the conda environment `openscpca-cell-type-ETP-ALL-03`, and the Python code is executed in that environment by calling it from R via `reticulate`. To create this environment from the `conda-lock.yml` file (it can then be activated with `conda activate openscpca-cell-type-ETP-ALL-03`), use:

```
conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml
```

## Computational resources

All the commands above are currently run on a standard 4XL virtual machine via AWS Lightsail for Research, but CopyKat runs quite slowly with one computational core.
13 changes: 13 additions & 0 deletions analyses/cell-type-ETP-ALL-03/cell-type-ETP-ALL-03.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX