cell type/tumor annotation for ETP T-ALL (SCPCP000003) #826

Merged
22 commits
4396373
init module skeleton
UTSouthwesternDSSR Oct 2, 2024
0da0ec5
update gitignore
UTSouthwesternDSSR Oct 2, 2024
fd023eb
added marker file
UTSouthwesternDSSR Oct 2, 2024
8bb093f
update readme
UTSouthwesternDSSR Oct 3, 2024
67cab4e
updated scripts for nonETP
UTSouthwesternDSSR Oct 3, 2024
5f03ef8
updated script for ETP
UTSouthwesternDSSR Oct 3, 2024
b507ab5
Merge remote-tracking branch 'origin/UTSouthwesternDSSR/jwl' into UTS…
UTSouthwesternDSSR Oct 3, 2024
da69280
updated for plotting
UTSouthwesternDSSR Oct 4, 2024
7e3d4db
add multipanel R script
UTSouthwesternDSSR Oct 4, 2024
66e0a23
updated scripts
UTSouthwesternDSSR Oct 7, 2024
e6d45fd
store the module score in seu object
UTSouthwesternDSSR Oct 8, 2024
17ed000
update renv.lock
UTSouthwesternDSSR Oct 8, 2024
38b7c50
Merge branch 'AlexsLemonade:main' into UTSouthwesternDSSR/jwl
UTSouthwesternDSSR Oct 16, 2024
7ef2c9d
edit script for re-running CopyKat on specific B cells (normal)
UTSouthwesternDSSR Oct 16, 2024
275f37e
Merge remote-tracking branch 'origin/UTSouthwesternDSSR/jwl' into UTS…
UTSouthwesternDSSR Oct 16, 2024
1687171
added plots
UTSouthwesternDSSR Oct 16, 2024
7b25f9b
Uncomment triggers for GHA workflows
jaclyn-taroni Oct 17, 2024
f6c8ce1
Update run GHA workflow to use environments, download correct data, test
jaclyn-taroni Oct 17, 2024
eee69ab
Flesh out Dockerfile
jaclyn-taroni Oct 17, 2024
813d1ea
Update image source in cell-type-nonETP-ALL-03 module
jaclyn-taroni Oct 17, 2024
172e89b
Merge branch 'main' into UTSouthwesternDSSR/jwl
jaclyn-taroni Oct 17, 2024
5938976
Add modules that are ready to be tested monthly
jaclyn-taroni Oct 17, 2024
63 changes: 63 additions & 0 deletions .github/workflows/docker_cell-type-ETP-ALL-03.yml
@@ -0,0 +1,63 @@
# This is a workflow to build the docker image for the cell-type-ETP-ALL-03 module
#
# Docker modules are run on pull requests when code for files that affect the Docker image have changed.
# If other files are used during the Docker build, they should be added to `paths`
#
# At module initialization, this workflow is inactive, and needs to be activated manually

name: Build docker image for cell-type-ETP-ALL-03

concurrency:
  # only one run per branch at a time
  group: "docker_cell-type-ETP-ALL-03_${{ github.ref }}"
  cancel-in-progress: true

on:
  pull_request:
    branches:
      - main
    paths:
      - "analyses/cell-type-ETP-ALL-03/Dockerfile"
      - "analyses/cell-type-ETP-ALL-03/.dockerignore"
      - "analyses/cell-type-ETP-ALL-03/renv.lock"
      - "analyses/cell-type-ETP-ALL-03/conda-lock.yml"
  push:
    branches:
      - main
    paths:
      - "analyses/cell-type-ETP-ALL-03/Dockerfile"
      - "analyses/cell-type-ETP-ALL-03/.dockerignore"
      - "analyses/cell-type-ETP-ALL-03/renv.lock"
      - "analyses/cell-type-ETP-ALL-03/conda-lock.yml"
  workflow_dispatch:
    inputs:
      push-ecr:
        description: "Push to AWS ECR"
        type: boolean
        required: true

jobs:
  test-build:
    name: Test Build Docker Image
    if: github.event_name == 'pull_request' || (contains(github.event_name, 'workflow_') && !inputs.push-ecr)
    runs-on: ubuntu-latest

    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build image
        uses: docker/build-push-action@v5
        with:
          context: "{{defaultContext}}:analyses/cell-type-ETP-ALL-03"
          push: false
          cache-from: type=gha
          cache-to: type=gha,mode=max

  build-push:
    name: Build and Push Docker Image
    if: github.repository_owner == 'AlexsLemonade' && (github.event_name == 'push' || inputs.push-ecr)
    uses: ./.github/workflows/build-push-docker-module.yml
    with:
      module: "cell-type-ETP-ALL-03"
      push-ecr: true
92 changes: 92 additions & 0 deletions .github/workflows/run_cell-type-ETP-ALL-03.yml
@@ -0,0 +1,92 @@
# This is a workflow to run the cell-type-ETP-ALL-03 module
#
# Analysis modules are run based on three triggers:
# - Manual trigger
# - On pull requests where code in the module has changed
# - As a reusable workflow called from a separate workflow which periodically runs all modules
#
# At initialization, only the manual trigger is active

name: Run cell-type-ETP-ALL-03 analysis module
env:
  MODULE_PATH: analyses/cell-type-ETP-ALL-03
  AWS_DEFAULT_REGION: us-east-2

concurrency:
  # only one run per branch at a time
  group: "run_cell-type-ETP-ALL-03_${{ github.ref }}"
  cancel-in-progress: true

on:
  workflow_dispatch:
  workflow_call:
  pull_request:
    branches:
      - main
    paths:
      - analyses/cell-type-ETP-ALL-03/**
      - "!analyses/cell-type-ETP-ALL-03/Dockerfile"
      - "!analyses/cell-type-ETP-ALL-03/.dockerignore"
      - .github/workflows/run_cell-type-ETP-ALL-03.yml

jobs:
  run-module:
    if: github.repository_owner == 'AlexsLemonade'
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -el {0}

    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: 4.4.0
          use-public-rspm: true

      - name: Set up pandoc
        uses: r-lib/actions/setup-pandoc@v2

      - name: Install system dependencies
        run: |
          sudo apt-get install -y libcurl4-openssl-dev \
            libhdf5-dev \
            libglpk40 \
            libxml2-dev \
            libfontconfig1-dev \
            libharfbuzz-dev \
            libfribidi-dev \
            libtiff5-dev

      - name: Set up renv
        uses: r-lib/actions/setup-renv@v2
        with:
          working-directory: ${{ env.MODULE_PATH }}

      - name: Set up conda
        # Note that this creates and activates an environment named 'test' by default
        uses: conda-incubator/setup-miniconda@v3
        with:
          miniforge-version: latest

      - name: Install conda-lock and activate locked conda environment
        run: |
          conda install conda-lock
          conda-lock install --name openscpca-cell-type-ETP-ALL-03 ${MODULE_PATH}/conda-lock.yml

      # Update this step as needed to download the desired data
      - name: Download test data
        run: |
          ./download-data.py --projects SCPCP000003 --test-data --format SCE
          ./download-results.py --projects SCPCP000003 --test-data --modules doublet-detection

      - name: Run analysis module
        run: |
          cd ${MODULE_PATH}
          # run module script(s) here
          Rscript scripts/00-01_processing_rds.R
          Rscript scripts/02-03_annotation.R
          Rscript scripts/multipanel_plot.R
4 changes: 4 additions & 0 deletions analyses/cell-type-ETP-ALL-03/.Rprofile
@@ -0,0 +1,4 @@
# Don't activate renv in an OpenScPCA docker image
if (Sys.getenv('OPENSCPCA_DOCKER') != 'TRUE') {
  source('renv/activate.R')
}
8 changes: 8 additions & 0 deletions analyses/cell-type-ETP-ALL-03/.dockerignore
@@ -0,0 +1,8 @@
# Ignore everything by default
*

# Include specific files in the docker environment
!/renv.lock
!/requirements.txt
!/environment.yml
!/conda-lock.yml
7 changes: 7 additions & 0 deletions analyses/cell-type-ETP-ALL-03/.gitignore
@@ -0,0 +1,7 @@
# Results should not be committed
/results/*
!/results/README.md

# Ignore the scratch directory (but keep it present)
/scratch/*
!/scratch/.gitkeep
18 changes: 18 additions & 0 deletions analyses/cell-type-ETP-ALL-03/Azimuth_BM_level1.csv
@@ -0,0 +1,18 @@
tissueType,cellName,ensembl_id_positive_marker,ensembl_id_negative_marker,fullName,ontologyID
Immune system,B,"ENSG00000163534,ENSG00000132704,ENSG00000012124,ENSG00000138639,ENSG00000153064,ENSG00000156738,ENSG00000116191,ENSG00000104894,ENSG00000133789,ENSG00000105369",,B cell,CL_0000945
Immune system,CD4 T,"ENSG00000172005,ENSG00000138795,ENSG00000168685,ENSG00000081059,ENSG00000227507,ENSG00000104660,ENSG00000198851,ENSG00000167286,ENSG00000160654,ENSG00000139193",,CD4 T cell,CL_0000624
Immune system,CD8 T,"ENSG00000172116,ENSG00000153563,ENSG00000184613,ENSG00000167286,ENSG00000198851,ENSG00000160307,ENSG00000100450,ENSG00000160654,ENSG00000227191,ENSG00000271503",,CD8 T cell,CL_0000625
Immune system,DC,"ENSG00000198178,ENSG00000115718,ENSG00000070031,ENSG00000169432,ENSG00000105251,ENSG00000155367,ENSG00000168913,ENSG00000132514,ENSG00000239961,ENSG00000163687",,Dendritic cell,CL_0000451
Immune system,HSPC,"ENSG00000172995,ENSG00000186710,ENSG00000101200,ENSG00000163554,ENSG00000119888,ENSG00000188672,ENSG00000131016,ENSG00000112077,ENSG00000172247,ENSG00000170891",,Hematopoietic stem and progenitor cell,CL_0000037
Immune system,Mono,"ENSG00000197353,ENSG00000110203,ENSG00000166523,ENSG00000104974,ENSG00000158825,ENSG00000162444,ENSG00000186074,ENSG00000171051,ENSG00000125810,ENSG00000014914",,Monocytes,CL_0000576
Immune system,NK,"ENSG00000189430,ENSG00000198574,ENSG00000134545,ENSG00000150687,ENSG00000117281,ENSG00000156966,ENSG00000100385,ENSG00000115607,ENSG00000143184,ENSG00000150045",,Natural killer cell,CL_0000814
Immune system,Other T,"ENSG00000144290,ENSG00000111796,ENSG00000168685,ENSG00000215788,ENSG00000069667,ENSG00000107742,ENSG00000178573,ENSG00000113088,ENSG00000152518,ENSG00000145220",,Other T cell,CL_0000084
Immune system,Macrophage,"ENSG00000166211,ENSG00000121769,ENSG00000073754,ENSG00000275385,ENSG00000159189,ENSG00000173369,ENSG00000170323,ENSG00000173372,ENSG00000130203,ENSG00000250722",,Macrophage,CL_0000235
Immune system,Early Eryth,"ENSG00000119865,ENSG00000179348,ENSG00000005961,ENSG00000106327,ENSG00000102145,ENSG00000105610,ENSG00000170891,ENSG00000135525,ENSG00000075618,ENSG00000130208",,Early Erythrocyte,CL_0000764
Immune system,Late Eryth,"ENSG00000196188,ENSG00000112212,ENSG00000204010,ENSG00000188672,ENSG00000112077,ENSG00000163554,ENSG00000075340,ENSG00000119888,ENSG00000213934",,Late Erythrocyte,CL_0000764
Immune system,Plasma,"ENSG00000115884,ENSG00000222037,ENSG00000211640,ENSG00000048462,ENSG00000211673,ENSG00000240505,ENSG00000211685,ENSG00000167476,ENSG00000143297,ENSG00000243466",,Plasma cell,CL_0000786
Immune system,Platelet,"ENSG00000150681,ENSG00000187699,ENSG00000088726,ENSG00000169704,ENSG00000163737,ENSG00000163736,ENSG00000153071,ENSG00000113140,ENSG00000176783,ENSG00000124491",,Platelet,CL_0000233
Immune system,Stromal,"ENSG00000115461,ENSG00000047457,ENSG00000091513,ENSG00000011465,ENSG00000139329,ENSG00000164692,ENSG00000147571,ENSG00000041982,ENSG00000152583,ENSG00000112175",,Stromal cell,CL_0000499
Immune system,Blast,"ENSG00000002586,ENSG00000173762,ENSG00000124766,ENSG00000177606,ENSG00000117632,ENSG00000123416,ENSG00000167286",,Blast cell,CL_0000055
Immune system,Cancer,"ENSG00000026508,ENSG00000119888,ENSG00000141736,ENSG00000086205,ENSG00000111057,ENSG00000007062",,Cancer cell,CL_0001064
Immune system,Pre Eryth,"ENSG00000081237,ENSG00000170180,ENSG00000175792,ENSG00000072274,ENSG00000110195,ENSG00000135218,ENSG00000115232,ENSG00000244734,ENSG00000223609,ENSG00000133742",,Erythroid-like and erythroid precursor cell,CL_0000038
64 changes: 64 additions & 0 deletions analyses/cell-type-ETP-ALL-03/Dockerfile
@@ -0,0 +1,64 @@
FROM bioconductor/r-ver:3.19

# Labels following the Open Containers Initiative (OCI) recommendations
# For more information, see https://specs.opencontainers.org/image-spec/annotations/?v=v1.0.1
LABEL org.opencontainers.image.title="openscpca/cell-type-ETP-ALL-03"
LABEL org.opencontainers.image.description="Docker image for the OpenScPCA analysis module 'cell-type-ETP-ALL-03'"
LABEL org.opencontainers.image.authors="OpenScPCA [email protected]"
LABEL org.opencontainers.image.source="https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/cell-type-ETP-ALL-03"

# Set an environment variable to allow checking if we are in an OpenScPCA container
ENV OPENSCPCA_DOCKER=TRUE

# set a name for the conda environment
ARG ENV_NAME=openscpca-cell-type-ETP-ALL-03

# set environment variables to install conda
ENV PATH="/opt/conda/bin:${PATH}"

# Install conda via miniforge
# adapted from https://github.com/conda-forge/miniforge-images/blob/master/ubuntu/Dockerfile
RUN curl -L "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" -o /tmp/miniforge.sh \
  && bash /tmp/miniforge.sh -b -p /opt/conda \
  && rm -f /tmp/miniforge.sh \
  && conda clean --tarballs --index-cache --packages --yes \
  && find /opt/conda -follow -type f -name '*.a' -delete \
  && find /opt/conda -follow -type f -name '*.pyc' -delete \
  && conda clean --force-pkgs-dirs --all --yes

# Activate conda environments in bash
RUN ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
  && echo ". /opt/conda/etc/profile.d/conda.sh" >> /etc/skel/.bashrc \
  && echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc

# Install conda-lock
RUN conda install --channel=conda-forge --name=base conda-lock \
  && conda clean --all --yes

# Install renv
RUN Rscript -e "install.packages('renv')"

# Disable the renv cache to install packages directly into the R library
ENV RENV_CONFIG_CACHE_ENABLED=FALSE

# Copy conda lock file to image
COPY conda-lock.yml conda-lock.yml

# restore from conda-lock.yml file and clean up to reduce image size
RUN conda-lock install -n ${ENV_NAME} conda-lock.yml \
  && conda clean --all --yes

# Copy the renv.lock file from the host environment to the image
COPY renv.lock renv.lock

# restore from renv.lock file and clean up to reduce image size
RUN Rscript -e 'renv::restore()' \
  && rm -rf ~/.cache/R/renv \
  && rm -rf /tmp/downloaded_packages \
  && rm -rf /tmp/Rtmp*

# Activate conda environment on bash launch
RUN echo "conda activate ${ENV_NAME}" >> ~/.bashrc

# Set CMD to bash to activate the environment when launching
CMD ["/bin/bash"]
85 changes: 85 additions & 0 deletions analyses/cell-type-ETP-ALL-03/README.md
@@ -0,0 +1,85 @@
# ETP T-ALL Annotation (SCPCP000003)

This analysis module includes code to annotate cell types and tumor/normal status in ETP T-ALL samples from SCPCP000003 (n=30), which is available on the ScPCA portal.

## Description

We first annotate the cell types in ETP T-ALL, then use the annotated B cells in each sample as the "normal" cells to identify tumor cells, since T-ALL is caused by the clonal proliferation of immature T cells [<https://www.nature.com/articles/s41375-018-0127-8>].

- We use the cell type markers (`Azimuth_BM_level1.csv`) from the [Azimuth Human Bone Marrow reference](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Bone%20Marrow). In total, there are 14 cell types: B, CD4 T, CD8 T, Other T, DC, Monocytes, Macrophages, NK, Early Erythrocytes, Late Erythrocytes, Plasma, Platelet, Stromal, and Hematopoietic Stem and Progenitor Cells (HSPC). Based on exploratory analysis, we believe that most cells in these samples do not express enough markers to be distinguished at a finer cell type level (e.g., naive vs. memory, CD14 vs. CD16), and that the majority of cells should be T cells. In addition, we include marker genes for blast cells [[Bhasin et al. (2023)](https://www.nature.com/articles/s41598-023-39152-z)] as well as for erythroid precursor and cancer cells in the immune system [[ScType](https://sctype.app/database.php) database].

- Since ScType annotates cell types at the cluster level using marker genes provided by the user or taken from its built-in database, we employ the [self-assembling manifold](https://github.com/atarashansky/self-assembling-manifold/tree/master) (SAM) algorithm, a soft feature selection strategy, for better separation of homogeneous cell types.

- After cell type annotation, we provide the B cells in each sample, if there are any, to [CopyKat](https://github.com/navinlabcode/copykat) as the normal cells for identifying tumor cells.

Here are the steps in the module:

1. Generating a processed rds file for each sample using SAM (`scripts/00-01_processing_rds.R`)

2. Annotating cell type using ScType and identifying tumor cells using CopyKat (`scripts/02-03_annotation.R`)
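
The hand-off from cell type annotation to CopyKat can be sketched roughly as follows. This is a minimal illustration in Python, not the module's R code: the function name, the barcode-to-label dictionary, and the `"B"` label are assumptions made for the example.

```python
def normal_reference_barcodes(cell_types, normal_label="B"):
    """Return the sorted barcodes whose annotated cell type equals normal_label.

    cell_types: dict mapping cell barcode -> annotated cell type
                (a hypothetical layout chosen for this sketch).
    An empty list means the sample has no annotated normal cells.
    """
    return sorted(bc for bc, label in cell_types.items() if label == normal_label)


# Toy annotations; the barcodes and labels are made up for illustration.
annotations = {"AAAC": "CD4 T", "AAAG": "B", "AACT": "Blast", "AAGA": "B"}
normal_cells = normal_reference_barcodes(annotations)
has_reference = len(normal_cells) > 0
```

An empty result corresponds to a sample without annotated B cells, in which case no normal reference is available to pass along.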

## Usage

Before running the R scripts in R or RStudio, first prepare the input files as described in the next section, then run the following commands in the terminal to install the required libraries:

```
#system packages installation
sudo apt install libglpk40
sudo apt install libcurl4-openssl-dev #for Seurat
sudo apt-get install libxml2-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev libtiff5-dev #for devtools

conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml
Rscript -e "renv::restore()"
```

## Input files

`scripts/00-01_processing_rds.R` requires the processed SingleCellExperiment objects (`_processed.rds`) and doublet-detection results (`_processed_scdblfinder.tsv`) from SCPCP000003. These files can be obtained by running the following commands:

```
#run in terminal
../../download-data.py --projects SCPCP000003
../../download-results.py --projects SCPCP000003 --modules doublet-detection
```

For the annotation, `scripts/02-03_annotation.R` requires the cell type marker gene file, `Azimuth_BM_level1.csv`, as input for ScType. This CSV file lists the positive marker genes for each cell type as Ensembl IDs under `ensembl_id_positive_marker`; *TMEM56* and *CD235a* are not detected in our dataset, so they have been removed from the markers for Late Eryth and Pre Eryth, respectively. Currently, no negative marker genes are provided under `ensembl_id_negative_marker`.
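
To make the layout of the marker file concrete, here is a small Python sketch that reads it into a mapping from cell type to positive marker IDs. The column names come from `Azimuth_BM_level1.csv` itself; the `load_markers` helper is hypothetical, and the two rows are copied inline so the example does not depend on a file path.

```python
import csv
import io


def load_markers(handle):
    """Map each cellName to its list of positive-marker Ensembl IDs."""
    markers = {}
    for row in csv.DictReader(handle):
        ids = [
            gene.strip()
            for gene in row["ensembl_id_positive_marker"].split(",")
            if gene.strip()
        ]
        markers[row["cellName"]] = ids
    return markers


# Header plus two rows copied from Azimuth_BM_level1.csv (inline sample data).
SAMPLE = (
    "tissueType,cellName,ensembl_id_positive_marker,"
    "ensembl_id_negative_marker,fullName,ontologyID\n"
    'Immune system,Blast,"ENSG00000002586,ENSG00000173762,ENSG00000124766,'
    'ENSG00000177606,ENSG00000117632,ENSG00000123416,ENSG00000167286"'
    ",,Blast cell,CL_0000055\n"
    'Immune system,Cancer,"ENSG00000026508,ENSG00000119888,ENSG00000141736,'
    'ENSG00000086205,ENSG00000111057,ENSG00000007062"'
    ",,Cancer cell,CL_0001064\n"
)

markers = load_markers(io.StringIO(SAMPLE))
```

Because the Ensembl IDs sit in one quoted, comma-separated field, a CSV-aware reader is needed before splitting that field; splitting whole lines on commas would break the columns apart.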

## Output files

Running `scripts/00-01_processing_rds.R` will generate two types of output:

- `rds` objects in `scratch/`

- UMAP plots showing Leiden clustering in `plots/`

Running `scripts/02-03_annotation.R` will generate several outputs:

- updated `rds` objects in `scratch/`

- UMAP plots showing cell type and CopyKat prediction (if there is any), and dot plots showing the features added with `AddModuleScore()`, in `plots/`

- ScType results listing the top 10 candidate cell types per cluster (`_sctype_top10_celltypes_perCluster.txt`) and a metadata file tabulating the Leiden cluster, cell type, low-confidence cell type, and CopyKat prediction for each cell (`_metadata.txt`) in `results/`
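
As an illustration of how a per-cell metadata table like `_metadata.txt` could be summarized downstream, the Python sketch below counts cells per label. The tab delimiter, the column names (`cell_type`, `copykat_pred`), and the toy rows are all assumptions; the actual file layout produced by the module may differ.

```python
import csv
import io
from collections import Counter


def count_labels(handle, column):
    """Count how many cells carry each value in the given column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# Toy table; column names and values are hypothetical, not the module's output.
TOY_METADATA = (
    "barcode\tleiden_cluster\tcell_type\tcopykat_pred\n"
    "AAAC\t0\tCD4 T\taneuploid\n"
    "AAAG\t1\tB\tdiploid\n"
    "AACT\t0\tCD4 T\taneuploid\n"
    "AAGA\t1\tB\tdiploid\n"
    "AATC\t2\tBlast\taneuploid\n"
)

cell_type_counts = count_labels(io.StringIO(TOY_METADATA), "cell_type")
copykat_counts = count_labels(io.StringIO(TOY_METADATA), "copykat_pred")
```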

## Software requirements

To run the analysis, execute the R scripts in R or RStudio (R version 4.4.0). The main libraries used are:

- Seurat (version 5.1.0)

- reticulate (version 1.39.0)

- sam-algorithm (in python)

- ScType

- CopyKat

The `renv.lock` file contains all R packages and their version information. All Python libraries are installed in the conda environment `openscpca-cell-type-ETP-ALL-03`, and the Python code is executed in that environment from R via `reticulate`. To create this environment from the `conda-lock.yml` file, use:

```
conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml
```

## Computational resources

All the commands above are currently run on a standard 4XL virtual machine via AWS Lightsail for Research; CopyKat runs quite slowly with a single computational core.
13 changes: 13 additions & 0 deletions analyses/cell-type-ETP-ALL-03/cell-type-ETP-ALL-03.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
2 changes: 2 additions & 0 deletions analyses/cell-type-ETP-ALL-03/components/dependencies.R
@@ -0,0 +1,2 @@
# R dependencies not captured by `renv`
# library("missing_package")