Merge pull request #826 from UTSouthwesternDSSR/UTSouthwesternDSSR/jwl
cell type/tumor annotation for ETP T-ALL (SCPCP000003)
Showing 89 changed files with 11,621 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# This is a workflow to build the docker image for the cell-type-ETP-ALL-03 module | ||
# | ||
# Docker modules are run on pull requests when files that affect the Docker image have changed. | ||
# If other files are used during the Docker build, they should be added to `paths` | ||
# | ||
# At module initialization, this workflow is inactive, and needs to be activated manually | ||
|
||
name: Build docker image for cell-type-ETP-ALL-03 | ||
|
||
concurrency: | ||
# only one run per branch at a time | ||
group: "docker_cell-type-ETP-ALL-03_${{ github.ref }}" | ||
cancel-in-progress: true | ||
|
||
on: | ||
pull_request: | ||
branches: | ||
- main | ||
paths: | ||
- "analyses/cell-type-ETP-ALL-03/Dockerfile" | ||
- "analyses/cell-type-ETP-ALL-03/.dockerignore" | ||
- "analyses/cell-type-ETP-ALL-03/renv.lock" | ||
- "analyses/cell-type-ETP-ALL-03/conda-lock.yml" | ||
push: | ||
branches: | ||
- main | ||
paths: | ||
- "analyses/cell-type-ETP-ALL-03/Dockerfile" | ||
- "analyses/cell-type-ETP-ALL-03/.dockerignore" | ||
- "analyses/cell-type-ETP-ALL-03/renv.lock" | ||
- "analyses/cell-type-ETP-ALL-03/conda-lock.yml" | ||
workflow_dispatch: | ||
inputs: | ||
push-ecr: | ||
description: "Push to AWS ECR" | ||
type: boolean | ||
required: true | ||
|
||
jobs: | ||
test-build: | ||
name: Test Build Docker Image | ||
if: github.event_name == 'pull_request' || (contains(github.event_name, 'workflow_') && !inputs.push-ecr) | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Set up Docker Buildx | ||
uses: docker/setup-buildx-action@v3 | ||
|
||
- name: Build image | ||
uses: docker/build-push-action@v5 | ||
with: | ||
context: "{{defaultContext}}:analyses/cell-type-ETP-ALL-03" | ||
push: false | ||
cache-from: type=gha | ||
cache-to: type=gha,mode=max | ||
|
||
build-push: | ||
name: Build and Push Docker Image | ||
if: github.repository_owner == 'AlexsLemonade' && (github.event_name == 'push' || inputs.push-ecr) | ||
uses: ./.github/workflows/build-push-docker-module.yml | ||
with: | ||
module: "cell-type-ETP-ALL-03" | ||
push-ecr: true |
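As a reading aid, the two `if:` conditions in this workflow can be restated in Python. This is only a sketch of the gating logic: `jobs_to_run` is a hypothetical helper that does not exist in the repository, and it assumes `inputs.push-ecr` evaluates as false whenever the workflow is not started manually.

```python
def jobs_to_run(event_name, repository_owner, push_ecr=False):
    """Return the job names whose `if:` condition would be true.

    Mirrors the expressions on the `test-build` and `build-push` jobs.
    `push_ecr` stands in for `inputs.push-ecr` (assumed False when unset).
    """
    jobs = []
    # test-build: pull requests, or manual/reusable runs that do not push to ECR
    if event_name == "pull_request" or ("workflow_" in event_name and not push_ecr):
        jobs.append("test-build")
    # build-push: only in the AlexsLemonade organization, on push or when requested
    if repository_owner == "AlexsLemonade" and (event_name == "push" or push_ecr):
        jobs.append("build-push")
    return jobs


# A pull request only test-builds; a push to main in the upstream repo builds and pushes.
pr_jobs = jobs_to_run("pull_request", "AlexsLemonade")
push_jobs = jobs_to_run("push", "AlexsLemonade")
```

Under these assumptions, a push to `main` in a fork runs neither job, because `test-build` ignores `push` events and `build-push` requires the `AlexsLemonade` owner.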
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
# This is a workflow to run the cell-type-ETP-ALL-03 module | ||
# | ||
# Analysis modules are run based on three triggers: | ||
# - Manual trigger | ||
# - On pull requests where code in the module has changed | ||
# - As a reusable workflow called from a separate workflow which periodically runs all modules | ||
# | ||
# At initialization, only the manual trigger is active | ||
|
||
name: Run cell-type-ETP-ALL-03 analysis module | ||
env: | ||
MODULE_PATH: analyses/cell-type-ETP-ALL-03 | ||
AWS_DEFAULT_REGION: us-east-2 | ||
|
||
concurrency: | ||
# only one run per branch at a time | ||
group: "run_cell-type-ETP-ALL-03_${{ github.ref }}" | ||
cancel-in-progress: true | ||
|
||
on: | ||
workflow_dispatch: | ||
workflow_call: | ||
pull_request: | ||
branches: | ||
- main | ||
paths: | ||
- analyses/cell-type-ETP-ALL-03/** | ||
- "!analyses/cell-type-ETP-ALL-03/Dockerfile" | ||
- "!analyses/cell-type-ETP-ALL-03/.dockerignore" | ||
- .github/workflows/run_cell-type-ETP-ALL-03.yml | ||
|
||
jobs: | ||
run-module: | ||
if: github.repository_owner == 'AlexsLemonade' | ||
runs-on: ubuntu-latest | ||
defaults: | ||
run: | ||
shell: bash -el {0} | ||
|
||
steps: | ||
- name: Checkout repo | ||
uses: actions/checkout@v4 | ||
|
||
- name: Set up R | ||
uses: r-lib/actions/setup-r@v2 | ||
with: | ||
r-version: 4.4.0 | ||
use-public-rspm: true | ||
|
||
- name: Set up pandoc | ||
uses: r-lib/actions/setup-pandoc@v2 | ||
|
||
- name: Install system dependencies | ||
run: | | ||
sudo apt-get install -y libcurl4-openssl-dev \ | ||
libhdf5-dev \ | ||
libglpk40 \ | ||
libxml2-dev \ | ||
libfontconfig1-dev \ | ||
libharfbuzz-dev \ | ||
libfribidi-dev \ | ||
libtiff5-dev | ||
- name: Set up renv | ||
uses: r-lib/actions/setup-renv@v2 | ||
with: | ||
working-directory: ${{ env.MODULE_PATH }} | ||
|
||
- name: Set up conda | ||
# Note that this creates and activates an environment named 'test' by default | ||
uses: conda-incubator/setup-miniconda@v3 | ||
with: | ||
miniforge-version: latest | ||
|
||
- name: Install conda-lock and activate locked conda environment | ||
run: | | ||
conda install conda-lock | ||
conda-lock install --name openscpca-cell-type-ETP-ALL-03 ${MODULE_PATH}/conda-lock.yml | ||
# Update this step as needed to download the desired data | ||
- name: Download test data | ||
run: | | ||
./download-data.py --projects SCPCP000003 --test-data --format SCE | ||
./download-results.py --projects SCPCP000003 --test-data --modules doublet-detection | ||
- name: Run analysis module | ||
run: | | ||
cd ${MODULE_PATH} | ||
# run module script(s) here | ||
Rscript scripts/00-01_processing_rds.R | ||
Rscript scripts/02-03_annotation.R | ||
Rscript scripts/multipanel_plot.R |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Don't activate renv in an OpenScPCA docker image | ||
if (Sys.getenv('OPENSCPCA_DOCKER') != 'TRUE') { | ||
source('renv/activate.R') | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Ignore everything by default | ||
* | ||
|
||
# Include specific files in the docker environment | ||
!/renv.lock | ||
!/requirements.txt | ||
!/environment.yml | ||
!/conda-lock.yml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Results should not be committed | ||
/results/* | ||
!/results/README.md | ||
|
||
# Ignore the scratch directory (but keep it present) | ||
/scratch/* | ||
!/scratch/.gitkeep |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
tissueType,cellName,ensembl_id_positive_marker,ensembl_id_negative_marker,fullName,ontologyID | ||
Immune system,B,"ENSG00000163534,ENSG00000132704,ENSG00000012124,ENSG00000138639,ENSG00000153064,ENSG00000156738,ENSG00000116191,ENSG00000104894,ENSG00000133789,ENSG00000105369",,B cell,CL_0000945 | ||
Immune system,CD4 T,"ENSG00000172005,ENSG00000138795,ENSG00000168685,ENSG00000081059,ENSG00000227507,ENSG00000104660,ENSG00000198851,ENSG00000167286,ENSG00000160654,ENSG00000139193",,CD4 T cell,CL_0000624 | ||
Immune system,CD8 T,"ENSG00000172116,ENSG00000153563,ENSG00000184613,ENSG00000167286,ENSG00000198851,ENSG00000160307,ENSG00000100450,ENSG00000160654,ENSG00000227191,ENSG00000271503",,CD8 T cell,CL_0000625 | ||
Immune system,DC,"ENSG00000198178,ENSG00000115718,ENSG00000070031,ENSG00000169432,ENSG00000105251,ENSG00000155367,ENSG00000168913,ENSG00000132514,ENSG00000239961,ENSG00000163687",,Dendritic cell,CL_0000451 | ||
Immune system,HSPC,"ENSG00000172995,ENSG00000186710,ENSG00000101200,ENSG00000163554,ENSG00000119888,ENSG00000188672,ENSG00000131016,ENSG00000112077,ENSG00000172247,ENSG00000170891",,Hematopoietic stem and progenitor cell,CL_0000037 | ||
Immune system,Mono,"ENSG00000197353,ENSG00000110203,ENSG00000166523,ENSG00000104974,ENSG00000158825,ENSG00000162444,ENSG00000186074,ENSG00000171051,ENSG00000125810,ENSG00000014914",,Monocytes,CL_0000576 | ||
Immune system,NK,"ENSG00000189430,ENSG00000198574,ENSG00000134545,ENSG00000150687,ENSG00000117281,ENSG00000156966,ENSG00000100385,ENSG00000115607,ENSG00000143184,ENSG00000150045",,Natural killer cell,CL_0000814 | ||
Immune system,Other T,"ENSG00000144290,ENSG00000111796,ENSG00000168685,ENSG00000215788,ENSG00000069667,ENSG00000107742,ENSG00000178573,ENSG00000113088,ENSG00000152518,ENSG00000145220",,Other T cell,CL_0000084 | ||
Immune system,Macrophage,"ENSG00000166211,ENSG00000121769,ENSG00000073754,ENSG00000275385,ENSG00000159189,ENSG00000173369,ENSG00000170323,ENSG00000173372,ENSG00000130203,ENSG00000250722",,Macrophage,CL_0000235 | ||
Immune system,Early Eryth,"ENSG00000119865,ENSG00000179348,ENSG00000005961,ENSG00000106327,ENSG00000102145,ENSG00000105610,ENSG00000170891,ENSG00000135525,ENSG00000075618,ENSG00000130208",,Early Erythrocyte,CL_0000764 | ||
Immune system,Late Eryth,"ENSG00000196188,ENSG00000112212,ENSG00000204010,ENSG00000188672,ENSG00000112077,ENSG00000163554,ENSG00000075340,ENSG00000119888,ENSG00000213934",,Late Erythrocyte,CL_0000764 | ||
Immune system,Plasma,"ENSG00000115884,ENSG00000222037,ENSG00000211640,ENSG00000048462,ENSG00000211673,ENSG00000240505,ENSG00000211685,ENSG00000167476,ENSG00000143297,ENSG00000243466",,Plasma cell,CL_0000786 | ||
Immune system,Platelet,"ENSG00000150681,ENSG00000187699,ENSG00000088726,ENSG00000169704,ENSG00000163737,ENSG00000163736,ENSG00000153071,ENSG00000113140,ENSG00000176783,ENSG00000124491",,Platelet,CL_0000233 | ||
Immune system,Stromal,"ENSG00000115461,ENSG00000047457,ENSG00000091513,ENSG00000011465,ENSG00000139329,ENSG00000164692,ENSG00000147571,ENSG00000041982,ENSG00000152583,ENSG00000112175",,Stromal cell,CL_0000499 | ||
Immune system,Blast,"ENSG00000002586,ENSG00000173762,ENSG00000124766,ENSG00000177606,ENSG00000117632,ENSG00000123416,ENSG00000167286",,Blast cell,CL_0000055 | ||
Immune system,Cancer,"ENSG00000026508,ENSG00000119888,ENSG00000141736,ENSG00000086205,ENSG00000111057,ENSG00000007062",,Cancer cell,CL_0001064 | ||
Immune system,Pre Eryth,"ENSG00000081237,ENSG00000170180,ENSG00000175792,ENSG00000072274,ENSG00000110195,ENSG00000135218,ENSG00000115232,ENSG00000244734,ENSG00000223609,ENSG00000133742",,Erythroid-like and erythroid precursor cell,CL_0000038 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
FROM bioconductor/r-ver:3.19 | ||
|
||
# Labels following the Open Containers Initiative (OCI) recommendations | ||
# For more information, see https://specs.opencontainers.org/image-spec/annotations/?v=v1.0.1 | ||
LABEL org.opencontainers.image.title="openscpca/cell-type-ETP-ALL-03" | ||
LABEL org.opencontainers.image.description="Docker image for the OpenScPCA analysis module 'cell-type-ETP-ALL-03'" | ||
LABEL org.opencontainers.image.authors="OpenScPCA [email protected]" | ||
LABEL org.opencontainers.image.source="https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/cell-type-ETP-ALL-03" | ||
|
||
# Set an environment variable to allow checking if we are in an OpenScPCA container | ||
ENV OPENSCPCA_DOCKER=TRUE | ||
|
||
# set a name for the conda environment | ||
ARG ENV_NAME=openscpca-cell-type-ETP-ALL-03 | ||
|
||
# set environment variables to install conda | ||
ENV PATH="/opt/conda/bin:${PATH}" | ||
|
||
# Install conda via miniforge | ||
# adapted from https://github.com/conda-forge/miniforge-images/blob/master/ubuntu/Dockerfile | ||
RUN curl -L "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" -o /tmp/miniforge.sh \ | ||
&& bash /tmp/miniforge.sh -b -p /opt/conda \ | ||
&& rm -f /tmp/miniforge.sh \ | ||
&& conda clean --tarballs --index-cache --packages --yes \ | ||
&& find /opt/conda -follow -type f -name '*.a' -delete \ | ||
&& find /opt/conda -follow -type f -name '*.pyc' -delete \ | ||
&& conda clean --force-pkgs-dirs --all --yes | ||
|
||
# Activate conda environments in bash | ||
RUN ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh \ | ||
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> /etc/skel/.bashrc \ | ||
&& echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc | ||
|
||
# Install conda-lock | ||
RUN conda install --channel=conda-forge --name=base conda-lock \ | ||
&& conda clean --all --yes | ||
|
||
# Install renv | ||
RUN Rscript -e "install.packages('renv')" | ||
|
||
# Disable the renv cache to install packages directly into the R library | ||
ENV RENV_CONFIG_CACHE_ENABLED=FALSE | ||
|
||
# Copy conda lock file to image | ||
COPY conda-lock.yml conda-lock.yml | ||
|
||
# restore from conda-lock.yml file and clean up to reduce image size | ||
RUN conda-lock install -n ${ENV_NAME} conda-lock.yml \ | ||
&& conda clean --all --yes | ||
|
||
# Copy the renv.lock file from the host environment to the image | ||
COPY renv.lock renv.lock | ||
|
||
# restore from renv.lock file and clean up to reduce image size | ||
RUN Rscript -e 'renv::restore()' \ | ||
&& rm -rf ~/.cache/R/renv \ | ||
&& rm -rf /tmp/downloaded_packages \ | ||
&& rm -rf /tmp/Rtmp* | ||
|
||
# Activate conda environment on bash launch | ||
RUN echo "conda activate ${ENV_NAME}" >> ~/.bashrc | ||
|
||
# Set CMD to bash to activate the environment when launching | ||
CMD ["/bin/bash"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# ETP T-ALL Annotation (SCPCP000003) | ||
|
||
This analysis module includes code to annotate cell types and tumor/normal status in ETP T-ALL samples from SCPCP000003 (n=30) on the ScPCA portal. | ||
|
||
## Description | ||
|
||
We first annotate the cell types in ETP T-ALL, then use the annotated B cells in each sample as the "normal" reference cells to identify tumor cells, since T-ALL is caused by the clonal proliferation of immature T cells [<https://www.nature.com/articles/s41375-018-0127-8>]. | ||
|
||
- We use the cell type markers (`Azimuth_BM_level1.csv`) from the [Azimuth Human Bone Marrow reference](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Bone%20Marrow). In total, there are 14 cell types: B, CD4 T, CD8 T, Other T, DC, Monocytes, Macrophages, NK, Early Erythrocytes, Late Erythrocytes, Plasma, Platelet, Stromal, and Hematopoietic Stem and Progenitor Cells (HSPC). Based on exploratory analysis, we believe that most cells in these samples do not express enough markers to be distinguished at a finer cell type level (e.g., naive vs. memory, CD14 vs. CD16), and that the majority of cells should be T cells. In addition, we include marker genes for blast cells [[Bhasin et al. (2023)](https://www.nature.com/articles/s41598-023-39152-z)] as well as for erythroid precursor cells and cancer cells of the immune system [[ScType](https://sctype.app/database.php) database]. | ||
|
||
- Since ScType annotates cell types at the cluster level, using marker genes provided by the user or taken from its built-in database, we first apply the [self-assembling manifold](https://github.com/atarashansky/self-assembling-manifold/tree/master) (SAM) algorithm, a soft feature selection strategy, to better separate homogeneous cell types. | ||
|
||
- After cell type annotation, we provide the annotated B cells, if any are present in the sample, to [CopyKat](https://github.com/navinlabcode/copykat) as the normal reference cells for identifying tumor cells. | ||
|
||
Here are the steps in the module: | ||
|
||
1. Generating a processed rds file for each sample using SAM (`scripts/00-01_processing_rds.R`) | ||
|
||
2. Annotating cell type using ScType and identifying tumor cells using CopyKat (`scripts/02-03_annotation.R`) | ||
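
The cluster-level marker scoring behind step 2 can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not ScType's implementation and not the code in `scripts/02-03_annotation.R`: it uses positive markers only, applies no marker weighting, and the low-confidence threshold (`min_fraction`, a quarter of the cluster size) is an assumed value. The function names and the toy expression values are hypothetical; the two Ensembl IDs are taken from `Azimuth_BM_level1.csv`.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def scale_genes(expr):
    """Z-score each gene across cells; zero-variance genes become all zeros."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def score_cells(scaled, markers):
    """Per-cell score per cell type: summed scaled markers / sqrt(markers found)."""
    n_cells = len(next(iter(scaled.values())))
    scores = {}
    for cell_type, genes in markers.items():
        found = [g for g in genes if g in scaled]
        if not found:
            continue  # none of this cell type's markers are in the data
        norm = math.sqrt(len(found))
        scores[cell_type] = [
            sum(scaled[g][i] for g in found) / norm for i in range(n_cells)
        ]
    return scores


def annotate_clusters(scores, clusters, min_fraction=0.25):
    """Sum cell scores within each cluster and keep the best-scoring cell type.

    A cluster is flagged low confidence when its best summed score is below
    min_fraction * cluster size (an assumed threshold for this sketch).
    """
    totals = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for i, cluster in enumerate(clusters):
        sizes[cluster] += 1
        for cell_type, values in scores.items():
            totals[cluster][cell_type] += values[i]
    result = {}
    for cluster, by_type in totals.items():
        best = max(by_type, key=by_type.get)
        result[cluster] = {
            "cell_type": best,
            "low_confidence": by_type[best] < min_fraction * sizes[cluster],
        }
    return result


# Toy data (made up): two marker genes, four cells, two clusters.
expr = {
    "ENSG00000163534": [2.0, 2.0, 0.0, 0.0],  # first B marker in the CSV
    "ENSG00000172005": [0.0, 0.0, 2.0, 2.0],  # first CD4 T marker in the CSV
}
markers = {"B": ["ENSG00000163534"], "CD4 T": ["ENSG00000172005"]}
clusters = ["c0", "c0", "c1", "c1"]

annotation = annotate_clusters(score_cells(scale_genes(expr), markers), clusters)
```

Because the label is assigned per cluster rather than per cell, the quality of the clustering directly limits the quality of the annotation, which is the motivation given above for running SAM first.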
|
||
## Usage | ||
|
||
Before running the R scripts in R or RStudio, first prepare the input files as described in the next section, then run the following commands in a terminal to install the required libraries: | ||
|
||
``` | ||
#system packages installation | ||
sudo apt install libglpk40 | ||
sudo apt install libcurl4-openssl-dev #for Seurat | ||
sudo apt-get install libxml2-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev libtiff5-dev #for devtools | ||
conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml | ||
Rscript -e "renv::restore()" | ||
``` | ||
|
||
## Input files | ||
|
||
`scripts/00-01_processing_rds.R` requires the processed SingleCellExperiment objects (`_processed.rds`) and the doublet-detection results (`_processed_scdblfinder.tsv`) for SCPCP000003. These files can be obtained by running the following commands: | ||
|
||
``` | ||
#run in terminal | ||
../../download-data.py --projects SCPCP000003 | ||
../../download-results.py --projects SCPCP000003 --modules doublet-detection | ||
``` | ||
|
||
For annotation, `scripts/02-03_annotation.R` requires the cell type marker gene file, `Azimuth_BM_level1.csv`, as input for ScType. This CSV file lists the positive marker genes for each cell type as Ensembl IDs under `ensembl_id_positive_marker`. *TMEM56* and *CD235a* are not detected in our dataset, so they have been removed from the markers for Late Eryth and Pre Eryth, respectively. Currently, no negative marker genes are provided under `ensembl_id_negative_marker`. | ||
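
To make the layout of the marker file concrete, here is a minimal Python sketch that parses it into a dictionary keyed by `cellName`. It is an illustration only (the module itself reads the file in R); `read_markers` is a hypothetical helper, and the embedded sample contains just the header and two rows copied from `Azimuth_BM_level1.csv`.

```python
import csv
import io

# Header plus two rows copied from Azimuth_BM_level1.csv
SAMPLE = """tissueType,cellName,ensembl_id_positive_marker,ensembl_id_negative_marker,fullName,ontologyID
Immune system,Blast,"ENSG00000002586,ENSG00000173762,ENSG00000124766,ENSG00000177606,ENSG00000117632,ENSG00000123416,ENSG00000167286",,Blast cell,CL_0000055
Immune system,Cancer,"ENSG00000026508,ENSG00000119888,ENSG00000141736,ENSG00000086205,ENSG00000111057,ENSG00000007062",,Cancer cell,CL_0001064
"""


def read_markers(handle):
    """Map each cellName to its positive and negative Ensembl ID lists."""
    markers = {}
    for row in csv.DictReader(handle):
        # Marker columns hold comma-separated IDs; an empty cell yields no markers
        positive = [g for g in row["ensembl_id_positive_marker"].split(",") if g]
        negative = [g for g in row["ensembl_id_negative_marker"].split(",") if g]
        markers[row["cellName"]] = {"positive": positive, "negative": negative}
    return markers


markers = read_markers(io.StringIO(SAMPLE))
```

To read the real file instead of the sample, pass an open file handle, for example `open("Azimuth_BM_level1.csv", newline="")`, adjusting the path to wherever the file is stored.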
|
||
## Output files | ||
|
||
Running `scripts/00-01_processing_rds.R` will generate two types of output: | ||
|
||
- `rds` objects in `scratch/` | ||
|
||
- UMAP plots showing Leiden clustering in `plots/` | ||
|
||
Running `scripts/02-03_annotation.R` will generate several outputs: | ||
|
||
- updated `rds` objects in `scratch/` | ||
|
||
- UMAP plots showing cell type and CopyKat prediction (if available), and dot plots showing the features added with `AddModuleScore()`, in `plots/` | ||
|
||
- ScType results listing the top 10 possible cell types per cluster (`_sctype_top10_celltypes_perCluster.txt`) and a metadata file tabulating the Leiden cluster, cell type, low-confidence cell type, and CopyKat prediction for each cell (`_metadata.txt`), in `results/` | ||
|
||
## Software requirements | ||
|
||
To run the analysis, execute the R scripts in R or RStudio (R version 4.4.0). The main libraries used are: | ||
|
||
- Seurat (version 5.1.0) | ||
|
||
- reticulate (version 1.39.0) | ||
|
||
- sam-algorithm (Python) | ||
|
||
- ScType | ||
|
||
- CopyKat | ||
|
||
The `renv.lock` file records all R packages and their versions. All Python libraries are installed in the conda environment `openscpca-cell-type-ETP-ALL-03`, and the Python code is executed in that environment from R via `reticulate`. To create this environment from the `conda-lock.yml` file, use: | ||
|
||
``` | ||
conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml | ||
``` | ||
|
||
## Computational resources | ||
|
||
All the commands above are currently run on a standard 4XL virtual machine via AWS Lightsail for Research; CopyKat runs slowly when limited to one computational core. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
Version: 1.0 | ||
|
||
RestoreWorkspace: Default | ||
SaveWorkspace: Default | ||
AlwaysSaveHistory: Default | ||
|
||
EnableCodeIndexing: Yes | ||
UseSpacesForTab: Yes | ||
NumSpacesForTab: 2 | ||
Encoding: UTF-8 | ||
|
||
RnwWeave: Sweave | ||
LaTeX: pdfLaTeX |