Merge pull request #573 from dianichj/pseudo-bulk-edger-workflow
Add files for pseudobulk workflow using decoupler and edgeR
bgruening authored Nov 16, 2024
2 parents ff84fd2 + 648462d commit 2fd1953
Showing 5 changed files with 1,136 additions and 0 deletions.
@@ -0,0 +1,15 @@
version: 1.2
workflows:
  - name: main
    subclass: Galaxy
    publish: true
    primaryDescriptorPath: /pseudo-bulk_edgeR.ga
    testParameterFiles:
      - /pseudo-bulk_edgeR-tests.yml
    authors:
      - name: Diana Chiang Jurado
        orcid: 0000-0002-5857-1477
      - name: Pavankumar Videm
        orcid: 0000-0002-5192-126X
      - name: Pablo Moreno
        orcid: 0000-0002-9856-1679
@@ -0,0 +1,4 @@
# Changelog

## [0.1] 2024-10-14
First release.
44 changes: 44 additions & 0 deletions workflows/scRNAseq/pseudobulk-worflow-decoupler-edger/README.md
@@ -0,0 +1,44 @@
# Pseudobulk-edgeR workflows

This workflow uses the decoupler tool in Galaxy to generate pseudobulk counts from an annotated AnnData file obtained from scRNA-seq analysis. After the pseudobulk step, differentially expressed genes (DEGs) are identified
with the edgeR tool. The workflow also includes data sanitization steps so that edgeR runs smoothly and potential issues are minimized. Finally, a Volcano plot tool visualizes the results of the DEG
analysis.
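
To make the pseudobulk idea concrete, here is a minimal sketch in plain Python of summing raw counts over all cells that share a sample and a group. It is not decoupler's implementation: the `pseudobulk` function, the dictionary layout and the toy cells are hypothetical, and only the column names (`individual`, `cell_type`) and gene names are borrowed from the test data in this commit.

```python
from collections import defaultdict


def pseudobulk(cells, sample_key, group_key):
    """Sum raw counts over all cells sharing a (sample, group) pair.

    Hypothetical helper for illustration; decoupler itself works on
    AnnData objects and offers more options than plain summation.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        profile = (cell[sample_key], cell[group_key])
        for gene, count in cell["counts"].items():
            totals[profile][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {profile: dict(genes) for profile, genes in totals.items()}


# Hypothetical toy data: three cells from two individuals, one cell type.
cells = [
    {"individual": "P1", "cell_type": "B", "counts": {"ACAP2": 2, "ACER3": 0}},
    {"individual": "P1", "cell_type": "B", "counts": {"ACAP2": 1, "ACER3": 4}},
    {"individual": "P2", "cell_type": "B", "counts": {"ACAP2": 5, "ACER3": 1}},
]

matrix = pseudobulk(cells, sample_key="individual", group_key="cell_type")
```

Each key of `matrix` corresponds to one pseudobulk column, which is the kind of sample-level count profile that edgeR expects as input.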

## Inputs

- decoupler: Source AnnData (`h5ad`).
    - Parameter: Pseudobulk: Fields to merge / optional
    - Parameter: Group by column / has to be given
    - Parameter: Sample key column / has to be given
    - Parameter: Name your raw count layer / has to be given
    - Parameter: Factor Field / has to be given
- edgeR:
    - Sanitized Count Matrix
    - Sanitized Factor File
    - Cleaned Gene Annotations file
    - Parameter: Formula for linear model / has to be given
    - Contrast file / has to be given
- Volcano Plot:
    - Input (`tabular`) file with genesymbol, logFC, Pvalue and FDR columns.

## Processing

Sanitization steps after decoupler:
- Sanitize Matrix and Factors (`tabular`): finds `[ --+*^]+` and replaces it with `-`
- Remove start, end with (`tabular`): removes a column that may affect edgeR and DESeq2.
- Sanitize First Factor for leading digits (`tabular`): finds `^([0-9])(.+)` and replaces it with `GG_\\1\\2`
- Get Contrast labels
- Replace text
- Split Contrasts
- Contrasts as Parameters: Plot title
- Select columns for the volcano plot using `Remove columns` on the edgeR DEG `Table` output.
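
The two find-and-replace rules above can be tried out with a short Python sketch. This assumes Python `re` semantics, which may differ in detail from the engine used by the Galaxy tools; the function names and sample labels are hypothetical. In the first pattern, ` --` inside the brackets is a character range from space to hyphen, so it also covers characters such as `(`, `)` and `,`.

```python
import re

# Same character set as the README pattern [ --+*^]+ ; the hyphen that ends
# the range is escaped only because Python may emit a FutureWarning for "--"
# inside a character set.
UNSAFE_CHARS = re.compile(r"[ -\-+*^]+")

# A leading digit followed by at least one more character.
LEADING_DIGIT = re.compile(r"^([0-9])(.+)")


def sanitize_label(label):
    """Collapse each run of unsafe characters into a single hyphen."""
    return UNSAFE_CHARS.sub("-", label)


def prefix_leading_digit(label):
    """Prefix labels that start with a digit with GG_."""
    return LEADING_DIGIT.sub(r"GG_\1\2", label)


# Hypothetical factor labels.
print(sanitize_label("COVID 19"))        # COVID-19
print(sanitize_label("a + b"))           # a-b
print(prefix_leading_digit("19_COVID"))  # GG_19_COVID
```

Labels that contain none of the unsafe characters, or that do not start with a digit, pass through unchanged.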


## Outputs

- Pseudobulk_count_matrix (`tabular`)
- Pseudobulk Plot (`png`)
- Filtered by expression (`png`)
- Table DEG
- Results (`HTML`) file, with plots available for download as `png` within the output
- Volcano plot (`PDF`)
@@ -0,0 +1,50 @@
- doc: Test outline for pseudo-bulk_edgeR
  job:
    Source AnnData file:
      class: File
      location: https://zenodo.org/records/13929549/files/Source%20AnnData%20file.h5ad
      filetype: h5ad
    'Pseudo-bulk: Fields to merge': null
    Group by column: cell_type
    Sample key column: individual
    Name Your Raw Counts Layer: counts
    Factor fields: disease
    Gene symbol column: gene_symbol
    Formula: '~ 0 + disease'
  outputs:
    'Pseudobulk count matrix':
      has_text_matching:
        expression: "ACAP2\t9.0\t18.0\t20.0\t68.0\t106.0\t122.0\t14.0\t259.0\t279.0\t184.0\t612.0\t293.0\t297.0\t46.0\t1.0\t0.0\t1.0\t12.0\t229.0\t151.0\t141.0\t309.0\t299.0\t181.0\t2.0\t2.0\t28.0\t15.0\t54.0\t210.0\t1.0\t1.0\t1.0\t11.0"
        expression: "ACER3\t4.0\t25.0\t21.0\t110.0\t82.0\t91.0\t22.0\t326.0\t297.0\t211.0\t1004.0\t574.0\t370.0\t108.0\t0.0\t0.0\t2.0\t2.0\t188.0\t113.0\t135.0\t322.0\t324.0\t159.0\t7.0\t7.0\t32.0\t5.0\t33.0\t89.0\t2.0\t2.0\t8.0\t48.0"
    'Pseudobulk Plot':
      element_test:
        has_size: 40116
        delta: 2000
    'Filtered by expression':
      element_test:
        has_size: 23490
        delta: 2000
    'Report Results: HTML File':
      element_test:
        has_size: 531761
        delta: 25000
    'Tables: DEG':
      element_tests:
        edgeR_normal-COVID_19:
          has_text_matching:
            expression: "RALBP1\tENSG00000017797\tFalse\t0.518[0-9]*\t1.609[0-9]*\t0.402[0-9]*\t2\tFalse\t0.286[0-9]*\t0.552[0-9]*\t-1.967[0-9]*\t7.483[0-9]*\t12.0213[0-9]*\t0.001[0-9]*\t0.436[0-9]*"
            expression: "NAPA\tENSG00000105402\tTrue\t0.342[0-9]\t1.686[0-9]\t0.846[0-9]\t4\tFalse\t0.180[0-9]\t0.440[0-9]\t-1.059[0-9]\t6.833[0-9]\t3.291[0-9]\t0.076[0-9]\t0.619[0-9]"
          has_n_lines:
            n: 1430
            delta: 1
    'Tables for volcano plot':
      element_tests:
        edgeR_normal-COVID_19:
          has_text_matching:
            expression: "CPEB4\t-2.402[0-9]\t0.001[0-9]\t0.436[0-9]"
            expression: "FGFR1OP2\t-2.367[0-9]\t0.004[0-9]\t0.458[0-9]"
    'Volcano Plot on input dataset(s): PDF':
      element_tests:
        edgeR_normal-COVID_19:
          has_size: 85052
          delta: 2000
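
The test file above relies on two kinds of tolerance: a size check with an allowed `delta`, and regular expressions that pin only the leading digits of each number. The Python sketch below illustrates both, assuming `delta` is the maximum allowed absolute difference and `expression` is searched anywhere in the output. The functions `size_within` and `text_matches` are hypothetical stand-ins, not Galaxy's code, and the sample row is invented.

```python
import re


def size_within(actual, expected, delta):
    """True when the actual size is within +/- delta of the expected size."""
    return abs(actual - expected) <= delta


def text_matches(text, expression):
    """True when the regular expression is found anywhere in the text."""
    return re.search(expression, text) is not None


# Hypothetical output row; only the leading digits are pinned by the pattern.
row = "CPEB4\t-2.4021\t0.0012\t0.4365"
expression = "CPEB4\t-2.402[0-9]\t0.001[0-9]\t0.436[0-9]"

print(text_matches(row, expression))     # True
print(size_within(41000, 40116, 2000))   # True
print(size_within(43000, 40116, 2000))   # False
```

Pinning only the first few digits keeps the tests stable against small numerical differences between tool versions or platforms.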