This is a cloud-based pipeline that uses the job submission tool dsub to run stratified LD-Score Regression and MAGMA in a parallelized way. The pipeline is split into two scripts, and
The pipeline runs on VMs created by dsub
and requires a Docker image, built from a Dockerfile,
to be loaded on each VM. The image can be built and pushed as follows:
docker build --no-cache -t .
gcloud docker -- push
# We made it publicly available
gsutil iam ch allUsers:objectViewer gs://
Flags for
This flag accepts as input a list of genes over which you want to partition heritability.
This gene list will be converted into a per-SNP annotation. The file can have a single
column, in which case it is treated as a binary annotation, or a second column that
gives a quantitative annotation for each gene.
This flag accepts as input a list of rsids over which you want to partition heritability.
The file can have a single column, in which case it is treated as a binary annotation, or a
second column that gives a quantitative annotation for each SNP.
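The one- or two-column layout described for gene lists and rsid lists can be illustrated with a minimal sketch. This is not the pipeline's own parser: the function name `parse_annotation_list`, the whitespace delimiter, and the example identifiers are assumptions made for illustration only.

```python
from typing import Dict, Iterable


def parse_annotation_list(lines: Iterable[str]) -> Dict[str, float]:
    """Map each identifier (gene or rsid) to an annotation value.

    One column  -> binary annotation (every listed identifier gets 1.0).
    Two columns -> quantitative annotation taken from the second column.
    Assumption: columns are separated by whitespace.
    """
    annotation: Dict[str, float] = {}
    for raw in lines:
        fields = raw.split()
        if not fields:  # skip blank lines
            continue
        identifier = fields[0]
        value = float(fields[1]) if len(fields) > 1 else 1.0
        annotation[identifier] = value
    return annotation


# Illustrative inputs only; real files would be read from disk or a bucket.
binary = parse_annotation_list(["GENE_A", "GENE_B"])
quantitative = parse_annotation_list(["GENE_A\t0.8", "GENE_B\t0.2"])
print(binary)
print(quantitative)
```

In both cases the result is a per-identifier value, which is the form needed before the annotation is projected onto SNPs.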
This flag accepts a path to a folder that contains pre-calculated ldscores for an
annotation. These ldscores are fed directly into the regression portion of the
pipeline to produce partitioned heritability results.
This flag accepts a file in UCSC bed format listing the regions over which
you want to partition heritability. There can be a 4th column that is a continuous annotation.
This flag accepts a file that has two columns, the first is the prefix for your ldscores for a geneset,
and the second is the google bucket path to the corresponding geneset. One geneset per line. This allows
the user to take advantage of the --cts flags within LDSC software to run many genesets on one VM.
This flag accepts a file that has two columns, the first is the prefix of the ldscores for a geneset,
and the second is the google bucket path to the corresponding ldscores. One set of ldscores per line.
This allows the user to take advantage of the --cts flags within LDSC when you already have
ldscores calculated. E.g. test_analyses.GeneSet1 gs://test_analyses/ldscores/test_analyses.GeneSet1.*
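The two-column layout of these files (prefix, then bucket path, one entry per line) can be checked before submission with a small sketch like the one below. The function name `parse_ldcts` and the whitespace delimiter are assumptions; the example line is the one given above.

```python
from typing import Iterable, List, Tuple


def parse_ldcts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (prefix, bucket_path) pairs from a two-column file.

    Assumption: columns are whitespace-separated and every path
    points to a Google bucket (starts with gs://).
    """
    entries: List[Tuple[str, str]] = []
    for line_number, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(fields)}")
        prefix, path = fields
        if not path.startswith("gs://"):
            raise ValueError(f"line {line_number}: not a bucket path: {path}")
        entries.append((prefix, path))
    return entries


entries = parse_ldcts(
    ["test_analyses.GeneSet1 gs://test_analyses/ldscores/test_analyses.GeneSet1.*"]
)
print(entries)
```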
These flags work the same as the --main-annot-* flags but are used when you want
to condition the regression on another annotation.
This flag allows you to just calculate ldscores for a particular annotation.
If given, this flag will prevent any regression from being run.
A comma separated list of summary statistics files ending in .sumstats.gz that
have already been processed using
Prefix that will be used when naming ldscore files and regression output files.
Path to folder to save regression results to.
Path to folder to save ldscores to. If this flag is given, the ldscores are copied to
that path; otherwise, ldscore files are not written out.
Path to file that has gene coordinates. Format is GENE CHR START END including the header.
If not using the default (ENSGID based) file, you need to include --gene-col-name flag
to indicate what the first column of your --gene-coord-file is called.
For example, if your --gene-coord-file has the header ENTREZ CHR START END, you would pass
--gene-col-name ENTREZ
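To make the role of the gene column name concrete, here is a minimal sketch that reads a coordinate file with a header and looks genes up by the first column's name. It is not the pipeline's code: the function name `read_gene_coords`, the tab delimiter, and the toy row are assumptions for illustration.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_gene_coords(handle: TextIO, gene_col_name: str) -> Dict[str, Tuple[str, int, int]]:
    """Map each gene identifier to (chromosome, start, end).

    Assumption: the file is tab-separated, with the header
    <gene_col_name> CHR START END.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or gene_col_name not in reader.fieldnames:
        raise ValueError(f"column {gene_col_name!r} not found in header {reader.fieldnames}")
    coords: Dict[str, Tuple[str, int, int]] = {}
    for row in reader:
        coords[row[gene_col_name]] = (row["CHR"], int(row["START"]), int(row["END"]))
    return coords


# Toy file content, not real coordinates.
example_file = io.StringIO("ENTREZ\tCHR\tSTART\tEND\n1234\t1\t1000\t2000\n")
coords = read_gene_coords(example_file, gene_col_name="ENTREZ")
print(coords)
```

Passing the wrong column name fails early with a clear error, which is the situation the flag is meant to avoid.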
Steps to run the pipeline:
- Prepare a tab-separated file containing the inputs for the
command. See an example in example/submit_list_example.tsv.
These environment variables are then read in by the script called by dsub,
as explained below. The fields can change depending on your analysis, but below is an example:
--env INPUT_MAIN - Your main annotation. This can be a path to a gene list, rsids,
ldscore folder, bed file, or ldcts file with genesets or ldscores.
In this example it is a gene list.
--env INPUT_SUMSTAT - List of comma-separated files (already processed with
where to apply partition LDscore.
--env PREFIX - Prefix for the ldscores files that will be created and the results file
from the regression.
--env OUT - Path to save the regression results
gs://singlecellldscore/example/example.geneset gs://singlecellldscore/example/asd_summary_stats.sumstats.gz,gs://singlecellldscore/example/scz_summary_stats.sumstats.gz example gs://singlecellldscore/example/
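A tasks file of this kind can also be generated programmatically. The sketch below assumes the usual dsub tasks layout, in which the first row names each column as `--env NAME` and every following row is one task; the function name `build_tasks_tsv` is hypothetical, and the row is the example shown above.

```python
import csv
import io
from typing import Iterable, Sequence

# Column headers: one environment variable per column.
HEADER = ["--env INPUT_MAIN", "--env INPUT_SUMSTAT", "--env PREFIX", "--env OUT"]


def build_tasks_tsv(rows: Iterable[Sequence[str]]) -> str:
    """Return the text of a tab-separated tasks file, header first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        if len(row) != len(HEADER):
            raise ValueError(f"expected {len(HEADER)} fields, got {len(row)}")
        writer.writerow(row)
    return buffer.getvalue()


tsv_text = build_tasks_tsv([
    [
        "gs://singlecellldscore/example/example.geneset",
        "gs://singlecellldscore/example/asd_summary_stats.sumstats.gz,"
        "gs://singlecellldscore/example/scz_summary_stats.sumstats.gz",
        "example",
        "gs://singlecellldscore/example/",
    ]
])
print(tsv_text)
```

Each additional row added to the list becomes one more task, and therefore one more VM, when the file is passed to `--tasks`.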
- Build a
command to run the analysis. One example is provided in example/
The code should look something like this:
a) Assign the environment variables defined in the tab-separated file prepared above.
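import os

# Each variable is set by dsub from one column of the tasks file.
INPUT_MAIN = os.environ['INPUT_MAIN']
INPUT_SUMSTAT = os.environ['INPUT_SUMSTAT']
PREFIX = os.environ['PREFIX']
OUT = os.environ['OUT']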
b) Call the
script, for example:['/home/sc_enrichement/sc_enrichement-master/',
There are many other options that can be used. For another example check example/
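Before calling the script, it can help to validate the values that dsub passes in. The following is a minimal sketch rather than part of the pipeline: the function name `read_task_settings` and the returned dictionary keys are hypothetical, while the variable names and the `.sumstats.gz` suffix come from the descriptions above.

```python
import os
from typing import Dict, List, Mapping, Union

REQUIRED = ("INPUT_MAIN", "INPUT_SUMSTAT", "PREFIX", "OUT")


def read_task_settings(env: Mapping[str, str]) -> Dict[str, Union[str, List[str]]]:
    """Collect and validate the per-task environment variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # INPUT_SUMSTAT is a comma-separated list of files.
    sumstats = [p.strip() for p in env["INPUT_SUMSTAT"].split(",") if p.strip()]
    bad = [p for p in sumstats if not p.endswith(".sumstats.gz")]
    if bad:
        raise ValueError("not .sumstats.gz files: " + ", ".join(bad))
    return {
        "main": env["INPUT_MAIN"],
        "sumstats": sumstats,
        "prefix": env["PREFIX"],
        "out": env["OUT"],
    }


# Values from the example tasks file; on a VM you would pass os.environ instead.
example_env = {
    "INPUT_MAIN": "gs://singlecellldscore/example/example.geneset",
    "INPUT_SUMSTAT": "gs://singlecellldscore/example/asd_summary_stats.sumstats.gz,"
                     "gs://singlecellldscore/example/scz_summary_stats.sumstats.gz",
    "PREFIX": "example",
    "OUT": "gs://singlecellldscore/example/",
}
settings = read_task_settings(example_env)
print(settings["sumstats"])
```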
- Install dsub if you have not done so yet: pip install dsub
- Run the dsub command, similar to this:
dsub \
--provider google \
--project ldscore-data \
--zones "us-central1-*" \
--min-ram 4 \
--min-cores 4 \
--logging gs://singlecellldscore/example/log/ \
--disk-size 100 \
--image \
--tasks example/submit_list_example.tsv \
--script example/
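If you prefer to assemble the submission from Python, the same flags can be collected into an argument list. This is only a sketch: the function name `build_dsub_command` is hypothetical, and the image and script values below are placeholders that you must replace with your own image and script path.

```python
import shlex
from typing import List


def build_dsub_command(
    image: str,
    script: str,
    tasks: str,
    logging_path: str,
    project: str = "ldscore-data",
    zones: str = "us-central1-*",
    min_ram: int = 4,
    min_cores: int = 4,
    disk_size: int = 100,
) -> List[str]:
    """Return the dsub argument list using the flags shown above."""
    return [
        "dsub",
        "--provider", "google",
        "--project", project,
        "--zones", zones,
        "--min-ram", str(min_ram),
        "--min-cores", str(min_cores),
        "--logging", logging_path,
        "--disk-size", str(disk_size),
        "--image", image,
        "--tasks", tasks,
        "--script", script,
    ]


command = build_dsub_command(
    image="IMAGE_PLACEHOLDER",          # placeholder, not a real image
    script="SCRIPT_PLACEHOLDER.py",     # placeholder, not a real script
    tasks="example/submit_list_example.tsv",
    logging_path="gs://singlecellldscore/example/log/",
)
# Print a shell-safe version; to submit, pass `command` to subprocess.run.
print(shlex.join(command))
```

Passing the arguments as a list avoids shell expansion of the `*` in the zones pattern.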
Sometimes dsub
does not recognize your Google Cloud credentials. In that case, run export GOOGLE_APPLICATION_CREDENTIALS="your_google_cloud_service_account_key_file.json"
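A quick way to rule out a credentials problem before submitting is to check that the variable is set and points to an existing file. The helper name `find_credentials_problem` is hypothetical, and this check only covers the environment variable, not whether the key itself is valid.

```python
import os
from typing import Mapping, Optional


def find_credentials_problem(env: Mapping[str, str]) -> Optional[str]:
    """Return a description of the problem, or None if the variable looks usable."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(key_path):
        return f"key file not found: {key_path}"
    return None


problem = find_credentials_problem(os.environ)
print(problem or "GOOGLE_APPLICATION_CREDENTIALS points to an existing file")
```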
Check the example/
folder for other examples of submission scripts.