- Data
  - Import the scores into Hail
  - Add annotations for MAPS
  - Group variants and calculate the number of singletons (see the sketch below)
- Analysis
  - Download grouped variants
  - Calculate MAPS for each score-group combination
  - Visualise the results
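
The grouping step above boils down to counting variants and singletons per group in Hail. A minimal sketch, assuming a Hail Table with `score_bin`, `context` and `AC` fields (the field names and file paths are assumptions for illustration, not this pipeline's actual schema):

```python
import hail as hl

hl.init()

# Hypothetical annotated variant table; field names are assumptions.
ht = hl.read_table("gs://<bucket_name>/annotated_variants.ht")

# Group variants by score bin and mutational context, then count variants
# and singletons (allele count == 1) in each group.
grouped = ht.group_by(ht.score_bin, ht.context).aggregate(
    variant_count=hl.agg.count(),
    singleton_count=hl.agg.count_where(ht.AC == 1),
)
grouped.export("gs://<bucket_name>/grouped_variants.tsv")
```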
Tested with Hail version 0.2.107 and Snakemake 7.32
## `data/`

`data/` contains Hail and Snakemake code that requires execution in Google Cloud and saves files to a Google Storage (GS) bucket.
1. Create a new cluster:

   ```
   hailctl dataproc start <cluster_name> --packages snakemake --requester-pays-allow-buckets gnomad-public-requester-pays --project <project_name> --bucket <bucket_name> --region <region> --num-workers <N> --image-version=2.0.27-debian10
   ```

2. Connect to the cluster:

   ```
   gcloud beta compute ssh <user_name>@<cluster_name>-m --project "<project_name>"
   ```

3. `git clone` this repository and navigate to `data/`.

4. Run the pipeline:

   ```
   snakemake --cores all --configfile config.yaml --config gcp_rootdir="<bucket_name>/some_directory/"
   ```
Alternatively, in Step 4 you can submit the pipeline as a job. Create `job.py` containing the following:
```python
import snakemake

# Invoke the Snakemake CLI entry point with the same arguments
# as the interactive command in Step 4.
snakemake.main(
    [
        "--snakefile",
        "/path/to/Snakefile",
        "--cores",
        "all",
        "--configfile",
        "/path/to/config.yaml",
        "--config",
        'gcp_rootdir="<bucket_name>/some_directory/"',
    ]
)
```
Submit the script with `hailctl dataproc submit <cluster_name> job.py`.
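
In both cases, the value passed via `--config gcp_rootdir=...` is exposed inside the Snakefile through Snakemake's `config` dictionary. A standalone illustration with a stand-in dict (the output file name and the `gs://` prefixing are assumptions about this pipeline):

```python
# Snakemake makes --config values available as the `config` dict inside a Snakefile.
# Stand-in dict for illustration; the file name and gs:// prefix are assumptions.
config = {"gcp_rootdir": "<bucket_name>/some_directory/"}
grouped_variants_path = "gs://" + config["gcp_rootdir"] + "grouped_variants.tsv"
print(grouped_variants_path)  # gs://<bucket_name>/some_directory/grouped_variants.tsv
```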
## `analysis/`

`analysis/` contains scripts that calculate and visualise MAPS scores using files created in `data/`.
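
MAPS is the mutability-adjusted proportion of singletons: the observed proportion of singleton variants in a group minus the proportion expected from the group's mutability, with the expectation typically calibrated on a neutral reference set such as synonymous variants. A minimal sketch of the per-group calculation, assuming a pandas data frame with `singleton_count`, `expected_singletons` and `variant_count` columns (the column names and numbers are placeholders, not outputs of this pipeline):

```python
import pandas as pd


def maps(df: pd.DataFrame) -> pd.Series:
    """Mutability-adjusted proportion of singletons per group of variants."""
    return (df["singleton_count"] - df["expected_singletons"]) / df["variant_count"]


# Placeholder numbers purely for illustration.
groups = pd.DataFrame(
    {
        "group": ["A", "B"],
        "variant_count": [10_000, 50_000],
        "singleton_count": [6_000, 24_000],
        "expected_singletons": [5_500, 25_000],
    }
)
groups["maps"] = maps(groups)
print(groups[["group", "maps"]])
```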
1. `git clone` the code for CAPS (https://github.com/VCCRI/CAPS) into the same root directory as `MAPS_for_splicing/`.

2. Navigate to `MAPS_for_splicing/analysis/`.

3. Run the pipeline with either

   ```
   snakemake --cores all --config gcp="True" gcp_rootdir="<bucket_name>/some_directory/"
   ```

   (slower; will download the GS files each time) or

   ```
   snakemake --cores all --config gcp="False"
   ```

   (faster; will re-use the contents of `files/`; if necessary, create the directory and download the required files from GS into it first, e.g. as sketched below).
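
For the `gcp="False"` mode, one way to populate `files/` is `gsutil cp`; an equivalent Python sketch using the `google-cloud-storage` client is below (the bucket name and prefix are placeholders, and the exact set of required files depends on your `data/` run):

```python
from pathlib import Path

from google.cloud import storage

# Placeholder bucket and prefix; use the gcp_rootdir values from your data/ run.
BUCKET = "<bucket_name>"
PREFIX = "some_directory/"

files_dir = Path("files")
files_dir.mkdir(exist_ok=True)

client = storage.Client()
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    if blob.name.endswith("/"):  # skip "directory" placeholder objects
        continue
    destination = files_dir / Path(blob.name).name
    print(f"Downloading gs://{BUCKET}/{blob.name} -> {destination}")
    blob.download_to_filename(str(destination))
```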