- Download a database of human RNA-binding proteins and their PWMs
- Download the viral reference genome
- Scan the viral genome for these PWMs
- Annotate potential RBP binding sites on the viral genome
- Check for expression/activity of these RBPs in relevant cell types
- Check whether any of the RBP binding sites are altered in different isolates of SARS-CoV-2
- Check whether these RBP binding sites also exist in closely related viruses, e.g. the bat coronavirus sequences similar to SARS-CoV-2
https://github.com/avantikalal/covid-gene-expression/projects/2
- RBPDB, a database of RNA-binding proteins and PWMs: http://rbpdb.ccbr.utoronto.ca/index.php
- SARS-CoV-2 reference genome: https://www.ncbi.nlm.nih.gov/nuccore/NC_045512
- Download the reference genome for SARS-CoV-2 from NCBI: https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.
- Download data on human RNA-binding proteins from RBPDB: http://rbpdb.ccbr.utoronto.ca/downloads/RBPDB_v1.3.1_human_2012-11-21_TDT.zip. Unzip the downloaded archive to get the data tables.
- Download PWMs (Position Weight Matrices) for these human RNA-binding proteins from RBPDB: http://rbpdb.ccbr.utoronto.ca/downloads/matrices_human.zip. This download contains both PWMs and PFMs (Position Frequency Matrices). The PWM files are in the `PWMDir` folder.
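The download steps above can also be scripted. The following is a minimal sketch in base R, not part of the repository. The two RBPDB URLs are the ones listed above; the NCBI E-utilities `efetch` URL for the genome FASTA, the `download_inputs` helper, and the `data` output directory are assumptions made for illustration.

```r
# Input files for the analysis. The RBPDB URLs come from the steps above;
# the efetch URL for the genome FASTA is an assumption (the steps above only
# link to the NCBI record page).
input_urls <- c(
  rbpdb_tables   = "http://rbpdb.ccbr.utoronto.ca/downloads/RBPDB_v1.3.1_human_2012-11-21_TDT.zip",
  rbpdb_matrices = "http://rbpdb.ccbr.utoronto.ca/downloads/matrices_human.zip",
  genome_fasta   = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_045512&rettype=fasta&retmode=text"
)

# Hypothetical helper: download every URL into dest_dir and unzip the archives.
download_inputs <- function(urls, dest_dir = "data") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  for (name in names(urls)) {
    is_zip <- grepl("\\.zip$", urls[[name]])
    # Zip archives keep their file name; the genome is saved as <name>.fasta.
    dest <- file.path(
      dest_dir,
      if (is_zip) basename(urls[[name]]) else paste0(name, ".fasta")
    )
    download.file(urls[[name]], destfile = dest, mode = "wb")
    if (is_zip) {
      # Each archive is extracted into its own subfolder of dest_dir.
      unzip(dest, exdir = file.path(dest_dir, name))
    }
  }
  invisible(list.files(dest_dir, recursive = TRUE))
}

# Not run automatically, since it needs network access:
# download_inputs(input_urls)
```

After running `download_inputs(input_urls)`, the paths under `data/` would be the ones to fill in when configuring the analysis script.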
See `find_binding_sites.R` for a basic script to find RBP-binding sites on the SARS-CoV-2 genome. To run this script, first edit its opening section so that the paths point to your downloaded data files.
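To illustrate what scanning a sequence for PWM matches involves, here is a minimal base-R sketch. It is not the code in `find_binding_sites.R`, which relies on the libraries listed below. The `scan_pwm` function, the toy matrix values, the toy sequence, and the score threshold are all invented for illustration. Only the forward strand is scanned, on the assumption that the given strand of a positive-sense single-stranded RNA genome is the one of interest.

```r
# Toy log-odds PWM for a 3-nt motif (rows = bases, columns = motif positions).
# The values are invented for illustration; they do not come from RBPDB.
pwm <- matrix(
  c( 1.2, -2.0, -2.0,   # A
    -2.0, -2.0,  1.5,   # C
    -2.0,  1.4, -2.0,   # G
    -2.0, -2.0, -2.0),  # T
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Hypothetical helper: slide the PWM along the sequence, score every window
# as the sum of the per-position weights, and keep windows scoring at least
# min_score. Windows containing letters outside the PWM alphabet are skipped.
scan_pwm <- function(pwm, sequence, min_score) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  width <- ncol(pwm)
  n_windows <- length(bases) - width + 1
  if (n_windows < 1) {
    return(data.frame(start = integer(0), end = integer(0), score = numeric(0)))
  }
  starts <- seq_len(n_windows)
  scores <- vapply(starts, function(s) {
    window <- bases[s:(s + width - 1)]
    rows <- match(window, rownames(pwm))
    if (anyNA(rows)) return(NA_real_)
    # Pick one weight per motif position: (row of the base, column of the position).
    sum(pwm[cbind(rows, seq_len(width))])
  }, numeric(1))
  keep <- !is.na(scores) & scores >= min_score
  data.frame(
    start = starts[keep],
    end   = starts[keep] + width - 1L,
    score = scores[keep]
  )
}

# Toy sequence written with DNA letters, as in the NCBI record.
hits <- scan_pwm(pwm, "TTAGCAAGCN", min_score = 3)
print(hits)
```

In this toy example the two `AGC` windows are reported as hits, and the final window is skipped because it contains an `N`.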
The script is in R and uses the following libraries:
- data.table
- TFBSTools
- seqinr
- Biostrings
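Before running the script, it may help to check that these libraries are installed. The snippet below is a sketch, not part of the repository; the variable names are illustrative. `data.table` and `seqinr` are distributed through CRAN, while `TFBSTools` and `Biostrings` are Bioconductor packages.

```r
# Packages named in the list above.
required_pkgs <- c("data.table", "TFBSTools", "seqinr", "Biostrings")

# requireNamespace() returns FALSE (without an error) for packages that are
# not installed.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # One way to install them (not run here): BiocManager can install both
  # CRAN and Bioconductor packages.
  # install.packages("BiocManager")
  # BiocManager::install(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```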
A docker file with all of the R dependencies is available at hpobiolab/rbp-pwm-r
`sitesets.RData` contains the table of potential binding sites produced by `find_binding_sites.R`. To open it, type in an R terminal:
load("sitesets.RData")
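The names of the objects stored in the file are not listed here, so a small helper can report them after loading. This is a sketch; `inspect_rdata` is a hypothetical function, not part of the repository. It loads the file into a separate environment so that nothing in the current workspace is overwritten.

```r
# Hypothetical helper: load an .RData file into a private environment and
# print the name and top-level structure of each object it contains.
inspect_rdata <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object names
  for (nm in object_names) {
    cat("Object:", nm, "\n")
    str(get(nm, envir = env), max.level = 1)
  }
  invisible(object_names)
}

# Only runs if the file is in the current working directory.
if (file.exists("sitesets.RData")) {
  inspect_rdata("sitesets.RData")
}
```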