---
title: "crisprBowtie: alignment of gRNA spacer sequences using bowtie"
output:
  github_document:
    toc: true
bibliography: vignettes/references.bib
---
Authors: Jean-Philippe Fortin
Date: July 13, 2022
# Overview of crisprBowtie
`crisprBowtie` provides two main functions to align short DNA sequences to
a reference genome using the short read aligner Bowtie [@langmead2009bowtie]
and return the alignments as R objects: `runBowtie` and `runCrisprBowtie`.
It utilizes the Bioconductor package `Rbowtie` to access the Bowtie program
in a platform-independent manner. This means that users do not need to install
Bowtie prior to using `crisprBowtie`.
The latter function (`runCrisprBowtie`) is specifically designed
to map and annotate CRISPR guide RNA (gRNA) spacer sequences using
CRISPR nuclease objects and CRISPR genomic arithmetics defined in
the Bioconductor package
[crisprBase](https://github.com/crisprVerse/crisprBase).
This enables a fast and accurate on-target and off-target search of
gRNA spacer sequences for virtually any type of CRISPR nuclease.
It also provides an off-target search engine for our main gRNA design package [crisprDesign](https://github.com/crisprVerse/crisprDesign) of the
[crisprVerse](https://github.com/crisprVerse) ecosystem. See the
`addSpacerAlignments` function in `crisprDesign` for more details.
# Installation and getting started
## Software requirements
### OS Requirements
This package is supported on macOS, Linux, and Windows machines.
It was developed and tested on R version 4.2.1.
## Installation from Bioconductor
`crisprBowtie` can be installed from the Bioconductor devel branch
using the following commands in a fresh R session:
```{r, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version="devel")
BiocManager::install("crisprBowtie")
```
The complete documentation for the package can be found [here](https://bioconductor.org/packages/devel/bioc/manuals/crisprBowtie/man/crisprBowtie.pdf).
# Building a bowtie index
To use `runBowtie` or `runCrisprBowtie`, users need to first build a Bowtie
genome index. For a given genome, this step has to be done only once.
The `Rbowtie` package conveniently provides the function `bowtie_build`
to build a Bowtie index from any custom genome from a FASTA file.
As an example, we build a Bowtie index for a small portion of the human
chromosome 1 (`chr1.fa` file provided in the `crisprBowtie` package) and
save the index file as `myIndex` to a temporary directory:
```{r}
library(Rbowtie)
fasta <- file.path(find.package("crisprBowtie"), "example/chr1.fa")
tempDir <- tempdir()
Rbowtie::bowtie_build(fasta,
                      outdir=tempDir,
                      force=TRUE,
                      prefix="myIndex")
```
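As a quick sanity check after building, one can confirm that the index files were written. The sketch below assumes the standard Bowtie 1 naming scheme, in which a small index consists of six files sharing the prefix and ending in `.ebwt` (large indexes use `.ebwtl`). The helper `listBowtieIndexFiles` is a hypothetical name introduced here for illustration; it is not part of `Rbowtie` or `crisprBowtie`.

```{r}
# Hypothetical helper (not part of Rbowtie or crisprBowtie):
# list the files in `dir` that belong to the Bowtie index named `prefix`.
# Assumes Bowtie 1 file names such as myIndex.1.ebwt and myIndex.rev.1.ebwt.
listBowtieIndexFiles <- function(dir, prefix) {
    files <- list.files(dir)
    isIndex <- startsWith(files, paste0(prefix, ".")) &
        grepl("\\.ebwtl?$", files)
    sort(files[isIndex])
}

# Self-contained demonstration on empty placeholder files
demoDir <- tempfile("indexDemo")
dir.create(demoDir)
demoFiles <- paste0("myIndex",
                    c(".1", ".2", ".3", ".4", ".rev.1", ".rev.2"),
                    ".ebwt")
invisible(file.create(file.path(demoDir, c(demoFiles, "notes.txt"))))
listBowtieIndexFiles(demoDir, "myIndex")
```

Applied to the index built above, `listBowtieIndexFiles(tempDir, "myIndex")` would be expected to return six file names if the build succeeded.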
To learn how to create a Bowtie index for a complete genome or transcriptome,
please visit our [tutorial page](https://github.com/crisprVerse/Tutorials/tree/master/Building_Genome_Indices).
# Alignment using `runCrisprBowtie`
As an example, we align 6 spacer sequences (each 20 bp long) to the
custom genome built above, allowing a maximum of 3 mismatches between the
spacer and protospacer sequences.
We specify that the search is for the wildtype Cas9 (SpCas9) nuclease
by providing the `CrisprNuclease` object `SpCas9` available through the
`crisprBase` package. The argument `canonical=FALSE` specifies that
non-canonical PAM sequences are also considered (NAG and NGA for SpCas9).
The function `getAvailableCrisprNucleases` in `crisprBase` returns a character
vector of available `CrisprNuclease` objects found in `crisprBase`.
```{r, warning=FALSE, message=FALSE}
library(crisprBowtie)
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
spacers <- c("TCCGCGGGCGACAATGGCAT",
             "TGATCCCGCGCTCCCCGATG",
             "CCGGGAGCCGGGGCTGGACG",
             "CCACCCTCAGGTGTGCGGCC",
             "CGGAGGGCTGCAGAAAGCCT",
             "GGTGATGGCGCGGGCCGGGC")
runCrisprBowtie(spacers,
                crisprNuclease=crisprNuclease,
                n_mismatches=3,
                canonical=FALSE,
                bowtie_index=file.path(tempDir, "myIndex"))
```
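To make the `n_mismatches` and `canonical` arguments more concrete, here is a minimal base R sketch of the two ideas. It is independent of how `crisprBowtie` implements them internally. The helpers `countMismatches` and `classifySpCas9Pam` are hypothetical names introduced for illustration, and the PAM rules encoded are only the ones mentioned above for SpCas9 (NGG as canonical, NAG and NGA as non-canonical).

```{r}
# Hypothetical helper: number of positions at which two
# equal-length DNA sequences differ (spacer vs. protospacer)
countMismatches <- function(spacer, protospacer) {
    stopifnot(nchar(spacer) == nchar(protospacer))
    a <- strsplit(toupper(spacer), "")[[1]]
    b <- strsplit(toupper(protospacer), "")[[1]]
    sum(a != b)
}

# Hypothetical helper: classify a 3-nt PAM for SpCas9
# (NGG canonical; NAG and NGA non-canonical)
classifySpCas9Pam <- function(pam) {
    pam <- toupper(pam)
    if (grepl("^[ACGT]GG$", pam)) {
        "canonical"
    } else if (grepl("^[ACGT](AG|GA)$", pam)) {
        "non-canonical"
    } else {
        "not recognized"
    }
}

# A perfect match, then a protospacer differing at positions 5, 13 and 20
countMismatches("TCCGCGGGCGACAATGGCAT", "TCCGCGGGCGACAATGGCAT")
countMismatches("TCCGCGGGCGACAATGGCAT", "TCCGAGGGCGACTATGGCAA")
classifySpCas9Pam("AGG")
classifySpCas9Pam("TAG")
```

Under this reading, a search with `n_mismatches=3` would keep the second protospacer, and `canonical=FALSE` would additionally keep hits flanked by a PAM such as `TAG`.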
# Applications beyond CRISPR
The function `runBowtie` is similar to `runCrisprBowtie`,
but does not impose constraints on PAM sequences.
It can be used to search for any short read sequence in a genome.
## Example using RNAi (siRNA design)
Seed-related off-targets caused by mismatch tolerance outside of the
seed region are a well-studied and well-characterized problem observed in RNA
interference (RNAi) experiments. `runBowtie` can be used to map shRNA/siRNA seed
sequences to reference genomes to predict putative off-targets:
```{r, eval=TRUE}
seeds <- c("GTAAAGGT", "AAGGATTG")
runBowtie(seeds,
          n_mismatches=2,
          bowtie_index=file.path(tempDir, "myIndex"))
```
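Short seeds combined with a mismatch allowance match many distinct sequences, which is worth keeping in mind when interpreting the number of hits. As a back-of-the-envelope illustration (plain combinatorics, not a `crisprBowtie` function; `nNeighbors` is a hypothetical helper), the number of DNA sequences within `m` mismatches of a sequence of length `n` is the sum over `k = 0, ..., m` of `choose(n, k) * 3^k`:

```{r}
# Hypothetical helper: count the DNA sequences of length n that differ
# from a given sequence at no more than m positions
nNeighbors <- function(n, m) {
    k <- 0:m
    sum(choose(n, k) * 3^k)
}

nNeighbors(8, 2)          # 8-mer seed, up to 2 mismatches
nNeighbors(8, 2) / 4^8    # fraction of all possible 8-mers
```

For an 8-mer seed with up to 2 mismatches this gives 1 + 24 + 252 = 277 sequences, roughly 0.4% of all possible 8-mers.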
# Reproducibility
```{r}
sessionInfo()
```
# References