Find and load leaders
This section is also covered in Initial data loading.
- Important! Identifying the leader regions is required to get properly oriented (by strand) spacer sequences for spacer blasting. Do this before attempting spacer blasting if you want to investigate PAMs, spacer-protospacer mismatches, or anything else that depends on the orientation (strand) of the protospacer.
- The potential leader regions are defined simply as the regions adjacent to the CRISPR array. By default, the direct repeat degeneracies in the arrays are used to help narrow down the leader region (this assumes that direct repeats are most conserved near the leader, and that degeneracies exist in the array).
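To illustrate the assumption behind the degeneracy check, here is a minimal Python sketch. It is not CLdb's implementation: the function names, the half-versus-half comparison, and the toy repeats are all hypothetical. It scores each repeat against the array consensus and guesses that the leader sits next to the less degenerate end.

```python
from collections import Counter

def consensus(repeats):
    """Most common base per position (assumes all repeats have equal length)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*repeats))

def mismatches(seq, ref):
    """Number of positions where seq differs from ref."""
    return sum(a != b for a, b in zip(seq, ref))

def guess_leader_side(repeats):
    """Guess which end of the array the leader is on.

    `repeats` are the direct repeats in array order. Returns 'start', 'end',
    or 'unknown' when both halves are equally degenerate.
    """
    ref = consensus(repeats)
    scores = [mismatches(r, ref) for r in repeats]
    half = len(scores) // 2
    first, last = sum(scores[:half]), sum(scores[len(scores) - half:])
    if first < last:
        return "start"   # repeats near the start are better conserved
    if last < first:
        return "end"
    return "unknown"

# Hypothetical repeats: the last two carry degenerate positions.
toy_repeats = ["GTTTCAATCC", "GTTTCAATCC", "GTTTCAATCC", "GTTTCAATCT", "GTTACAATCT"]
print(guess_leader_side(toy_repeats))
```

If the array has no degenerate repeats the sketch returns 'unknown', which is the case where both flanking regions would have to be kept as candidates.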
- By default, potential leader regions span from the CRISPR array to either the maximum possible leader length (1000 bp) or the beginning of the closest gene, whichever comes first (this assumes leaders don't extend into genes).
CLdb_getLeaderRegions.pl -d CLdb.sqlite > possible_leaders.fna
CLdb_getLeaderRegions.pl -d CLdb.sqlite -q "AND subtype='I-B'" > leaders_IB.fna
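The default span rule described above (stop at 1000 bp or at the closest gene) can be sketched as follows. This is an illustration only, not the code inside CLdb_getLeaderRegions.pl. The function name, the 1-based inclusive coordinates, and the `gene_edge` argument are assumptions made for the example.

```python
MAX_LEADER_LEN = 1000  # default maximum leader length given above (bp)

def leader_region(array_start, array_end, side, gene_edge=None, max_len=MAX_LEADER_LEN):
    """Return (start, end) of a candidate leader region, or None if there is no room.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    side: 'upstream' = lower coordinates than the array,
          'downstream' = higher coordinates than the array.
    gene_edge: coordinate of the closest gene boundary on that side, if any.
    """
    if side == "upstream":
        start = max(array_start - max_len, 1)
        if gene_edge is not None:
            start = max(start, gene_edge + 1)   # do not extend into the gene
        end = array_start - 1
    elif side == "downstream":
        start = array_end + 1
        end = array_end + max_len
        if gene_edge is not None:
            end = min(end, gene_edge - 1)       # do not extend into the gene
    else:
        raise ValueError("side must be 'upstream' or 'downstream'")
    return (start, end) if start <= end else None

# Array at 5000-6000; a gene ends at 4700 upstream and another starts at 6300 downstream.
print(leader_region(5000, 6000, "upstream", gene_edge=4700))    # (4701, 4999)
print(leader_region(5000, 6000, "downstream", gene_edge=6300))  # (6001, 6299)
print(leader_region(5000, 6000, "upstream"))                    # (4000, 4999)
```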
Align the leaders using mafft or another sequence aligner.
mafft --adjustdirection possible_leaders.fna > possible_leaders_aln.fna
- If two leaders were written for a locus (i.e., both the 3' and 5' ends), remove the one that does not align.
- View the alignment (via Jalview, Geneious, etc.) and determine where leader conservation ends.
- For example: conservation ends 50 bp from the end of the alignment (make a note: 50 bp).
- This unconserved stretch will be trimmed off the leader region when it is loaded, so that only the conserved region up to the CRISPR array is added to CLdb.
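Inspecting the alignment by eye is the procedure described here. As a rough cross-check, the Python sketch below estimates how many trailing alignment columns are unconserved. It is not part of CLdb, and the function names, the 0.8 threshold, and the 5-column window are arbitrary assumptions. It also assumes the unconserved tail is at the right-hand end of the alignment.

```python
from collections import Counter

def column_conservation(aligned):
    """Fraction of sequences sharing the most common base in each column.

    Gaps ('-') never count as agreement, so gappy columns score low.
    """
    n = len(aligned)
    scores = []
    for col in zip(*aligned):
        bases = [b.upper() for b in col if b != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / n)
    return scores

def unconserved_tail_length(aligned, threshold=0.8, window=5):
    """Columns remaining after the last run of `window` consecutive conserved columns."""
    scores = column_conservation(aligned)
    length = len(scores)
    for end in range(length, window - 1, -1):
        if all(s >= threshold for s in scores[end - window:end]):
            return length - end
    return length  # no conserved run found at all

# Toy alignment: 10 conserved columns followed by 6 divergent ones.
toy_alignment = [
    "ACGTACGTACAAAAAA",
    "ACGTACGTACCCCCCC",
    "ACGTACGTACGGGGGG",
]
print(unconserved_tail_length(toy_alignment))  # 6
```

The returned number plays the role of the value noted down above (50 in the example).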
- Both the aligned and unaligned sequences are needed because mafft can alter sequence orientation during alignment (--adjustdirection).
CLdb_loadLeaders.pl -d CLdb.sqlite -t 50 possible_leaders.fna possible_leaders_aln.fna
- '-t 50' = trim off the last 50 bp of unconserved sequence at the end of the alignment furthest from the CRISPR array
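To make the effect of the trim value concrete, here is a minimal Python sketch of one possible reading: drop the last N alignment columns, then remove gap characters. How CLdb_loadLeaders.pl actually maps the trim onto the sequences is not described here, so the function name and the column-based interpretation are assumptions.

```python
def trim_aligned_leader(aligned_seq, trim):
    """Drop the last `trim` alignment columns, then strip gap characters."""
    if trim < 0:
        raise ValueError("trim must be >= 0")
    kept = aligned_seq[:len(aligned_seq) - trim] if trim > 0 else aligned_seq
    return kept.replace("-", "")

# Hypothetical aligned leader with gaps; trim the last 4 columns.
print(trim_aligned_leader("ACG-TACGT--AC", 4))  # ACGTACGT
```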
CLdb_groupLeaders.pl -d CLdb.sqlite