This repository has been archived by the owner on Aug 10, 2022. It is now read-only.

Find and load leaders

nyoungb2 edited this page Sep 20, 2013 · 3 revisions

This section is also covered in Initial data loading.

  • Important! Identifying the leader regions is required to obtain correctly oriented (by strand) spacer sequences for spacer blasting. Do this before attempting spacer blasting if you want to investigate PAMs, spacer-protospacer mismatches, or anything else that depends on the orientation (strand) of the protospacer.

  • The potential leader regions can simply be defined as the regions adjacent to the CRISPR array. By default, the direct repeat degeneracies in the arrays are used to help narrow down the leader region (this assumes that direct repeats are most conserved near the leader, and that degeneracies exist in the array).

  • By default, a potential leader region spans from the CRISPR array to either the maximum possible leader length (1000 bp) or the beginning of the closest gene, whichever is reached first (this assumes leaders don't extend into genes).
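The default bounds described above can be sketched as follows. This is only an illustration of the stated rule, not the code in CLdb_getLeaderRegions.pl; the function name `leader_windows`, the 0-based half-open coordinates, and the `(start, end)` gene tuples are assumptions made for the example, and genes that overlap the array are simply ignored.

```python
def leader_windows(array_start, array_end, genes, max_len=1000, seq_len=None):
    """Candidate leader windows on both sides of a CRISPR array.

    Coordinates are 0-based, half-open (an assumption for this sketch).
    genes is a list of (start, end) tuples on the same sequence.
    Returns (upstream_window, downstream_window).
    """
    # Upstream side: go back at most max_len bp, but not past the sequence start.
    up_start = max(0, array_start - max_len)
    # Downstream side: go forward at most max_len bp, but not past the sequence end.
    down_end = array_end + max_len
    if seq_len is not None:
        down_end = min(down_end, seq_len)

    for g_start, g_end in genes:
        if g_end <= array_start:
            # Gene lies upstream of the array: the window stops at the gene.
            up_start = max(up_start, g_end)
        elif g_start >= array_end:
            # Gene lies downstream of the array: the window stops at the gene.
            down_end = min(down_end, g_start)

    return (up_start, array_start), (array_end, down_end)


if __name__ == "__main__":
    print(leader_windows(5000, 6000, [(100, 4700), (6400, 7000)]))
```

With an array at 5000-6000 and genes ending at 4700 and starting at 6400, both windows are cut short by the genes rather than by the 1000 bp limit.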

getting potential leader regions

CLdb_getLeaderRegions.pl -d CLdb.sqlite > possible_leaders.fna

getting potential leader regions for just 1 subtype

CLdb_getLeaderRegions.pl -d CLdb.sqlite -q "AND subtype='I-B'" > leaders_IB.fna

identifying leaders from potentials (possible_leaders.fna)

Align the leaders using mafft or another sequence aligner.

mafft --adjustdirection possible_leaders.fna > possible_leaders_aln.fna
  • If 2 leaders were written for a locus (i.e., both the 3' and 5' ends), remove the one that does not align

  • View the alignment (via Jalview, Geneious, etc.); determine where leader conservation ends

    • For example: conservation ends 50 bp from the end of the alignment (make a note: 50 bp)

    • This unconserved portion will be trimmed off the leader region when it is loaded, so that only the conserved region up to the CRISPR array is added to CLdb.
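Viewing the alignment by eye is the method described here. As a rough cross-check, per-column conservation can also be computed from the aligned sequences. The sketch below is an assumption-laden illustration: the 0.8 threshold and the function names are hypothetical, the unconserved stretch is assumed to sit at the right-hand end of the alignment, and the result is counted in alignment columns, which only equals bp when that stretch has no gaps.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Fraction of sequences sharing the most common non-gap base, per column."""
    n = len(aligned_seqs)
    scores = []
    for column in zip(*aligned_seqs):
        bases = [b.upper() for b in column if b != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / n)
    return scores


def unconserved_tail(scores, threshold=0.8):
    """Number of trailing columns after the last column that meets the threshold."""
    for i in range(len(scores) - 1, -1, -1):
        if scores[i] >= threshold:
            return len(scores) - 1 - i
    return len(scores)


if __name__ == "__main__":
    aln = ["ACGTAC", "ACGTTG", "ACGTGA"]
    print(unconserved_tail(column_conservation(aln)))
```

The number returned is a starting point for the trim length; it should still be confirmed against the alignment view.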

loading identified leader regions

Both the aligned and unaligned sequences are needed because mafft can change sequence orientation during alignment (--adjustdirection)

CLdb_loadLeaders.pl -d CLdb.sqlite -t 50 possible_leaders.fna possible_leaders_aln.fna
  • '-t 50' = trim off the last 50 bp of unconserved sequence at the end of the alignment furthest from the CRISPR array
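To illustrate why both files matter, the sketch below orients unaligned sequences using the alignment's names and then trims them. It is not the logic of CLdb_loadLeaders.pl. It assumes that mafft marks reverse-complemented sequences by prefixing their names with `_R_`, that the end to trim is the right-hand end after orientation, and that sequences contain only A, C, G, and T; `orient_and_trim` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_and_trim(unaligned, aligned_names, trim_len):
    """Orient unaligned leaders to match the alignment, then trim.

    unaligned: dict of name -> unaligned sequence
    aligned_names: sequence names as they appear in the alignment file
    trim_len: bp to remove from the right-hand end (assumed distal to the array)
    """
    prefix = "_R_"
    # Names that mafft reverse-complemented, with the prefix stripped.
    flipped = {n[len(prefix):] for n in aligned_names if n.startswith(prefix)}
    result = {}
    for name, seq in unaligned.items():
        if name in flipped:
            seq = reverse_complement(seq)
        keep = max(0, len(seq) - trim_len)
        result[name] = seq[:keep]
    return result


if __name__ == "__main__":
    raw = {"a": "AAACCCGG", "b": "CCGGGTTT"}
    print(orient_and_trim(raw, ["a", "_R_b"], 2))
```

In the example, sequence b is flipped before trimming, so both leaders end up in the same orientation with the same 2 bp removed.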

grouping leaders (100% sequence identity)

CLdb_groupLeaders.pl -d CLdb.sqlite
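Grouping at 100% sequence identity amounts to collecting leaders whose sequences are exactly the same. A minimal sketch of that idea follows; the function name, the case-insensitive comparison, and the numbering of groups by first appearance are assumptions for illustration, not a description of what CLdb_groupLeaders.pl stores in the database.

```python
from collections import defaultdict


def group_identical(leaders):
    """Group leader names by exact (case-insensitive) sequence identity.

    leaders: dict of name -> sequence
    Returns a dict of group id -> list of names, ids assigned in order of
    first appearance.
    """
    by_seq = defaultdict(list)
    for name, seq in leaders.items():
        by_seq[seq.upper()].append(name)
    return {gid: names for gid, names in enumerate(by_seq.values(), start=1)}


if __name__ == "__main__":
    print(group_identical({"l1": "ACGT", "l2": "acgt", "l3": "ACGA"}))
```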