COVID19
Download the most recent data from NCBI into the file data/sequences.fa.
Create a directory called gene-fastas, then run
python3 split-by-reference.py data/sequences.fa
to create a FASTA file for each reference gene identified in sequences.fa. Reference genes are recognized by the YP_ prefix of their accession.
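A minimal Python sketch of the splitting step follows. It assumes NCBI protein FASTA headers of the form ">ACCESSION protein name [organism]" and assumes that a sequence belongs to a reference gene when its protein name matches that YP_ reference's protein name. This is not necessarily how split-by-reference.py assigns sequences, and read_fasta, protein_name, group_by_reference and write_groups are hypothetical names.

```python
import os
import re
import sys
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines (header without '>')."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def protein_name(header):
    """Hypothetical helper: text between the accession and a trailing [organism] tag."""
    _, _, rest = header.partition(" ")
    return re.sub(r"\s*\[[^\]]*\]$", "", rest).strip()


def group_by_reference(records):
    """Map each YP_ reference accession to the records sharing its protein name."""
    records = list(records)
    # Reference genes are the records whose accession starts with YP_.
    references = {protein_name(h): h.split()[0]
                  for h, _ in records if h.startswith("YP_")}
    groups = defaultdict(list)
    for header, seq in records:
        # Assumption: membership is decided by an exact protein-name match.
        accession = references.get(protein_name(header))
        if accession is not None:
            groups[accession].append((header, seq))
    return dict(groups)


def write_groups(groups, out_dir="gene-fastas"):
    """Write one FASTA file per reference accession into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for accession, members in groups.items():
        with open(os.path.join(out_dir, accession + ".fa"), "w") as out:
            for header, seq in members:
                out.write(f">{header}\n{seq}\n")


# Only act as a command-line tool when a FASTA path is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        write_groups(group_by_reference(read_fasta(handle)))
```

Run as a script with data/sequences.fa as its argument, it writes one .fa file per YP_ accession into gene-fastas, each containing the reference itself plus every sequence with the same protein name. If two references share a protein name, the later one wins, which a real script would need to handle.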
The Linux binary clustalo (Clustal Omega) is included in the repo, but a fresh copy can be downloaded and made executable with
wget http://www.clustal.org/omega/clustalo-1.2.4-Ubuntu-x86_64 -O clustalo
chmod +x clustalo
Create a directory called aligns, then create the alignments with
./make-aligns.sh
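The contents of make-aligns.sh are not shown here. As a hedged sketch, assuming it runs clustalo once per file in gene-fastas and writes each result under the same file name in aligns, the loop could look like this in Python. The -i, -o and --force options are standard Clustal Omega flags; the file-naming scheme and the function names align_commands and run_alignments are assumptions.

```python
import glob
import os
import subprocess


def align_commands(fasta_paths, out_dir="aligns", clustalo="./clustalo"):
    """Build one Clustal Omega command per input FASTA.

    Assumption: each alignment keeps its input file name inside out_dir.
    """
    commands = []
    for fasta in fasta_paths:
        out = os.path.join(out_dir, os.path.basename(fasta))
        # -i input, -o output, --force overwrites an existing output file
        commands.append([clustalo, "-i", fasta, "-o", out, "--force"])
    return commands


def run_alignments(in_dir="gene-fastas", out_dir="aligns", clustalo="./clustalo"):
    """Align every *.fa file in in_dir, stopping at the first clustalo failure."""
    os.makedirs(out_dir, exist_ok=True)
    inputs = sorted(glob.glob(os.path.join(in_dir, "*.fa")))
    for cmd in align_commands(inputs, out_dir, clustalo):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling run_alignments() from the repository root aligns every gene-fastas/*.fa in sorted order. Separating command construction from execution makes it easy to print or inspect the commands before running clustalo.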
Create a directory called idmats, then create the identity matrices with
./make-idmats.sh
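make-idmats.sh is likewise not shown. The sketch below computes a pairwise percent-identity matrix directly in Python from one alignment instead of calling clustalo. It assumes identity is defined as matching residues divided by the columns where neither sequence has a gap; other definitions, such as dividing by the full alignment length, give different numbers. The function names and the tab-separated output format are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent of gap-free columns in which the two aligned residues match.

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Return (ids, matrix) for a list of (id, aligned_sequence) pairs."""
    ids = [name for name, _ in aligned]
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the matrix is symmetric, so fill both halves
            value = percent_identity(aligned[i][1], aligned[j][1])
            matrix[i][j] = matrix[j][i] = value
    return ids, matrix


def format_matrix(ids, matrix):
    """Tab-separated text: a header row of ids, then one row per sequence."""
    rows = ["\t" + "\t".join(ids)]
    for name, row in zip(ids, matrix):
        rows.append(name + "\t" + "\t".join(f"{v:.2f}" for v in row))
    return "\n".join(rows)
```

For example, with the aligned sequences MF-VL and MFAVF the third column is skipped, 3 of the 4 remaining columns match, and percent_identity returns 75.0. The (id, sequence) pairs would come from reading one FASTA file in aligns.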