kmoad/COVID19
COVID19

Getting Data

Download the most recent sequence data from NCBI and save it as data/sequences.fa

Extracting gene fastas

Create a directory called gene-fastas, then run

python3 split-by-reference.py data/sequences.fa

to create a FASTA file for each reference gene identified in sequences.fa. Reference genes are recognized by the YP_ prefix on their accession.
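The YP_ prefix check can be illustrated with a minimal sketch. This is not the code in split-by-reference.py: it assumes the accession is the first whitespace-delimited token of each FASTA header, and the helper names (read_fasta, accession, is_reference, write_reference_fastas) and the one-file-per-accession naming are hypothetical.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new record starts; emit the previous one if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def accession(header):
    """Assumption: the accession is the first token of the header line."""
    parts = header.split()
    return parts[0] if parts else ""


def is_reference(header):
    """A record counts as a reference gene if its accession starts with YP_."""
    return accession(header).startswith("YP_")


def write_reference_fastas(fasta_path, out_dir="gene-fastas"):
    """Write each reference record to <out_dir>/<accession>.fa (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for header, seq in read_fasta(fasta_path):
        if is_reference(header):
            target = out / f"{accession(header)}.fa"
            target.write_text(f">{header}\n{seq}\n")
            written.append(target)
    return written
```

The sketch only isolates the reference records. How the actual script assigns the remaining sequences to each reference gene is not described here.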

Multiple Alignment

The Linux binary clustalo is included in the repo, but a fresh copy can be downloaded with

wget http://www.clustal.org/omega/clustalo-1.2.4-Ubuntu-x86_64 -O clustalo

Create a directory called aligns, then generate the alignments with ./make-aligns.sh
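The contents of make-aligns.sh are not shown here, so the following is only a sketch of what a per-gene alignment loop might look like. It assumes each input file in gene-fastas ends in .fa and that each alignment is written to aligns with an .aln.fa suffix. Both naming choices and the function name clustalo_commands are hypothetical. The only Clustal Omega options used are -i (input) and -o (output).

```python
from pathlib import Path


def clustalo_commands(fasta_dir="gene-fastas", out_dir="aligns", binary="./clustalo"):
    """Build one clustalo command (as an argv list) per FASTA file in fasta_dir."""
    commands = []
    for fasta in sorted(Path(fasta_dir).glob("*.fa")):
        # Hypothetical output naming: <input stem>.aln.fa inside out_dir.
        out = Path(out_dir) / f"{fasta.stem}.aln.fa"
        commands.append([binary, "-i", str(fasta), "-o", str(out)])
    return commands
```

Each command can then be run with subprocess.run(cmd, check=True). A binary fetched with wget is not executable until chmod +x clustalo has been run.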

Building identity matrices

Create a directory called idmats, then generate the identity matrices with ./make-idmats.sh
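For readers unfamiliar with identity matrices, the sketch below computes pairwise percent identity from sequences that are already aligned. It is not the method used by make-idmats.sh, which is not shown here. It assumes gaps are written as "-" and skips any column where either sequence has a gap, which is one of several common conventions. The function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of gap-free aligned columns in which a and b carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a, b):
        # Assumption: columns with a gap in either sequence are ignored.
        if x == "-" or y == "-":
            continue
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(seqs):
    """Return the symmetric matrix of pairwise percent identities."""
    n = len(seqs)
    return [[percent_identity(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
```

For example, percent_identity("MK-LV", "MKALI") compares four gap-free columns, three of which match, giving 75.0.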
