COVID19

Getting Data

Download the most recent data from NCBI into the file data/sequences.fa.

Create a directory called gene-fastas. Run

python3 split-by-reference.py data/sequences.fa

to create a FASTA file for each reference gene identified in sequences.fa. Reference genes are identified by the YP_ prefix on their accession.
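
As a rough illustration of the YP_ rule, the sketch below parses FASTA text and keeps only the records whose accession starts with YP_. It is not the code in split-by-reference.py; the function names parse_fasta and reference_records are hypothetical, and it assumes the accession is the first whitespace-separated token of each header line.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def reference_records(records, prefix="YP_"):
    """Keep records whose accession (assumed: first header token) starts with prefix."""
    refs = {}
    for header, seq in records:
        parts = header.split()
        accession = parts[0] if parts else ""
        if accession.startswith(prefix):
            refs[accession] = seq
    return refs
```

Calling reference_records(parse_fasta(open("data/sequences.fa").read())) would then give one entry per reference gene, keyed by accession.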

The Linux binary clustalo is included in the repo, but a fresh copy can be downloaded using

wget http://www.clustal.org/omega/clustalo-1.2.4-Ubuntu-x86_64 -O clustalo

A freshly downloaded binary is not executable by default; run chmod +x clustalo before using it.

Create a directory called aligns, then create the alignments by running ./make-aligns.sh.
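
The contents of make-aligns.sh are not shown here, so the following is only a sketch of what an equivalent step might look like in Python: one clustalo run per file in gene-fastas, with the output written to aligns. The .aln output suffix and the function names are assumptions; -i, -o and --force are standard Clustal Omega options for input file, output file and overwriting existing output.

```python
import subprocess
from pathlib import Path


def clustalo_command(fasta_path, out_dir="aligns", binary="./clustalo"):
    """Build the clustalo argument list for aligning one FASTA file."""
    fasta_path = Path(fasta_path)
    # Assumed naming scheme: same base name as the input, with an .aln suffix.
    out_path = Path(out_dir) / (fasta_path.stem + ".aln")
    return [binary, "-i", str(fasta_path), "-o", str(out_path), "--force"]


def align_all(in_dir="gene-fastas", out_dir="aligns"):
    """Run clustalo once per file in in_dir; raises if any run fails."""
    for fasta_path in sorted(Path(in_dir).iterdir()):
        if fasta_path.is_file():
            subprocess.run(clustalo_command(fasta_path, out_dir), check=True)
```

Building the argument list in a separate function keeps the command easy to inspect or print before anything is executed.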

Create a directory called idmats, then create the identity matrices by running ./make-idmats.sh.
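
For readers unfamiliar with identity matrices, here is a minimal sketch of how pairwise percent identity could be computed from already-aligned sequences. It is not necessarily what make-idmats.sh does. The definition used (matches divided by the number of columns where neither sequence has a gap) is one common convention and is an assumption here; other tools may count gaps differently. The function names are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Return (names, matrix) for a dict mapping sequence name to aligned sequence."""
    names = list(aligned)
    matrix = [[percent_identity(aligned[r], aligned[c]) for c in names]
              for r in names]
    return names, matrix
```

The resulting matrix is symmetric, and each diagonal entry is 100.0 for any sequence with at least one non-gap column.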
