sarscov2-tools

Scripts for analysing SARS-CoV-2 nucleotide sequences

This repository contains scripts for calculating, among other things, the average genetic distance, the Wu-Kabat variability coefficient and the lineage frequency per week of 2020.

Python packages

pandas
numpy
argparse
collections
glob
re
os

Average genetic distance

average_distance.py calculates the average genetic distance and its standard deviation from a symmetric pairwise distance matrix. The matrix must be generated with snp-dists (pairwise SNP distance matrix from a FASTA sequence alignment); the snp-dists output is the input of average_distance.py. Usage:

average_distance.py snp-dists-matrix
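
As an illustration of the calculation described above, here is a minimal sketch and not the code of average_distance.py itself. It assumes a tab-separated square matrix whose first column and first row hold sample names, that each pair of samples is counted once (upper triangle, diagonal excluded) and that the population standard deviation is wanted; the function name and the example matrix are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def average_distance(matrix_file):
    """Return (mean, standard deviation) of the pairwise distances."""
    # First column holds the sample names, so use it as the index.
    matrix = pd.read_csv(matrix_file, sep="\t", index_col=0)
    values = matrix.to_numpy(dtype=float)
    # Assumption: count every unordered pair once and skip the zero diagonal.
    upper = values[np.triu_indices_from(values, k=1)]
    # Assumption: population standard deviation (NumPy default, ddof=0).
    return upper.mean(), upper.std()


# Hypothetical three-sample matrix laid out as a tab-separated table.
example = (
    "snp-dists\tA\tB\tC\n"
    "A\t0\t2\t4\n"
    "B\t2\t0\t6\n"
    "C\t4\t6\t0\n"
)
mean, std = average_distance(io.StringIO(example))
print(f"mean={mean:.2f} sd={std:.2f}")
```

Counting each pair once gives the same mean as averaging every off-diagonal cell of a symmetric matrix; the real script may make different choices.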

Coexisting lineages per week

coexisting_lineages.py outputs a table (.tsv) with the frequency of every lineage per week.

coexisting_lineages.py metadata [-f] [-p]

metadata  tsv with (at least) 'Lineage' and 'date' columns
-f  outputs .tsv with the lineages' earliest date
-p  outputs .tsv of lineages per Andalusian province
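
A per-week lineage table of this kind could be built as in the sketch below. It is not the code of coexisting_lineages.py: it assumes that "frequency" means the number of sequences, that weeks are ISO 8601 week numbers and that dates are written as YYYY-MM-DD. The function name and the example rows are hypothetical; only the 'Lineage' and 'date' column names come from the description above.

```python
import io

import pandas as pd


def lineages_per_week(metadata_file):
    """Count sequences per lineage (rows) and week (columns)."""
    meta = pd.read_csv(metadata_file, sep="\t")
    dates = pd.to_datetime(meta["date"])
    # Assumption: weeks are ISO 8601 week numbers.
    meta["week"] = dates.dt.isocalendar().week.astype(int)
    # Lineage/week combinations with no sequences are filled with 0.
    return pd.crosstab(meta["Lineage"], meta["week"])


# Hypothetical metadata with the two required columns.
example = (
    "Strain\tLineage\tdate\n"
    "s1\tB.1\t2020-03-02\n"
    "s2\tB.1\t2020-03-04\n"
    "s3\tA.2\t2020-03-10\n"
)
table = lineages_per_week(io.StringIO(example))
print(table)
```

Dividing each column by its sum would turn the counts into proportions if relative frequencies are preferred.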

Mutation frequency per week

cov_cluster.py outputs a table with the frequency per week of a specific amino acid mutation of protein S.

cov_cluster.py metadata directory1 position

metadata  tsv with (at least) 'Strain' and 'date' columns
directory1  /complete/directory/to/*_analysis_report.csv
position  comma-separated amino acid positions of protein S

Mutation frequency per province

cov_provincias.py outputs a table with the frequency of a specific mutation of protein S per week per Andalusian province.

cov_provincias.py metadata directory1 position [-n]

metadata  tsv with (at least) 'Strain', 'date' and 'Province' columns
directory1  /complete/directory/to/*_analysis_report.csv
position  comma-separated amino acid positions of protein S
-n  outputs table with total number of samples per week per province

Mutations per sample and frequency of mutations

frec_snp_gtc.py performs variant calling from .tsv files (output of iVar). It only analyses mutations with a frequency >=0.75 in sequences with a coverage >=0.95.

frec_snp_gtc.py directory1 [-t]

directory1  /complete/directory/to/*_analysis_report.csv
-t  outputs .tsv file of nucleotide substitution type frequency (for example, C>T frequency)
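
The substitution type frequency mentioned for -t can be illustrated with the sketch below. It is not taken from frec_snp_gtc.py: the function name, the input format (a list of reference/alternative base pairs) and the choice of relative frequencies over raw counts are all assumptions made for the example.

```python
from collections import Counter


def substitution_type_frequency(substitutions):
    """Return the relative frequency of each substitution type, e.g. 'C>T'."""
    # Label every (reference, alternative) pair as 'REF>ALT' and count the labels.
    counts = Counter(f"{ref}>{alt}" for ref, alt in substitutions)
    total = sum(counts.values())
    # Assumption: frequencies are proportions of all substitutions observed.
    return {kind: n / total for kind, n in counts.items()}


# Hypothetical substitutions pooled from several samples.
freqs = substitution_type_frequency(
    [("C", "T"), ("C", "T"), ("A", "G"), ("G", "T")]
)
print(freqs)
```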

Synonymous and non-synonymous mutations

tabla_Zekri_aa.py analyses mutations per gene of SARS-CoV-2.

tabla_Zekri_aa.py directory1 [-a]

directory1  /complete/directory/to/*_analysis_report.csv
-a  adds a column of amino acid alterations to the .tsv file

Wu-Kabat variability coefficient

wu_kabat_gtc.py calculates Wu-Kabat variability coefficient and outputs a .tsv file for every protein.

wu_kabat_gtc.py directory1 [-m] [-p]

directory1  /complete/directory/to/*_analysis_report.csv
-m  outputs a single .tsv file with all amino acid mutations
-p  outputs a .tsv file for every protein with the variability coefficient of every amino acid position, even those without variability
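
For reference, the Wu-Kabat variability coefficient of a position is N * k / n, where N is the number of sequences, k is the number of different amino acids observed at that position and n is the count of the most common one (Wu & Kabat, 1970). The sketch below computes it for a single alignment column. It is only an illustration, not the code of wu_kabat_gtc.py; the function name and the example column are hypothetical.

```python
from collections import Counter


def wu_kabat(residues):
    """Wu-Kabat variability of one position, given one residue per sequence."""
    counts = Counter(residues)
    n_sequences = len(residues)  # N
    n_distinct = len(counts)  # k
    most_common_count = counts.most_common(1)[0][1]  # n
    return n_sequences * n_distinct / most_common_count


# Hypothetical position: 8 sequences carry D and 2 carry G.
variable = wu_kabat(list("DDDDDDDDGG"))
# A fully conserved position gives the minimum value, 1.
conserved = wu_kabat(list("DDDDDDDDDD"))
print(variable, conserved)
```

A conserved position always scores 1, so a table restricted to variable positions only needs rows with a coefficient above 1.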

Bibliography

Wu, T. T. & Kabat, E. A. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132, 211–250 (1970).

Zekri, A. N. et al. Genomic characterization of SARS-CoV-2 in Egypt. J. Adv. Res. (2020).

Owner note

Some scripts are aimed at Andalusian sequences (from Andalusia, Spain) because they were created as part of my thesis project: Genomic and population diversity analysis of SARS-CoV-2 during the first epidemic wave in Andalusia.

🍃

LinkedIn 🔗

mlarjim/sarscov2-tools
