A workflow to extract the spike gene sequences from SARS-CoV-2 full genome sequence.

mikemwanga/Extract_Genes

Extract_spike_gene

This workflow extracts the spike (surface glycoprotein) gene region from SARS-CoV-2 whole genome sequences. The same principle can be applied to extract other gene sequences from longer genomes.

Tools

  1. BLAST command-line tool. Download from here
  2. Python 3.9.5

Test Data

The test dataset contains a set of SARS-CoV-2 complete genome sequences downloaded from the GISAID database. This multi-FASTA file is referred to as seqfile in the subsequent steps. The query file contains a single FASTA sequence, here the surface glycoprotein gene sequence from the SARS-CoV-2 Wuhan strain.
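To make the two input formats concrete, here is a minimal sketch of a FASTA reader in plain Python. The function name `read_fasta` is hypothetical and is not taken from this repository's code; it assumes the record ID is the header text up to the first whitespace.

```python
from typing import Dict, Iterable, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dict.

    Hypothetical helper for illustration; not the repository's own parser.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the ID is the header text up to the first whitespace.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped; join them and normalise case.
            records[current_id] += line.upper()
    return records
```

With a reader like this, a multi-FASTA seqfile yields one entry per genome, while a valid query file should yield exactly one entry, which is an easy sanity check to run before starting the pipeline.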

Workflow

  1. Create a BLAST database from the whole genome sequences.
  2. Align the spike query sequence to the BLAST database. Use the mapping coordinates of each hit to extract the target gene sequence from the matching genome in the database.
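The two workflow steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's actual implementation: the function names are hypothetical, the database is assumed to be built with `makeblastdb`, and the alignment is assumed to be run with `blastn` in tabular output format 6, where columns 2, 9 and 10 hold the subject ID, subject start and subject end.

```python
from typing import List, Tuple

# Translation table used to complement bases for minus-strand hits.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def makeblastdb_cmd(seqfile: str, db_name: str) -> List[str]:
    """Step 1: command that builds a nucleotide BLAST database."""
    return ["makeblastdb", "-in", seqfile, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query: str, db_name: str, out_tsv: str) -> List[str]:
    """Step 2: command that aligns the query and writes tabular (outfmt 6) hits."""
    return ["blastn", "-query", query, "-db", db_name,
            "-outfmt", "6", "-out", out_tsv]


def parse_hit(line: str) -> Tuple[str, int, int]:
    """Return (subject_id, sstart, send) from one outfmt 6 line."""
    fields = line.rstrip("\n").split("\t")
    return fields[1], int(fields[8]), int(fields[9])


def extract_region(genome: str, sstart: int, send: int) -> str:
    """Slice a genome using 1-based, inclusive BLAST subject coordinates.

    When sstart > send the hit is on the minus strand, so the slice is
    reverse-complemented.
    """
    if sstart <= send:
        return genome[sstart - 1:send]
    return genome[send - 1:sstart].translate(_COMPLEMENT)[::-1]
```

Each command list can be executed with `subprocess.run(cmd, check=True)` once BLAST is installed. The sketch handles one hit line at a time; if BLAST reports several hits for the same genome, some rule for choosing among them (for example, the highest bit score) would be needed, and that choice is left open here.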

Steps

Launch the workflow console within LatchBio using this link. Create an account when prompted.

  1. Load a single spike gene sequence as query_sequence
  2. Load metafile containing full genome sequences as seq_data
  3. Execute the pipeline

Link to the code here

Maintenance
