Skip to content

Commit

Permalink
newreference: If gene is given but not found in genbank, error out
Browse files Browse the repository at this point in the history
Currently, if the --gene option is used and the gene name is not found,
the script will use the entire genome which may cause some silent undesired
behaviors.

This commit changes that such that the script will error out if the gene
is not found in the GenBank file as this indicates the gene name may be
misspelled or the user may be using the wrong GenBank file.

If the --gene option is not used, the script will continue to process the
entire genome as expected.

Co-authored-by: Jover Lee <[email protected]>
  • Loading branch information
j23414 and joverlee521 committed Mar 12, 2024
1 parent 570daa6 commit 8cd6a13
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions scripts/newreference.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
from Bio.SeqFeature import SeqFeature, FeatureLocation, Seq
import shutil
import argparse
import sys

def new_reference(referencefile, outgenbank, outfasta, gene):
ref = SeqIO.read(referencefile, "genbank")
Expand All @@ -17,6 +18,12 @@ def new_reference(referencefile, outgenbank, outfasta, gene):
startofgene = int(list(feature.location)[0])
endofgene = int(list(feature.location)[-1])+1

# If user provides a --gene 'some name' is not found, print a warning and use the entire genome.
# Otherwise do not print a warning.
if(gene is not None and startofgene is None and endofgene is None):
print(f"ERROR: No '{gene}' was found under 'gene' or 'CDS' features in the GenBank file.", file=sys.stderr)
sys.exit(1)

record = ref[startofgene:endofgene]
source_feature = SeqFeature(FeatureLocation(start=0, end=len(record)), type='source',
qualifiers=ref_source_feature.qualifiers)
Expand Down

0 comments on commit 8cd6a13

Please sign in to comment.