-
Notifications
You must be signed in to change notification settings - Fork 24
BioPython
Meg Staton edited this page Oct 21, 2020
·
4 revisions
Biopython
module load biopython/1.72
Example script 1 - Lets create a sequence object and do some common sequence manipulations.
from Bio.Seq import Seq
my_seq = Seq("AGTACACTGGT")
print(my_seq)
print(my_seq.complement())
print(my_seq.reverse_complement())
print(my_seq)
Example script 2 - We'll parse through the sequences in a fasta file and print their lengths. Get the fasta file first.
On command line:
wget https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta
How many sequences does it have?
grep -c '^>' ls_orchid.fasta
And the script:
from Bio import SeqIO
for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
print(seq_record.id)
print(len(seq_record))
What if we wanted the count the number of sequences?
from Bio import SeqIO
seq_count = 0
for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
print(seq_record.id)
seq_count += 1
print("The number of sequences is: " + str(seq_count))
What if I need to get the sequence as a string? (So I can use it with regular expressions, or find an adapter, or do something else with it)
from Bio import SeqIO
seq_count = 0
for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
bases = seq_record.seq
len_bases = len(bases)
seq_count += 1
#print seq name and first 5 characters then total length
print(seq_record.id + "\t" + bases[0:5] + "\t" + str(len_bases))
print("The number of sequences is: " + str(seq_count))