Bioinformatics is the field that develops methods and software tools for understanding biological data. Units 6 and 7 of this course cover the basic biology needed for this field.
Next-generation sequencing (NGS) is one of the fundamental technological developments in the field. Whole-genome sequencing (WGS), restriction site-associated DNA sequencing (RAD-Seq), ribonucleic acid sequencing (RNA-Seq), chromatin immunoprecipitation sequencing (ChIP-Seq), and several other technologies are routinely used to investigate important biological problems. These are called high-throughput (HT) sequencing technologies. See this for a Python package to help with HT sequencing.
In text files, DNA is represented as a string made up of a small set of characters (A, C, G, and T), so knowing about the following topics will be helpful:
Examples of bioinformatics functions:
- Counting bases in a DNA sequence (tetranucleotide frequency):
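As a minimal sketch of base counting, the standard-library `Counter` can tally each character of the sequence. The function name `count_bases` is made up for this illustration, and the input is assumed to contain only the letters A, C, G, and T (in any case).

```python
from collections import Counter


def count_bases(dna):
    """Return a dict with the number of A, C, G and T in a DNA string."""
    counts = Counter(dna.upper())
    # Counter returns 0 for missing keys, so absent bases are reported as 0.
    return {base: counts[base] for base in "ACGT"}


print(count_bases("ACCGGGTTTT"))  # {'A': 1, 'C': 2, 'G': 3, 'T': 4}
```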
- Reverse complement of DNA:
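One way to sketch the reverse complement in plain Python is to swap each base for its partner (A with T, C with G) and then reverse the string. The names `COMPLEMENT` and `reverse_complement` are illustrative only; Biopython's `Seq` objects offer a `reverse_complement()` method for real work.

```python
# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the resulting string."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```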
- Computing GC content: GC content is the fraction of bases in a sequence that are G or C. A higher GC content is associated with a relatively higher melting temperature, and DNA sequences that encode proteins tend to be found in GC-rich regions.
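The GC content of a sequence can be sketched as the number of G and C characters divided by the sequence length. The function name `gc_content` and the choice to return 0.0 for an empty sequence are assumptions made for this example.

```python
def gc_content(dna):
    """Return the fraction (0.0 to 1.0) of bases that are G or C."""
    dna = dna.upper()
    if not dna:
        # Assumption: an empty sequence is reported as 0.0 rather than an error.
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


print(gc_content("ACGT"))  # 0.5
```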
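- Transcribing DNA into mRNA: regions of DNA that encode proteins must first be transcribed into a form of RNA called messenger RNA (mRNA). Starting from the coding strand, this amounts to replacing every T with U.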
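On the string level, transcription can be sketched as a single replacement, assuming the input is the coding strand written 5' to 3'. The function name `transcribe` is chosen here for illustration; Biopython's `Seq` class has a `transcribe()` method that serves the same purpose.

```python
def transcribe(dna):
    """Return the mRNA string for a coding-strand DNA string (T becomes U)."""
    return dna.upper().replace("T", "U")


print(transcribe("GATGGAACTTGACTACGTAAATT"))  # GAUGGAACUUGACUACGUAAAUU
```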
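- Translating mRNA into protein: the mRNA is read three bases (one codon) at a time, and each codon specifies either an amino acid or a stop signal.
Note on items 4 and 5 (transcription and translation): both can be done with string replacement and regex, but using Biopython is the recommended approach.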
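For illustration only, translation can be sketched in plain Python with a lookup table built from the standard genetic code. The names `CODON_TABLE` and `translate` are made up for this example, stop codons are written as `*`, and two behaviors are assumptions of the sketch: unknown codons become `X`, and trailing bases that do not fill a codon are ignored. In practice, Biopython's `Seq.translate()` is the safer choice.

```python
from itertools import product

BASES = "UCAG"
# Amino acids of the standard genetic code, listed in UCAG codon order
# (UUU, UUC, UUA, UUG, UCU, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon, starting at position 0."""
    mrna = mrna.upper()
    usable = len(mrna) - len(mrna) % 3  # drop an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(mrna[i:i + 3], "X")  # assumption: unknown codon -> 'X'
        for i in range(0, usable, 3)
    )


print(translate("AUGGCCUAA"))  # MA*
```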
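- Finding open reading frames (ORFs): an ORF is a region of DNA or RNA that can be translated into protein. Using regex on the translated sequence, this region starts with M (methionine, from the start codon) and ends with * (a stop codon).
ORF finding is applied to the protein string obtained after a series of transcription and translation steps.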
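A regex-based ORF search on an already translated string might look like the sketch below. It assumes stop codons are written as `*`, and it uses a lookahead so that ORFs starting at a later M inside another ORF are also reported. The names `ORF_PATTERN` and `find_orfs` are illustrative.

```python
import re

# Lookahead with a capture group: at every position, try to match an M
# followed by any non-stop residues and then a '*'. Because the lookahead
# consumes nothing, overlapping matches are all found.
ORF_PATTERN = re.compile(r"(?=(M[^*]*\*))")


def find_orfs(protein):
    """Return every substring that starts with M and ends at the next '*'."""
    return [match.group(1) for match in ORF_PATTERN.finditer(protein)]


print(find_orfs("QMAM*KLMGH*R"))  # ['MAM*', 'M*', 'MGH*']
```

To cover all reading frames, the same search would be repeated on the translations of each of the three frames of the sequence and of its reverse complement.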
Sequence file extensions:
- To read or write to a file:
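As a rough sketch of reading and writing, the functions below handle the FASTA format using only the standard library. FASTA is an assumption here, since the text does not name a format, and the names `read_fasta` and `write_fasta` are hypothetical. Each record is assumed to be a header line starting with `>` followed by one or more sequence lines.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(path, records, width=60):
    """Write (header, sequence) pairs, wrapping sequences at `width` columns."""
    with open(path, "w") as handle:
        for header, sequence in records:
            handle.write(f">{header}\n")
            for i in range(0, len(sequence), width):
                handle.write(sequence[i:i + width] + "\n")
```

For anything beyond small experiments, a parser such as Biopython's `SeqIO` module is likely the better tool.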
For compressed FASTQ files:
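A gzip-compressed FASTQ file can be read without unpacking it first by opening it in text mode with the standard-library `gzip` module. The sketch below assumes the common four-line record layout (identifier, sequence, `+` separator, quality) with no blank lines between records; the function name `read_fastq_gz` is made up for this example.

```python
import gzip


def read_fastq_gz(path):
    """Yield (identifier, sequence, quality) from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:  # "rt" decompresses and decodes to text
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file (assumes no blank lines between records)
            sequence = handle.readline().rstrip()
            handle.readline()  # the '+' separator line is skipped
            quality = handle.readline().rstrip()
            yield header[1:], sequence, quality  # drop the leading '@'
```

Because the function is a generator, records are produced one at a time, which keeps memory use low for large sequencing files.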