Home
yeyanbo edited this page Jul 16, 2013
Biopython is a set of open-source Python packages and modules for bioinformatics. The Bio.Phylo package already implements some basic phylogenetics tasks: basic tree operations, parsers for Newick, Nexus and PhyloXML, and wrappers for PhyML, RAxML and PAML. However, some important components remain to be implemented to better support phylogenetic workflows. These include simple tree construction algorithms (UPGMA, NJ and MP), consensus tree searching (strict, majority-rule and Adams consensus), tree comparison and tree visualization. In this project, the first two will be implemented.
- Implement three simple tree inference algorithms: Unweighted Pair Group Method with Arithmetic Mean (UPGMA), Neighbour Joining (NJ) and Maximum Parsimony (MP).
- Implement consensus tree search functions for multiple trees, including the strict, majority-rule and Adams consensus trees.
- Implement branch support calculation functions given a target tree and a list of bootstrap replicate trees.
- Implement a bootstrap method for a given alignment and provide two interface methods, one to generate a tree list and one to construct a consensus tree (given the parameters treeMethod, consensusMethod and bootstrapTime).
- Tree construction module: UPGMA, NJ and MP algorithms, and a branch support method.
- Consensus tree module: strict, majority-rule and Adams consensus tree methods.
- A `DistanceMatrix` class with `__getitem__`, `__setitem__`, `__delitem__`, `__len__` and `insert(name, distances)` methods;
- A `DistanceCalculator` class to calculate and return a `DistanceMatrix` object from a DNA or protein 'MSA' object (if time permits);
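To make the planned interface concrete, here is a minimal sketch of what such a `DistanceMatrix` could look like. It is not the project's actual implementation: the lower-triangular storage, the `names` and `matrix` attributes, and the `(a, b)` pair indexing are all assumptions made for illustration.

```python
class DistanceMatrix:
    """Symmetric distance matrix stored as a lower triangle (illustrative sketch)."""

    def __init__(self, names):
        self.names = list(names)
        # Row i holds the distances to names[0..i]; the diagonal entry is 0.
        self.matrix = [[0.0] * (i + 1) for i in range(len(self.names))]

    def _index(self, key):
        # Accept either a position or a taxon name.
        return key if isinstance(key, int) else self.names.index(key)

    def __len__(self):
        return len(self.names)

    def __getitem__(self, pair):
        i, j = (self._index(k) for k in pair)
        return self.matrix[max(i, j)][min(i, j)]

    def __setitem__(self, pair, value):
        i, j = (self._index(k) for k in pair)
        self.matrix[max(i, j)][min(i, j)] = value

    def __delitem__(self, key):
        idx = self._index(key)
        # Drop the taxon's own row, then its column from every later row.
        del self.matrix[idx]
        for row in self.matrix[idx:]:
            del row[idx]
        del self.names[idx]

    def insert(self, name, distances):
        """Append a taxon; distances[k] is its distance to names[k]."""
        if len(distances) != len(self.names):
            raise ValueError("need one distance per existing taxon")
        self.names.append(name)
        self.matrix.append(list(distances) + [0.0])


dm = DistanceMatrix(["A", "B"])
dm["A", "B"] = 3.0
dm.insert("C", [5.0, 4.0])  # distances from C to A and to B
print(dm["C", "A"], len(dm))  # 5.0 3
```

Storing only the lower triangle keeps the matrix symmetric by construction, at the cost of normalising the index order on every access.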
- Write a method to calculate the clade height, i.e. the longest path to one of the terminals;
- Implement the UPGMA algorithm by porting the Java code from BlastGraph;
- Implement the NJ algorithm by porting the Java code;
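As a rough illustration of the UPGMA step, the sketch below clusters taxa from a plain dictionary of pairwise distances. It is not a port of the BlastGraph code; the `upgma` function, the nested-tuple tree representation and the `frozenset` distance keys are assumptions chosen to keep the example self-contained.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (tree, height): tree is a nested tuple of labels and height is
    the root height (half the distance between the last two clusters joined).
    """
    # Each current cluster is keyed by its subtree; the value is its leaf count.
    sizes = {name: 1 for name in names}
    d = dict(dist)
    height = 0.0
    while len(sizes) > 1:
        # Join the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        merged = (a, b)
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    (tree,) = sizes
    return tree, height


pairs = {("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 8,
         ("A", "D"): 8, ("B", "C"): 8, ("B", "D"): 8}
dist = {frozenset(k): v for k, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))
# ((('A', 'B'), ('C', 'D')), 4.0)
```

In UPGMA the height of a joined cluster equals the longest (indeed every) path from that node down to a terminal, which is why a clade height method is listed alongside the algorithm.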
- Gain a clearer understanding of the parsimony methods for both DNA and protein sequences.
- Design the parsimony score method and write documentation and tests for it;
- Implement a method to calculate the parsimony score for a given tree and alignment;
- Work as TA for the bioinformatics course hosted by WHIOV and NESCent.
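For the parsimony score of a given tree and alignment, one common choice is Fitch's algorithm, sketched below for a rooted binary tree with equal substitution costs. The source does not say which scoring scheme the project will use, so treat this as one possible approach; the `fitch_score` name and the nested-tuple tree are hypothetical.

```python
def fitch_score(tree, alignment):
    """Return the Fitch parsimony score of a rooted binary tree.

    tree:      nested 2-tuples whose leaves are taxon names.
    alignment: dict mapping taxon name to its aligned sequence.
    """
    score = 0

    def state_sets(node):
        nonlocal score
        if isinstance(node, str):
            # A leaf contributes one singleton state set per column.
            return [{ch} for ch in alignment[node]]
        left, right = (state_sets(child) for child in node)
        merged = []
        for l_set, r_set in zip(left, right):
            common = l_set & r_set
            if common:
                merged.append(common)
            else:
                # Disjoint child sets imply one substitution in this column.
                merged.append(l_set | r_set)
                score += 1
        return merged

    state_sets(tree)
    return score


aln = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
print(fitch_score((("A", "B"), ("C", "D")), aln))  # 2
print(fitch_score((("A", "C"), ("B", "D")), aln))  # 3
```

A tree search would call such a function repeatedly, keeping the topology with the lowest score.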
- Design the parsimony tree searching method and write documentation and tests for it.
- Implement the nearest-neighbour interchange (NNI) algorithm to search for a tree that minimizes the score. A compatible tree manipulation method is needed to interchange the tree branches.
- To make the consensus tree search efficient, design a binary array class with binary-like operations to store and count clades, and write documentation and tests for it.
- Implement the binary array manipulation class with a straightforward implementation of each method at first, and improve the performance later (keeping the same API).
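One way to picture the binary array idea is a string of `0`/`1` flags, one per taxon, with bitwise operators defined on it. The sketch below follows that reading; the `BitString` name, the `from_taxa` constructor and the `contains` helper are assumptions, and this is the simple first version rather than a tuned one.

```python
import operator


class BitString(str):
    """A string of '0'/'1' characters supporting bitwise operators (sketch)."""

    def __new__(cls, bits):
        if set(bits) - {"0", "1"}:
            raise ValueError("only '0' and '1' are allowed")
        return super().__new__(cls, bits)

    @classmethod
    def from_taxa(cls, all_taxa, clade_taxa):
        """Encode a clade as one flag per taxon, in the order of all_taxa."""
        members = set(clade_taxa)
        return cls("".join("1" if t in members else "0" for t in all_taxa))

    def _combine(self, other, op):
        if len(self) != len(other):
            raise ValueError("operands must have equal length")
        value = op(int(self, 2), int(other, 2))
        # Zero-pad so the result keeps one position per taxon.
        return BitString(format(value, "0{}b".format(len(self))))

    def __and__(self, other):
        return self._combine(other, operator.and_)

    def __or__(self, other):
        return self._combine(other, operator.or_)

    def __xor__(self, other):
        return self._combine(other, operator.xor)

    def contains(self, other):
        """True if every taxon flagged in other is also flagged in self."""
        return (self & other) == other


taxa = ["A", "B", "C", "D"]
ab = BitString.from_taxa(taxa, ["A", "B"])
abc = BitString.from_taxa(taxa, ["A", "B", "C"])
print(ab, ab | BitString("0011"), abc.contains(ab))  # 1100 1111 True
```

Because the class is still a `str`, instances hash and compare like strings, so they can be used directly as dictionary keys when counting clades.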
- Clean up existing code, and improve tests and documentation;
- Write and submit mid-term evaluations.
- Design the strict and majority-rule consensus tree methods and write documentation and tests for them;
- Implement a method for counting how many times each clade occurs in a given list of trees. It will be used by both the strict consensus and majority-rule consensus methods;
- Implement the strict consensus tree method and the majority-rule consensus tree method by porting my Java code to Python.
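The shared clade-counting step can be sketched as follows. This is not the Java code mentioned above: trees are assumed to be nested tuples of taxon names, clades are represented as `frozenset`s of leaf names, and the function only selects the clades that belong in the consensus rather than assembling a tree object.

```python
from collections import Counter


def clades(tree):
    """Return the set of leaf sets, one per internal node of a nested tuple."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def count_clades(trees):
    """Count in how many trees each clade occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return counts


def consensus_clades(trees, strict=False):
    """Clades found in all trees (strict) or in more than half of them."""
    counts = count_clades(trees)
    n = len(trees)
    if strict:
        return {c for c, k in counts.items() if k == n}
    return {c for c, k in counts.items() if 2 * k > n}


trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
]
print(sorted("".join(sorted(c)) for c in consensus_clades(trees, strict=True)))
# ['AB', 'ABCD']
print(sorted("".join(sorted(c)) for c in consensus_clades(trees)))
# ['AB', 'ABCD', 'CD']
```

Clades that each occur in more than half of the trees are always mutually compatible, so the majority-rule selection can be turned into a tree without conflict resolution.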
- Design the Adams consensus tree method and write documentation and tests for it;
- Get familiar with the Adams consensus tree algorithm and implement it.
- Design the branch support calculation method and write documentation and tests for it;
- Implement the branch support calculation method given a tree and a list of trees;
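Branch support for a target tree can be read as the percentage of replicate trees that contain each of its clades. The sketch below assumes that definition and the same nested-tuple tree representation; `clade_sets` and `branch_support` are hypothetical names, not the project's API.

```python
def clade_sets(tree):
    """Return the set of leaf sets, one per internal node of a nested tuple."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*map(walk, node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def branch_support(target, replicates):
    """Map each clade of target to the percentage of replicates containing it."""
    replicate_clades = [clade_sets(tree) for tree in replicates]
    return {
        clade: 100.0 * sum(clade in rc for rc in replicate_clades) / len(replicates)
        for clade in clade_sets(target)
    }


target = (("A", "B"), ("C", "D"))
replicates = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
    (("A", "B"), ("C", "D")),
]
support = branch_support(target, replicates)
print(support[frozenset("AB")], support[frozenset("CD")])  # 75.0 50.0
```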
- Design the bootstrap and some interface methods, and write documentation and tests for these methods;
- Implement the bootstrap method;
- Write an interface method to generate a bootstrapped tree list, given the tree method (UPGMA, NJ or MP) and the bootstrap time (number of replicates);
- Write another one to build a consensus tree, given the tree method, consensus method and bootstrap time;
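The bootstrap step and the tree-list interface could look roughly like this. The sketch assumes the alignment is a plain dict of equal-length sequences and that a tree method is any callable taking such a dict; `bootstrap`, `bootstrap_trees` and the `seed` parameter are illustrative names, not the final interface.

```python
import random


def bootstrap(alignment, times, seed=None):
    """Yield `times` pseudo-replicate alignments by resampling columns.

    alignment: dict mapping taxon name to its aligned sequence.
    Columns are drawn with replacement, so each replicate keeps the
    original length and the same taxa.
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(times):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {n: "".join(alignment[n][c] for c in cols) for n in names}


def bootstrap_trees(alignment, times, tree_method, seed=None):
    """Build one tree per replicate with the supplied tree-building callable."""
    return [tree_method(rep) for rep in bootstrap(alignment, times, seed)]


aln = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
for replicate in bootstrap(aln, 3, seed=1):
    print(replicate)
```

A consensus interface would then pass the resulting tree list to whichever consensus method is selected.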