Skip to content
yeyanbo edited this page Jul 16, 2013 · 11 revisions

Abstract

Biopython is a set of open source python packages and modules for bioinformatics works. In the Bio.Phylo package, there are already implementations for some basic phylogenetics tasks: basic tree operations, parsers for Newick, Nexus and PhyloXML, and wrapers for Phyml, Raxml and PAML. While there are some important components that remain to be implemented to better support phylogenetic workflows. These include simple tree construction algorithms(UPGMA, NJ and MP), consensus tree searching(Stric, Majority-Rule and Adam Consensus), tree comparison and tree visualization. In this project, the first two will be implemented.

Approach & Goals

  • Implement simple tree inference algorithms of Unweighted Pair Group Method with Arithmetic Mean(UPGMA), Neighbour Joining(NJ) and Maximum Parsimony(MP).
  • Implement consensus tree search functions of multiple trees, including Strict consensus tree, majority-rule consensus tree and Adams consensus tree.
  • Implement branch support calculation functions given a target tree and a list of bootstrap replicate trees.
  • Implement a bootstrap method for a given alignment and provide two interface methods to generate a tree list and construct a consensus tree(given the parameters of treeMethod, consensusMethod and bootstrapTime).

Key Deliverables

  • Tree construction module: UPGMA, NJ and MP algorithms, branch support method
  • Consensus tree module: strict, majority-rule and adams consensus tree methods.

Project Timeline

Week1(6.17-6.23): Distance Matrix and Calculation

  • A DistanceMatrix class with __getitem__, __setitem__, __delitem__, __len__ and insert(name, distances) methods;
  • A DistanceCalculator class to calculate and return a DistanceMatrix object from a dna or protein 'MSA' object(if time permmited);

Week2(6.24-6.30): UPGMA & NJ

  • Write a method to calculate the clade height -- the longest path to one of the terminals
  • Implement the UPGMA algorithm by porting the Java code from BlastGraph;
  • Implement NJ algorithm by porting the Java code;

Week3(7.1-7.7): Parsimony Score Method for MP

  • Get more clear understanding of the parsimony methods for both DNA and protein sequences.
  • Design the parsimony score method and write document and tests for it;
  • Implement method to calculate the parsimony score for a given tree and an alignment;

Week4(7.8-7.14): Bioinformatics Course

Week5(7.15-7.21): Tree Searching Method for MP

  • Design the parsimony tree searching method and write document and tests for it.
  • Implement the Nearest-neighbour interchanges algorithm to search for a tree minimizing the score. A compatible tree manipulation method is needed to interchange the tree branches.

Week6(7.22-7.28): Binary Array Class for Consensus Tree Search

  • To be efficient in consensus tree search, design a binary array class with binary like operations to store and count clades, and write document and tests for it.
  • Implement the binary array manipulation class using a normal way for each methods at the beginning and improve the performance later(with the same API).

Week7(7.29-8.4): Cleanup and Mid-term Evaluations

  • Cleanup existing code, improve tests and document;
  • Write and submit mid-term evaluations.

Week8(8.5-8.11): Strict and Majority-rule Consensus Tree

  • Design the strict and majority-rule consensus tree methods and write document and tests for them;
  • Implement a method for counting the presence time of each clade given a list of trees. It will be used by both strict consensus and majority-rule consensus methods;
  • Implement the strict consensus tree method and majority-rule consensus tree method by porting my Java code into python.

Week9(8.12-8.18): Adams Consensus Tree

  • Design the adams consensus tree method and write document and tests for it;
  • Get familiar with the adams consensus tree algorithm and implement it.

Week10(8.19-8.25): Branch Support

  • Design the branch support calculation method and write document and tests for it;
  • Implement the branch support calculation method given a tree and a list of trees;

Week11(8.26-9.1): Bootstrap Method

  • Design the bootstrap and some interface methods, and write document and tests for these methods;
  • Implement the bootstrap method;
  • Write a interface method to generate a bootstrapped tree list providing the parameter of tree method(UPGMA,NJ,MP) and bootstrap time;
  • Write another one for consensus tree given the tree method, consensus method and bootstrap time;

Week12(9.2-9.8): Cleanup existing code, improve tests and documentation

Week13(9.9-9.15): Write tutorials, add code example to the phylo cookbook for biopython

Week14(9.16-9.22): Suggested 'pencils down' date. Take a week to scrub code, improve documentation, etc.

Week15(9.23-9.29): Firm 'pencils down' date. Write and submit final evaluations to Google. Submit required code samples to Google.