Skip to content

tools for constructing phylogenies from transposable element insertion presence/absence in multi-individual datasets.

Notifications You must be signed in to change notification settings

davidaray/retrophylogenomic_tools

 
 

Repository files navigation

These scripts are part of a workflow for processing presence/absence matrices and phylogenetic trees in order to test for introgression.

1) Mobile element insertion genotypes are generated by the Mobile Element Locator Tool (MELT) https://melt.igs.umaryland.edu/.

2) Generate Input files:
  A) Nexus genotype file:
     See input_genotypes.nex for a simplified example of this input file. The same genotype file will be used for all quartet calculations.
     Matrix block= matrix of TE insertions in 0(absence)/1(presence) format.
         0 indicates absence of the insertion in that species and 1 indicates the insertion is present in the species. 
         Genotypes are derived from from MELT output.
  B) Nexus tree file: 
      See simple.tre for example. The same tree file will be used for all quartet calculations.
      nexus format with a block containing the accepted phylogenetic tree in newick format
  C) Quartet List:
      File listing all possible quartets you wish to test. For my analysis I had 495 possible quartet combinations.
      One quartet per row, species separated by commas & species represented by numbers assigned to them in the Nexus tree file.
      Note: if you have more than 9 individuals in your dataset, be sure to include leading 0's to avoid confusion in later steps. I only needed to do this on "01".
      Example:
        4,2,7,01
        4,2,7,11
        4,2,7,9
        4,2,3,10
  D) Nexus command file template:
      See sample_paup.nex for example
      This file will serve as a template for creating the command files in step E where it will be customized for each quartet.
      You should only need to edit the filenames/paths of your input_genotypes.nex, simple.tree and log files.
  
  E) Nexus command file for each quartet using a series of bash commands:
        # quartets_numbers is the file created in step C, above.
        # create a list of which species need to be removed from the main dataset in order to limit the analysis to just the quartet of interest.
        cat quartets_numbers | while read i; do T1=$(echo ${i} | cut -d, -f 1); T2=$(echo ${i} | cut -d, -f 2); T3=$(echo ${i} | cut -d, -f 3); T4=$(echo ${i} | cut -d, -f 4); DELETE=$(echo "01 02 03 04 05 06 07 08 09 10 11 12 " | sed "s/$T1 //g" | sed "s/$T2 //g" | sed "s/$T3 //g" | sed "s/$T4 //g"); echo "$COUNTER,$DELETE" >> delete_list; COUNTER=$((COUNTER + 1)); done
        # create a directory to store the nexus command files for all quartets
        mkdir nexus
        # create a directory to store output tree files
        mkdir trees
        # create directory to store log files
        mksir logs
        # This customizes the sample_paup.nex file for each quartet.
        # Uses the delete list to update the delete list and number each of the individual nexus command files and deposit in the nexus directory
        cat delete_list | while read i; do NUMBER=$(echo ${i} | cut -d, -f 1); DELETE=$(echo ${i} | cut -d, -f 2); sed "s/bonk/$NUMBER/g" sample_paup.nex > nexus/paup_$NUMBER.nex ; sed -i "s/splat/$DELETE/g" nexus/paup_$NUMBER.nex; done 

3) Nexus files are processed using PAUP (https://paup.phylosolutions.com/)
    paup -n -r paup_quartetNumber.nex
    This command will delete individuals from the dataset and tree that are not in the quartet of interest, exclude any uninformative loci from the set, calculate tree scores and create several files:
      A log file including tree scores for each topology, and two tree files: one showing the accepted tree topology and another showing the three best tree topologies.
      Once all three files have been created for each quartet, you are ready to proceed to the python script.
4) run parse_introgression.py
    This script creates a table by extracting the number of informative characters, tree lengths (also referred to as number of steps and tree scores) for each of the 3 trees and identifying the “accepted” topology consistent with the ASTRAL-MP tree. 
    usage:
    python parse_introgression.py --logfiledir /logs --treefilesdir /trees --outfile parsed_output.out --workpath .
    
    Your output will be a tab separated table showing the log file name, the quartet of interest, the tree bipartition ('Split'), the number of informative characters, the tree length, and a column indicating which of the three topologies is the 'accepted' topology according to the astral phylogeny.  
  Test run         Quartet                    Split           Informative characters          tree length  ASTRAL
test176.log     Bran,Cili,Occu,Yuma     Occu Yuma|Bran Cili     1243                           1965        ASTRAL
test176.log     Bran,Cili,Occu,Yuma     Cili Occu|Bran Yuma     1243                           2119
test176.log     Bran,Cili,Occu,Yuma     Cili Yuma|Occu Bran     1243                           2131

    The values from this table are used to test for introgression following the methods in Springer, M.S.; Molloy, E.K.; Sloan, D.B.; Simmons, M.P.; Gatesy, J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. Journal of Heredity 2020, 111, 147–168, doi:10.1093/jhered/esz076.
    

About

tools for constructing phylogenies from transposable element insertion presence/absence in multi-individual datasets.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%