forked from jkorstia/retrophylogenomic_tools
davidaray/retrophylogenomic_tools
These scripts are part of a workflow for processing presence/absence matrices and phylogenetic trees in order to test for introgression.

1) Mobile element insertion genotypes are generated by the Mobile Element Locator Tool (MELT) https://melt.igs.umaryland.edu/.

2) Generate input files:

A) Nexus genotype file: See input_genotypes.nex for a simplified example of this input file. The same genotype file is used for all quartet calculations. The matrix block is a matrix of TE insertions in 0/1 format: 0 indicates that the insertion is absent in that species and 1 indicates that it is present. Genotypes are derived from MELT output.

B) Nexus tree file: See simple.tre for an example. The same tree file is used for all quartet calculations. It is a Nexus-format file with a block containing the accepted phylogenetic tree in Newick format.

C) Quartet list: A file listing all quartets you wish to test. For my analysis I had 495 possible quartet combinations. One quartet per row, with species separated by commas and represented by the numbers assigned to them in the Nexus tree file. Note: if you have more than 9 individuals in your dataset, be sure to include leading 0's to avoid confusion in later steps. I only needed to do this on "01". Example:

    4,2,7,01
    4,2,7,11
    4,2,7,9
    4,2,3,10

D) Nexus command file template: See sample_paup.nex for an example. This file serves as the template for the command files created in step E, where it is customized for each quartet. You should only need to edit the filenames/paths of your input_genotypes.nex, simple.tre and log files.

E) Nexus command file for each quartet, created using a series of bash commands:

    # quartets_numbers is the file created in step C, above.
    # create a list of which species need to be removed from the main dataset in order to limit the analysis to just the quartet of interest.
    # COUNTER numbers the quartets; it is initialized here so that the first entry in delete_list also gets a number.
    COUNTER=1
    cat quartets_numbers | while read i; do
        T1=$(echo ${i} | cut -d, -f 1)
        T2=$(echo ${i} | cut -d, -f 2)
        T3=$(echo ${i} | cut -d, -f 3)
        T4=$(echo ${i} | cut -d, -f 4)
        DELETE=$(echo "01 02 03 04 05 06 07 08 09 10 11 12 " | sed "s/$T1 //g" | sed "s/$T2 //g" | sed "s/$T3 //g" | sed "s/$T4 //g")
        echo "$COUNTER,$DELETE" >> delete_list
        COUNTER=$((COUNTER + 1))
    done

    # create a directory to store the nexus command files for all quartets
    mkdir nexus
    # create a directory to store output tree files
    mkdir trees
    # create a directory to store log files
    mkdir logs

    # This customizes the sample_paup.nex file for each quartet.
    # Uses delete_list to fill in the species to delete, numbers each of the individual nexus command files, and deposits them in the nexus directory.
    # "bonk" and "splat" are the strings in sample_paup.nex that get replaced by the quartet number and the delete list.
    cat delete_list | while read i; do
        NUMBER=$(echo ${i} | cut -d, -f 1)
        DELETE=$(echo ${i} | cut -d, -f 2)
        sed "s/bonk/$NUMBER/g" sample_paup.nex > nexus/paup_$NUMBER.nex
        sed -i "s/splat/$DELETE/g" nexus/paup_$NUMBER.nex
    done

3) Nexus files are processed using PAUP (https://paup.phylosolutions.com/):

    paup -n -r paup_quartetNumber.nex

This command deletes individuals that are not in the quartet of interest from the dataset and tree, excludes any uninformative loci from the set, calculates tree scores and creates three files: a log file including tree scores for each topology, and two tree files, one showing the accepted tree topology and another showing the three best tree topologies. Once all three files have been created for each quartet, you are ready to proceed to the python script.

4) Run parse_introgression.py

This script creates a table by extracting the number of informative characters and the tree lengths (also referred to as number of steps and tree scores) for each of the 3 trees, and by identifying the “accepted” topology consistent with the ASTRAL-MP tree.

usage:

    python parse_introgression.py --logfiledir /logs --treefilesdir /trees --outfile parsed_output.out --workpath .
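Looking back at step 2E, the delete-list logic can also be sketched in Python as a cross-check on the bash loop. This is a minimal sketch, not part of the repository: the names `ALL_TAXA`, `delete_entry` and `build_delete_list` are hypothetical, the twelve labels "01" to "12" are taken from the `echo` list in step 2E, and the sketch zero-pads every label before comparing whole labels, which is an assumption that differs from the substring matching `sed` performs.

```python
from itertools import combinations

# Assumption: twelve taxa labelled "01".."12", as in the echo list of step 2E.
ALL_TAXA = [f"{n:02d}" for n in range(1, 13)]


def delete_entry(quartet_line, all_taxa=ALL_TAXA):
    """Return the space-separated taxa that are NOT part of the quartet."""
    # Zero-pad each label so that "4" and "04" name the same taxon.
    keep = {label.strip().zfill(2) for label in quartet_line.split(",")}
    unknown = keep - set(all_taxa)
    if len(keep) != 4 or unknown:
        raise ValueError(f"bad quartet line: {quartet_line!r}")
    # Whole-label comparison: "1" can never accidentally match "11".
    return " ".join(t for t in all_taxa if t not in keep)


def build_delete_list(quartet_lines):
    """Number the quartets from 1 and pair each number with its delete entry."""
    lines = [ln.strip() for ln in quartet_lines if ln.strip()]
    return [f"{n},{delete_entry(ln)}" for n, ln in enumerate(lines, start=1)]


# Every 4-taxon combination of 12 taxa; there are 495 of them, which is
# consistent with the number of quartets mentioned in step 2C.
all_quartets = [",".join(q) for q in combinations(ALL_TAXA, 4)]

print(len(all_quartets))
print(build_delete_list(["4,2,7,01", "4,2,7,11"]))
```

The sketch only reproduces the contents of delete_list; creating the per-quartet command files and running PAUP and parse_introgression.py proceed as described in the numbered steps.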
The output of parse_introgression.py is a tab-separated table showing the log file name, the quartet of interest, the tree bipartition ('Split'), the number of informative characters, the tree length, and a column indicating which of the three topologies is the 'accepted' topology according to the ASTRAL phylogeny.

    Test run     Quartet              Split                Informative characters  tree length  ASTRAL
    test176.log  Bran,Cili,Occu,Yuma  Occu Yuma|Bran Cili  1243                    1965         ASTRAL
    test176.log  Bran,Cili,Occu,Yuma  Cili Occu|Bran Yuma  1243                    2119
    test176.log  Bran,Cili,Occu,Yuma  Cili Yuma|Occu Bran  1243                    2131

The values from this table are used to test for introgression following the methods in Springer, M.S.; Molloy, E.K.; Sloan, D.B.; Simmons, M.P.; Gatesy, J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. Journal of Heredity 2020, 111, 147–168, doi:10.1093/jhered/esz076.
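Before applying the test, it can help to load the parsed table and group its three rows per quartet. The following is a minimal sketch under stated assumptions: the column order follows the description above, the file is assumed to start with a header row, and the ASTRAL column is assumed to be empty or missing for the two non-accepted topologies. The names `COLUMNS`, `read_parsed_table` and `summarize` are hypothetical and do not come from parse_introgression.py. The sketch only organizes the values; it does not perform the introgression test itself.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, taken from the description of the output table.
COLUMNS = ["log", "quartet", "split", "informative", "length", "astral"]


def read_parsed_table(handle, has_header=True):
    """Group table rows by (log file, quartet)."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    groups = defaultdict(list)
    for fields in reader:
        if not any(f.strip() for f in fields):
            continue  # skip blank lines
        # Pad short rows: the ASTRAL column may be missing entirely.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        row["informative"] = int(row["informative"])
        row["length"] = int(row["length"])
        row["accepted"] = row.pop("astral") == "ASTRAL"
        groups[(row["log"], row["quartet"])].append(row)
    return dict(groups)


def summarize(rows):
    """Report the accepted split, the shortest split and all tree lengths."""
    accepted = [r for r in rows if r["accepted"]]
    shortest = min(rows, key=lambda r: r["length"])
    return {
        "accepted_split": accepted[0]["split"] if accepted else None,
        "shortest_split": shortest["split"],
        "lengths": sorted(r["length"] for r in rows),
    }


# The example rows from the table above, written as tab-separated text.
SAMPLE = (
    "Test run\tQuartet\tSplit\tInformative characters\ttree length\tASTRAL\n"
    "test176.log\tBran,Cili,Occu,Yuma\tOccu Yuma|Bran Cili\t1243\t1965\tASTRAL\n"
    "test176.log\tBran,Cili,Occu,Yuma\tCili Occu|Bran Yuma\t1243\t2119\n"
    "test176.log\tBran,Cili,Occu,Yuma\tCili Yuma|Occu Bran\t1243\t2131\n"
)

table = read_parsed_table(io.StringIO(SAMPLE))
for key, rows in table.items():
    print(key, summarize(rows))
```

For a real run, `io.StringIO(SAMPLE)` would be replaced by an open handle on parsed_output.out, for example `open("parsed_output.out", newline="")`.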
About
Tools for constructing phylogenies from transposable element insertion presence/absence in multi-individual datasets.