Combining phylogenetic networks and Random Forests for prediction of ancestry from multilocus genotype data.
-
Make sure you have the latest version of Python 3.x
python3 --version
-
Install pip3, Java and the tkinter library
sudo apt-get install python3-pip python3-tk default-jre
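Before moving on, you can check that these prerequisites are visible from Python. This is a minimal sketch; the helper names `can_import` and `check_prerequisites` are hypothetical and not part of Mycorrhiza.

```python
import importlib
import shutil
import sys


def can_import(module_name):
    """Return True if module_name can be imported in this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def check_prerequisites():
    """Hypothetical helper: report which prerequisites are visible."""
    return {
        'tkinter': can_import('tkinter'),          # provided by python3-tk
        'java': shutil.which('java') is not None,  # provided by default-jre
        'pip3': shutil.which('pip3') is not None,  # provided by python3-pip
    }


print('Python', sys.version.split()[0])
for name, found in check_prerequisites().items():
    print(f'{name}: {"ok" if found else "MISSING"}')
```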
-
Install Mycorrhiza
pip3 install --upgrade mycorrhiza
-
Install SplitsTree
Follow the instructions in the GUI installer, leaving all settings at their defaults.
wget http://ab.inf.uni-tuebingen.de/data/software/splitstree4/download/splitstree4_unix_4_14_6.sh
chmod +x splitstree4_unix_4_14_6.sh
./splitstree4_unix_4_14_6.sh
If the link above is not available, find the most recent version of SplitsTree here: http://ab.inf.uni-tuebingen.de/data/software/splitstree4/download
-
If you don't already have the Homebrew package manager, install it before proceeding.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
-
Install Python 3.x
brew install python
-
Install Mycorrhiza
sudo -H pip3 install --upgrade mycorrhiza
-
Install SplitsTree
The package can be found here. Follow the installer instructions, leaving all settings at their defaults.
If the link above is not available, find the most recent version of SplitsTree here: http://ab.inf.uni-tuebingen.de/data/software/splitstree4/download
-
Run an analysis from the command line.
Run a 5-fold cross-validated analysis:
crossvalidate -i gipsy.myc -o out/ -s 5
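Here `-s 5` sets the number of folds. To illustrate what 5-fold cross-validation means, the sketch below shows the generic scheme: every sample is held out for testing exactly once, while the model is trained on the other folds. This is not Mycorrhiza's internal code; the helper `kfold_indices` and its shuffling with a fixed seed are assumptions for illustration.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train, test) index lists for generic k-fold cross-validation.

    Illustrative only; not Mycorrhiza's implementation.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError('n_splits must be between 2 and n_samples')
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n % k) folds get one extra.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in kfold_indices(12, n_splits=5):
    print(len(train), len(test))  # prints 9 3, 9 3, 10 2, 10 2, 10 2 (one pair per line)
```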
Run an analysis with a training set and a prediction set. Samples with a learning flag of 1 are used for training, and predictions are made for samples with a learning flag of 0.
supervised -i gipsy.myc -o out/
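The role of the learning flag can be sketched as a simple partition of samples. The record layout `(sample_id, population, learning_flag)` and the helper `split_by_learning_flag` are hypothetical, chosen to mirror the first three columns of the input format; Mycorrhiza does this internally.

```python
def split_by_learning_flag(samples):
    """Partition samples into a training set (flag 1) and a prediction set (flag 0).

    `samples` is a list of (sample_id, population, learning_flag) tuples;
    this layout is an assumption for illustration.
    """
    training, prediction = [], []
    for sample_id, population, flag in samples:
        if flag == 1:
            # Training samples keep their population label (the class to learn).
            training.append((sample_id, population))
        elif flag == 0:
            prediction.append(sample_id)
        else:
            raise ValueError(f'invalid learning flag {flag!r} for {sample_id}')
    return training, prediction


training, prediction = split_by_learning_flag(
    [('s1', 'POP1', 1), ('s2', 'POP2', 1), ('s3', 'POP1', 0)])
# training == [('s1', 'POP1'), ('s2', 'POP2')], prediction == ['s3']
```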
To see all available parameters:
crossvalidate -h
-
To use Mycorrhiza from Python, first import the necessary modules.
from mycorrhiza.dataset import Myco
from mycorrhiza.analysis import CrossValidate
from mycorrhiza.plotting.plotting import mixture_plot
-
(Optional) By default, Mycorrhiza looks for SplitsTree in your home folder. If you wish to specify a different path to the SplitsTree executable, you can do so in the settings module.
from mycorrhiza.settings import const
const['__SPLITSTREE_PATH__'] = '~/splitstree4/SplitsTree'
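This README does not say whether Mycorrhiza expands `~` in this setting. A cautious option is to expand and check the path yourself before assigning it. The helper `resolve_splitstree_path` below is hypothetical and not part of Mycorrhiza; only the commented-out `const` assignment comes from the step above.

```python
import os


def resolve_splitstree_path(path):
    """Hypothetical helper: expand '~' and $VARS, then check the file is executable.

    Returns (absolute_path, is_executable).
    """
    resolved = os.path.abspath(os.path.expandvars(os.path.expanduser(path)))
    is_executable = os.path.isfile(resolved) and os.access(resolved, os.X_OK)
    return resolved, is_executable


resolved, ok = resolve_splitstree_path('~/splitstree4/SplitsTree')
if not ok:
    print('SplitsTree not found or not executable at', resolved)
# If the check passes, hand the expanded path to Mycorrhiza:
# from mycorrhiza.settings import const
# const['__SPLITSTREE_PATH__'] = resolved
```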
-
Load some data. Here, data in the Mycorrhiza format is loaded from the gypsy moth sample data file. Example data can be found here.
myco = Myco(file_path='data/gipsy.myc')
myco.load()
-
Run an analysis. Here a simple 5-fold cross-validation analysis is executed on all available loci, without partitioning.
cv = CrossValidate(dataset=myco, out_path='data/')
cv.run(n_partitions=1, n_loci=0, n_splits=5, n_estimators=60, n_cores=1)
-
Plot the results.
mixture_plot(cv)
https://jgeofil.github.io/mycorrhiza/
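For microsatellite (STR) loci, set the is_str flag to True.
```python
from mycorrhiza.dataset import Myco, Structure
# Mycorrhiza-format file with microsatellite loci
data = Myco(file_path='data/myco.myc', is_str=True)
# Structure-format file with microsatellite loci
data = Structure(file_path='data/myco.str', is_str=True)
```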
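If you are unsure which setting your data needs, one rough check follows from the format description: SNP loci in the Mycorrhiza format use only the codes A, T, G, C and N, so any other allele value points to STR data. The helper `looks_like_str` is a hypothetical heuristic, not a Mycorrhiza function, and it assumes upper-case codes.

```python
# Allele codes allowed for SNP loci in the Mycorrhiza format.
SNP_ALLELES = {'A', 'T', 'G', 'C', 'N'}


def looks_like_str(allele_values):
    """Hypothetical heuristic: True if any allele is not a SNP code,
    suggesting microsatellite data that needs is_str=True."""
    return any(value not in SNP_ALLELES for value in allele_values)


print(looks_like_str(['A', 'G', 'N']))        # False
print(looks_like_str(['152', '148', '000']))  # True
```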
Mycorrhiza format: diploid genotypes occupy two rows (the sample identifier must be identical in both rows). M is the number of loci.
Column(s) | Content | Type |
---|---|---|
1 | Sample identifier | string |
2 | Population | string or integer |
3 | Learning flag | {0,1} |
4 to M+3 | SNP Loci | {A, T, G, C, N} |
4 to M+3 | STR Loci | any or 000 |
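The two-row layout above can be read as follows. This is a minimal sketch, not Mycorrhiza's loader: it assumes whitespace-separated columns and that both rows of a sample are adjacent, and the function name `parse_myco_lines` and the toy data are hypothetical.

```python
def parse_myco_lines(lines):
    """Parse Mycorrhiza-format rows into one record per diploid sample.

    Assumes whitespace-separated columns and adjacent row pairs.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % 2:
        raise ValueError('diploid data needs an even number of rows')
    samples = []
    for first, second in zip(rows[0::2], rows[1::2]):
        if first[0] != second[0]:
            raise ValueError(f'sample identifiers differ: {first[0]} vs {second[0]}')
        if len(first) != len(second):
            raise ValueError(f'row lengths differ for sample {first[0]}')
        flag = int(first[2])
        if flag not in (0, 1):
            raise ValueError(f'learning flag must be 0 or 1 for {first[0]}')
        samples.append({
            'id': first[0],
            'population': first[1],
            'learning_flag': flag,
            # Columns 4 to M+3 (0-based index 3 onward) hold the M loci;
            # each locus becomes one (allele from row 1, allele from row 2) pair.
            'loci': list(zip(first[3:], second[3:])),
        })
    return samples


toy = ['ind1 POP1 1 A G T', 'ind1 POP1 1 A A T',
       'ind2 POP2 0 G G N', 'ind2 POP2 0 G A N']
samples = parse_myco_lines(toy)
# samples[0]['loci'] == [('A', 'A'), ('G', 'A'), ('T', 'T')]
```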
Structure format: diploid genotypes occupy two rows (the sample identifier must be identical in both rows). O is the number of optional columns and M the number of loci.
Column(s) | Content | Type |
---|---|---|
1 | Sample identifier | string |
2 | Population | integer |
3 | Learning flag | {0,1} |
4 to O+3 | Optional (Ignored) | |
O+4 to M+O+3 | SNP Loci | integer or -9 |
O+4 to M+O+3 | STR Loci | any or -9 |
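The main differences from the Mycorrhiza format are the O ignored columns before the loci and -9 as the missing-value code. A sketch of reading these rows, assuming whitespace-separated columns; `parse_structure_lines` and the toy data are hypothetical, and Mycorrhiza's own `Structure` loader may behave differently.

```python
MISSING = '-9'  # missing-allele code in the Structure format


def parse_structure_lines(lines, n_optional=0):
    """Parse Structure-format rows into one record per diploid sample.

    n_optional is O, the number of ignored columns after the learning flag.
    Loci occupy 1-based columns O+4 to M+O+3, i.e. 0-based index 3 + O onward.
    Missing alleles become None; other values are kept as strings.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % 2:
        raise ValueError('diploid data needs an even number of rows')
    first_locus = 3 + n_optional
    samples = []
    for first, second in zip(rows[0::2], rows[1::2]):
        if first[0] != second[0]:
            raise ValueError(f'sample identifiers differ: {first[0]} vs {second[0]}')
        pairs = zip(first[first_locus:], second[first_locus:])
        samples.append({
            'id': first[0],
            'population': int(first[1]),
            'learning_flag': int(first[2]),
            'loci': [tuple(None if a == MISSING else a for a in pair) for pair in pairs],
        })
    return samples


toy = ['ind1 1 1 x y 152 -9', 'ind1 1 1 x y 148 160']
records = parse_structure_lines(toy, n_optional=2)
# records[0]['loci'] == [('152', '148'), (None, '160')]
```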