Configuration
This tool requires a Conda environment with the following packages:
- Python, tested with v3.7.3 but anything above v3 should be fine
- requests, tested with v2.22.0
- numpy, tested with v1.18.1
- nltk, tested with v3.4.4
- sqlite, tested with v3.30.1
- psutil, tested with v5.6.7
- HMMER, tested with v3.2.1
A Conda environment file is available: mantis_env.yml.
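If you prefer to reuse an existing environment, the short check below can show which requirements are already present. It is a minimal sketch and not part of Mantis: the helper name `check_requirements` is hypothetical, and it assumes that sqlite is reached through Python's `sqlite3` module and that HMMER's `hmmsearch` executable is on the PATH.

```python
import importlib.util
import shutil

# Python packages from the list above (assumption: sqlite is used via sqlite3).
PYTHON_PACKAGES = ["requests", "numpy", "nltk", "sqlite3", "psutil"]
# HMMER is a command-line suite; hmmsearch is one of its executables.
EXECUTABLES = ["hmmsearch"]


def check_requirements(packages=PYTHON_PACKAGES, executables=EXECUTABLES):
    """Return a dict mapping each requirement name to True if it was found."""
    found = {}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        found[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the executable is not on the PATH.
        found[name] = shutil.which(name) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

This only reports presence, not versions, so it does not replace the tested versions listed above.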
The lineage annotation requires quite a lot of space since NOG's HMM database is quite extensive. For the taxonomy you will need around 2.8 terabytes. The rest of the HMMs only take up around 200 gigabytes. To check the default datasets, see Reference data.
You don't need to use all of this data though!
Mantis is easy to set up, simply:
- Clone the repository with git
- Edit MANTIS.config with desired paths
- Create a Conda environment for Mantis with
conda env create -f mantis/mantis_env.yml
- Activate the previously created Conda environment
- Set up all default databases for Mantis with
python mantis setup_databases
To check your installation run:
python mantis check_installation
Keep in mind the installation will take a while as a lot of data is downloaded. If NOG's HMMs are not used it can finish within a couple of hours; otherwise it may take a few days.
To customize your installation (setting installation paths or removing certain HMMs) please refer to configuration.
The MANTIS.config file allows the user to edit and add custom HMMs. An example config file is included; please use the same syntax, otherwise the configuration won't be taken into account.
Mantis comes with a MANTIS.config file which serves as the default for all users on the system. You can configure your own MANTIS.config file by copying this file and editing it as you wish. Afterwards, just add -mc <path/to/edited_MANTIS.config> to your Mantis command.
It's preferable to use a self-contained environment to avoid compatibility issues, but you can run Mantis in whichever Conda environment you'd like: simply activate it and run Mantis.
This is not necessary, but if you'd like to share your Mantis environment across multiple users do the following:
- Create the Mantis environment in a group folder location, by running
conda env create -f mantis_env.yml -p <path/to/group/folder/>
Future Mantis users now need to do the following:
- Run conda config to generate the .condarc file
- Edit the .condarc file (usually located in your home folder) and add:
envs_dirs:
- path/to/group/folder/
After running setup_databases you may wish to move data around. If so, make sure you change all these paths:
uniprot_folder=/path/to/mantis/Resources/Uniprot/
go_obo_folder=/path/to/mantis/Resources/Gene_Ontology/
ncbi_dmp_path_folder=/path/to/mantis/Resources/NCBI/
default_hmms_folder=/path/to/mantis/hmm/
NOGT_hmm_folder=/path/to/mantis/hmm/eggnogdb.embl.de/NOGT/
NOGG_hmm_folder=/path/to/mantis/hmm/eggnogdb.embl.de/NOGG/
pfam_hmm_folder=/path/to/mantis/hmm/pfam/
kofam_hmm_folder=/path/to/mantis/hmm/kofam/
tigrfam_hmm_folder=/path/to/mantis/hmm/tigrfam/
If you don't move any of these folders, don't worry about configuring this.
If you don't want all the hmm files to be used, you can change the path to 'NA', for example: NOGT_hmm_folder=NA
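The key=value layout above, together with the 'NA' convention, can be illustrated with a small reader. This is a minimal sketch and not Mantis's own parser: `parse_config` is a hypothetical helper, and skipping lines that start with `#` is an assumption rather than documented MANTIS.config syntax.

```python
def parse_config(text):
    """Parse key=value lines; a value of 'NA' marks an entry as disabled (None)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines, '#' comments and lines without '=' are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        settings[key.strip()] = None if value == "NA" else value
    return settings


example = """
default_hmms_folder=/path/to/mantis/hmm/
pfam_hmm_folder=/path/to/mantis/hmm/pfam/
NOGT_hmm_folder=NA
"""

settings = parse_config(example)
# Keep only the entries that were not switched off with 'NA'.
enabled = [key for key, value in settings.items() if value is not None]
print(enabled)
```

Here NOGT_hmm_folder is left out of `enabled` because its value is 'NA', while the two folders with real paths remain.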
Important: All of the default HMMs belong to their respective authors. I haven't compiled any of this data; I'm merely distributing it in a more automated manner! Make sure you cite them when using this tool or their data.
NOGT is the collection of taxon-specific HMMs; NOGG is the collection of all HMMs.
custom_hmms_folder=/path/to/mantis/hmm/custom_hmms/
custom_hmm=/path/to/HMM_folder/file.hmm
Custom HMMs can be added in MANTIS.config by adding their absolute path; alternatively, you may add them to the custom_hmms folder. This tool will read the folders within the custom_hmms folder and use the .hmm file stored in each of those folders.
Important:
Remember to use HMMER's hmmpress on the custom hmms!
If custom HMMs are split into one HMM per .hmm file, make sure you merge them together using the merge_hmm_folder function.
If HMMs from the same source are not merged, the processing of hits won't take into account potential overlaps between HMM hits.
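To make the idea of merging concrete, the sketch below concatenates single-profile files into one multi-profile file and builds the matching hmmpress call. It is an illustration only and is not the merge_hmm_folder function shipped with Mantis, whose behaviour may differ; `concatenate_hmm_files` and `hmmpress_command` are hypothetical helpers, and the sketch assumes text-format HMMER3 profiles, which HMMER accepts when concatenated into a single file.

```python
from pathlib import Path


def concatenate_hmm_files(folder, output_file):
    """Concatenate every .hmm file in `folder` into `output_file`.

    Returns the number of files that were concatenated.
    """
    output_path = Path(output_file)
    # Collect the sources first and skip the output file itself on re-runs.
    sources = sorted(
        p for p in Path(folder).glob("*.hmm")
        if p.resolve() != output_path.resolve()
    )
    with open(output_path, "w") as out:
        for source in sources:
            text = source.read_text()
            # Make sure each profile ends with a newline before the next one starts.
            out.write(text if text.endswith("\n") else text + "\n")
    return len(sources)


def hmmpress_command(hmm_file):
    """Build the hmmpress call for a merged file (run it with subprocess.run)."""
    return ["hmmpress", str(hmm_file)]
```

A typical use would be to call `concatenate_hmm_files` on the folder holding the individual profiles and then pass the result of `hmmpress_command` to `subprocess.run(..., check=True)`.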
Most metadata is formatted differently, therefore, for custom hmms this tool requires the metadata to be formatted in a specific manner, otherwise only the hmm name will be extracted as "metadata".
To see an example please go to hmm/custom_hmms/ where you will find two files, custom.hmm and custom.tsv.
In custom.tsv you can see how the metadata should be formatted.
The first column should contain the HMM name; any kind of metadata can be added in the columns that come after. To specify the type of metadata simply add the type to the headers of the .tsv file. Columns without any headers will be assumed to be a free-text description. Some identifiers will still be searched for in this free text (EC, KO, TCDB, DUF, GO, and COG).
For the custom metadata to be recognized please place the custom metadata in the same folder as the custom hmm file and use the same name but with a .tsv extension, for example: path/to/custom_hmm/custom.hmm and path/to/custom_hmm/custom.tsv.
The metadata tsv files should have the following format:
HMM_profile | Metadata_type_1 | Metadata_type_2 | Metadata_type_3 |
---|---|---|---|
HMM_1 | 2.1.15.64 | this is a description | |
HMM_1 | 3.2.9.13 | KO0002 | this is a description |
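The layout above can be read with a few lines of Python. This is a minimal sketch, not the code Mantis uses: `read_metadata` and `metadata_path_for` are hypothetical helpers, the file is assumed to be tab-separated, and the label "description" for columns without a header is a placeholder chosen for this example.

```python
import csv
import io
from pathlib import Path


def metadata_path_for(hmm_path):
    """Return the .tsv path expected next to a custom .hmm file."""
    return Path(hmm_path).with_suffix(".tsv")


def read_metadata(handle):
    """Map each HMM name to a list of (metadata_type, value) pairs."""
    reader = csv.reader(handle, delimiter="\t")
    headers = next(reader)  # first row holds the metadata types
    metadata = {}
    for row in reader:
        if not row:
            continue
        hmm_name = row[0]
        for index, value in enumerate(row[1:], start=1):
            if not value.strip():
                continue  # empty cell, nothing to record
            header = headers[index].strip() if index < len(headers) else ""
            # Columns without a header are treated as free-text description.
            metadata.setdefault(hmm_name, []).append(
                (header or "description", value.strip())
            )
    return metadata


example = io.StringIO(
    "HMM_profile\tMetadata_type_1\t\n"
    "HMM_1\t2.1.15.64\tthis is a description\n"
)
print(read_metadata(example))
print(metadata_path_for("path/to/custom_hmm/custom.hmm"))
```

Rows that share an HMM name accumulate their pairs under the same key, so repeated entries for one profile are kept rather than overwritten.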
An experienced user can add their own linking method to MANTIS_Interpreter.py. You can also post an issue in this repository and I can try to write a linking method for your custom hmm.
When generating the consensus, some HMMs can be given more weight. This is important because some HMMs are more specific than others.
By default NOG has the most weight since its HMMs are taxon-specific.
To configure the weight of an HMM simply change the MANTIS.config file:
- example: the weight for NOGT_hmm_folder should be set with NOGT_weight=X, where X is the weight of the HMM (0-1)
- example: the weight for custom_hmm=path/to/customHMM.hmm should be set with customHMM_weight=X, where X is the weight of the HMM (0-1)
In essence, make sure the names of the weights correspond to the paths of the HMMs.
If no weight is given to the HMM, it will default to 0.7.
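The naming rule can be sketched as follows. This is an illustration inferred from the two examples above, not Mantis's actual lookup code: `weight_key` and `lookup_weight` are hypothetical helpers, and applying the same rule to other default folders (for instance pfam_hmm_folder) is an assumption.

```python
from pathlib import Path

DEFAULT_WEIGHT = 0.7  # fallback stated above


def weight_key(config_key, value):
    """Derive the weight setting name for one HMM entry in the config."""
    if config_key == "custom_hmm":
        # custom_hmm=path/to/customHMM.hmm -> customHMM_weight
        return Path(value).stem + "_weight"
    suffix = "_hmm_folder"
    if config_key.endswith(suffix):
        # NOGT_hmm_folder -> NOGT_weight
        return config_key[: -len(suffix)] + "_weight"
    raise ValueError(f"not an HMM entry: {config_key}")


def lookup_weight(settings, config_key, value):
    """Return the configured weight for an HMM, or the default if none is set."""
    return float(settings.get(weight_key(config_key, value), DEFAULT_WEIGHT))


settings = {"NOGT_weight": "0.8"}
print(lookup_weight(settings, "NOGT_hmm_folder", "/path/to/mantis/hmm/eggnogdb.embl.de/NOGT/"))
print(lookup_weight(settings, "custom_hmm", "path/to/customHMM.hmm"))
```

The first call finds NOGT_weight in the settings; the second finds no customHMM_weight entry and falls back to the default.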