Configuration
conda install -c bioconda mantis_pfa
mantis setup
Mantis comes with a MANTIS.cfg file, which serves as the default configuration for all users of the system. To use your own configuration, copy this file, edit the copy as you wish, and then add -mc <path/to/edited_MANTIS.cfg> when running Mantis.
You might want to store your reference databases somewhere other than the default References folder. If you want to keep all the reference databases in one specific folder, change:
default_ref_folder=/path/to/mantis/ref/
If you want to have each reference in specific folders, then change:
nog_ref_folder=/path/to/nog/
ncbi_ref_folder=/path/to/ncbi/
pfam_ref_folder=/path/to/pfam/
kofam_ref_folder=/path/to/kofam/
tcdb_ref_folder=/path/to/tcdb/
If you don't want all the reference files to be used, you can change the path to 'NA', for example: nog_ref_folder=NA
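As a rough illustration of how these settings could be read, here is a minimal Python sketch. It assumes MANTIS.cfg is a plain list of key=value lines with # comments; `parse_cfg` and `enabled_ref_folders` are hypothetical helper names, not part of Mantis, and the way Mantis itself resolves these keys may differ.

```python
from collections import defaultdict


def parse_cfg(text):
    """Parse key=value lines; repeated keys (e.g. custom_ref) accumulate in a list."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()].append(value.strip())
    return dict(settings)


def enabled_ref_folders(settings):
    """Return {reference: folder} for every *_ref_folder key that is not set to NA."""
    folders = {}
    for key, values in settings.items():
        if key.endswith("_ref_folder") and key != "default_ref_folder":
            value = values[-1].strip("'\"")  # last occurrence wins (assumption)
            if value.upper() != "NA":
                folders[key[: -len("_ref_folder")]] = value
    return folders


example = """
default_ref_folder=/path/to/mantis/ref/
nog_ref_folder=NA
pfam_ref_folder=/path/to/pfam/
"""
print(enabled_ref_folders(parse_cfg(example)))
```

In this sketch the nog reference is skipped because its folder is set to NA, so only the pfam folder is reported.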
Important: all of the default references belong to their respective authors. I haven't compiled any of this data; I'm merely distributing it in a more automated manner. Make sure you cite them when using this tool or their data.
Custom references can be added in MANTIS.cfg by adding their absolute path or folder path, for example:
custom_ref=path/to/ref/custom1.hmm
custom_ref=path/to/ref/custom2.dmnd
Alternatively you may add them to the custom_refs folder, for example:
Mantis/References/Custom_references/custom1/custom1.hmm
Mantis/References/Custom_references/custom2/custom2.dmnd
You may also redefine the custom_refs folder path by adding your preferred path to custom_refs_folder in the MANTIS.cfg file, for example:
custom_refs_folder=path/to/custom_refs/
I have also compiled other reference datasets you may use with Mantis. These might not be suitable for all use-cases and so they are not included as default references. To access them please go to this url and follow the instructions on how to generate them. These are formatted to be directly plugged in with Mantis. You may of course create your own, please feel free to use the provided code to create your own references.
If your custom HMMs are split into one HMM per file, make sure you merge them into a single file (provided they are from the same database). If HMMs from the same source are not merged, hit processing won't take into account potential overlaps between HMM hits.
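One way to merge such files is plain concatenation, since HMMER3 profile files are text and each model is terminated by a // line. The sketch below assumes the per-model files share the .hmm extension and live in one folder; `merge_hmms` is a hypothetical helper, not a Mantis function.

```python
from pathlib import Path


def merge_hmms(hmm_dir, merged_path):
    """Concatenate every .hmm file in hmm_dir into one multi-model file.

    Returns the number of files merged. The output file itself is skipped
    if it already sits inside hmm_dir.
    """
    merged_path = Path(merged_path)
    parts = sorted(
        p for p in Path(hmm_dir).glob("*.hmm")
        if p.resolve() != merged_path.resolve()
    )
    with merged_path.open("w") as out:
        for part in parts:
            text = part.read_text()
            # Make sure each model ends with a newline before the next one starts.
            out.write(text if text.endswith("\n") else text + "\n")
    return len(parts)


# Example (paths are placeholders):
# merge_hmms("path/to/ref/custom1_parts", "path/to/ref/custom1.hmm")
```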
When using a list of sequences as a reference, please use Diamond to generate a .dmnd file.
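For reference, a .dmnd file is produced with Diamond's makedb subcommand. The sketch below only builds the command; the FASTA file name custom2.faa is a placeholder, and `diamond_makedb_command` is a hypothetical helper.

```python
import shlex


def diamond_makedb_command(fasta_path, db_prefix):
    """Build the argument list for `diamond makedb`; Diamond appends .dmnd to db_prefix."""
    return ["diamond", "makedb", "--in", str(fasta_path), "--db", str(db_prefix)]


cmd = diamond_makedb_command("path/to/ref/custom2.faa", "path/to/ref/custom2")
print(" ".join(shlex.quote(arg) for arg in cmd))

# To actually build the database (requires Diamond on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```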
Metadata is formatted differently from source to source. Custom references therefore require metadata in a specific format; otherwise only the HMM/sequence name is extracted as "metadata".
To see an example, please go to References/custom_refs/, where you will find two files: custom.hmm and metadata.tsv.
In metadata.tsv you can see how the metadata should be formatted.
The first column should contain the HMM/sequence name; any kind of metadata can be added in the columns that come after. To specify the type of metadata, simply add the type before the actual annotation, e.g., enzyme_ec:1.1.1.1.
For the custom metadata to be recognized, please place it in the same folder as the custom reference file and name it metadata.tsv, for example: path/to/custom_ref/custom123.hmm and path/to/custom_ref/metadata.tsv.
The metadata tsv files should have the following format:
REF_ID_1 | enzyme_ec:2.1.15.64 | description:this is a description | |
REF_ID_2 | enzyme_ec:3.2.9.13 | kegg_ko:KO0002 | description:this is a description |
Make sure all the reference IDs (e.g., REF_ID_1) are unique!!!
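To make the layout concrete, here is a minimal sketch of a reader for this format. It assumes the columns are tab-separated and that each annotation is a type:value pair; `parse_metadata` is a hypothetical helper, and keeping untyped fields under an "untyped" key is a choice made for this example, not Mantis behaviour.

```python
def parse_metadata(text):
    """Parse metadata.tsv content into {ref_id: {type: [values]}}.

    Raises ValueError if a reference ID appears more than once.
    """
    records = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = [field.strip() for field in line.split("\t")]
        if not fields[0]:
            continue  # skip blank lines
        ref_id = fields[0]
        if ref_id in records:
            raise ValueError(f"duplicate reference ID {ref_id!r} on line {line_no}")
        annotations = {}
        for field in fields[1:]:
            if not field:
                continue  # ignore empty trailing columns
            kind, sep, value = field.partition(":")
            if not sep:
                kind, value = "untyped", field  # assumption: keep untyped fields separately
            annotations.setdefault(kind, []).append(value)
        records[ref_id] = annotations
    return records


example = (
    "REF_ID_1\tenzyme_ec:2.1.15.64\tdescription:this is a description\t\t\n"
    "REF_ID_2\tenzyme_ec:3.2.9.13\tkegg_ko:KO0002\tdescription:this is a description\t\n"
)
for ref_id, annotations in parse_metadata(example).items():
    print(ref_id, annotations)
```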
Currently Mantis uses all these ID types:
- kegg_map_lineage
- kegg_ko
- description
- kegg_cazy
- eggnog
- go
- cog
- pfam
- tcdb
- enzyme_ec
Please make sure you use the same format when adding your custom metadata tsv. Other links are supported but may not be properly recognized during consensus generation.
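A quick way to spot annotation types outside this list before running Mantis is sketched below. `unrecognized_types` is a hypothetical helper, and the uniprot entry is only an illustrative unsupported type.

```python
KNOWN_TYPES = {
    "kegg_map_lineage", "kegg_ko", "description", "kegg_cazy", "eggnog",
    "go", "cog", "pfam", "tcdb", "enzyme_ec",
}


def unrecognized_types(annotations):
    """Return the sorted type prefixes of 'type:value' strings that are not in KNOWN_TYPES."""
    found = {a.partition(":")[0] for a in annotations if ":" in a}
    return sorted(found - KNOWN_TYPES)


print(unrecognized_types(["enzyme_ec:1.1.1.1", "kegg_ko:KO0002", "uniprot:P12345"]))
# prints ['uniprot']
```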
When generating the consensus, some references can be given more weight. This is important because some references are more specific than others. To configure the weight of a reference, simply change the MANTIS.cfg file:
- example: nog_ref_folder should be nog_weight=X, where X is the weight of the HMM (0-1)
- example: custom_ref=path/to/customHMM.hmm should be customref_weight=X, where X is the weight of the HMM (0-1)
To reiterate, to set the weight of a custom reference, simply add a line with the HMM file name followed by _weight. For example, if the file path is path/to/hmm/custom1.hmm, take the name of the file without the extension (custom1) and set the weight like so: custom1_weight=0.5
In essence, make sure the names of the weights correspond to the names of the references.
Default weight is 0.7.
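The naming rule can be sketched as follows. The cfg dictionary stands in for the parsed MANTIS.cfg, and `weight_key` and `reference_weight` are hypothetical helpers written for this example.

```python
from pathlib import Path

DEFAULT_WEIGHT = 0.7  # default weight stated above


def weight_key(ref_path):
    """Derive the config key from the reference file name without its extension."""
    return Path(ref_path).stem + "_weight"


def reference_weight(cfg, ref_path, default=DEFAULT_WEIGHT):
    """Look up the weight for a reference, falling back to the default."""
    weight = float(cfg.get(weight_key(ref_path), default))
    if not 0.0 <= weight <= 1.0:
        raise ValueError(f"{weight_key(ref_path)} must be between 0 and 1, got {weight}")
    return weight


cfg = {"custom1_weight": "0.5"}
print(weight_key("path/to/hmm/custom1.hmm"))             # custom1_weight
print(reference_weight(cfg, "path/to/hmm/custom1.hmm"))  # 0.5
print(reference_weight(cfg, "path/to/ref/custom2.dmnd")) # 0.7
```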
Reference data can be updated by simply deleting the old reference data folders (e.g., KOfam) and running setup; Mantis will then download the most recent data from the respective source.
Do you still want to use the whole eggNOG compendium of HMMs (instead of the eggNOG diamond database), but only need HMMs for some taxa? You can select specific taxa by inserting a list of IDs or organism names in the nog_tax= line of the MANTIS.cfg file. If an organism name is given, an automatic web search retrieves the respective NCBI ID. A lineage for each NCBI ID is then generated and all the required TSHMMs are downloaded. The nog_tax line is commented out by default.
Please keep in mind that this will also restrict the general eggNOG HMM. When downloading the full eggNOG compendium, the general eggNOG HMM will contain all non-redundant HMMs from 2157 (Archaea), 2 (Bacteria), 2759 (Eukaryota), 10239 (Viruses), 28384 (Others), and 12908 (Unclassified). However, when restricting the taxa with nog_tax, the general HMM will only contain the top-level HMMs from the selected taxa. For example, if using nog_tax=562, the general eggNOG HMM will only contain the HMMs from taxon 2, since the taxonomic lineage of NCBI taxon 562 corresponds to 2 - 1224 - 1236 - 91347 - 543 - 561 - 562.
This will not affect the eggNOG diamond database or NPFM TSHMMs setup.
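The effect of nog_tax on the general HMM can be illustrated with the lineage from the example above. This is only a sketch of the selection logic as described in the text; `general_hmm_taxa` is a hypothetical helper, and the lineage is hard-coded rather than retrieved from NCBI.

```python
def general_hmm_taxa(lineages):
    """Return the sorted top-level taxon IDs (first lineage entry) of the selected taxa."""
    return sorted({lineage[0] for lineage in lineages.values() if lineage})


# Lineage of NCBI taxon 562, as given in the text above.
lineages = {562: [2, 1224, 1236, 91347, 543, 561, 562]}
print(general_hmm_taxa(lineages))  # [2]
```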
It's preferable to use a self-contained environment to avoid compatibility issues. If you'd like to share your Mantis environment across multiple users, do the following:
- Create the Mantis environment in a group folder location by running
conda env create -n mantis_env -p <path/to/group/folder/>
Future Mantis users now need to do the following:
- Run conda config to generate the .condarc file
- Edit the .condarc file (usually located in your home folder) and add:
envs_dirs:
- path/to/group/folder/
If using conda to run Mantis, these are the main packages Mantis requires:
- Python, tested with v3.7.3 but anything above v3 should be fine
- requests, tested with v2.22.0
- numpy, tested with v1.18.1
- nltk, tested with v3.4.4
- psutil, tested with v5.6.7
- HMMER, tested with v3.2.1
- GCC, for compilation of cython code (most systems should have it by default)
These are all installed when you run conda install -c bioconda mantis_pfa. Regardless, for reproducibility, a conda environment recipe is also available: mantis_env.yml.
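If you want to compare your environment against the tested versions listed above, a small check along these lines may help. It is a sketch, not part of Mantis: it relies on importlib.metadata, which needs Python 3.8 or later (on 3.7 the importlib_metadata backport would be needed), and it looks for HMMER through its hmmsearch executable.

```python
import shutil
from importlib import metadata

# Versions Mantis was tested with, taken from the list above.
TESTED_VERSIONS = {
    "requests": "2.22.0",
    "numpy": "1.18.1",
    "nltk": "3.4.4",
    "psutil": "5.6.7",
}


def installed_versions(packages):
    """Return {package: installed version, or None if the package is missing}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for name, found in installed_versions(TESTED_VERSIONS).items():
    print(f"{name}: installed={found or 'missing'}, tested with {TESTED_VERSIONS[name]}")

# HMMER is not a Python package, so check for one of its executables instead.
print("hmmsearch on PATH:", shutil.which("hmmsearch") is not None)
```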
Mantis can only run on Linux or macOS systems. On macOS, make sure you use Python 3.7.
Space requirements depend on the eggNOG database used. A diamond database (similar to the eggNOG-mapper diamond database) has recently been added, which reduces space requirements substantially (from 1.5T to 130G). This database is also lineage specific.
If you would like to use the legacy eggNOG HMM database, just set the config line nog_ref to hmm instead of dmnd in the MANTIS.cfg file:
nog_ref=hmm # dmnd or hmm
The lineage annotation with eggNOG HMMs requires a lot of space, since eggNOG's HMM database is quite extensive. For the taxonomy you will need around 1.5 terabytes. The rest of the HMMs only take up around 27 gigabytes. To check the default datasets, see Reference data.
You don't need to use all of this data though!
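Before running setup, it can be worth checking the free space at the target location. The sketch below uses the approximate totals quoted above (130G with the diamond database, 1.5T with the eggNOG HMMs) and assumes decimal units; `has_enough_space` is a hypothetical helper.

```python
import shutil

# Approximate requirements from the text, in bytes (decimal units assumed).
REQUIRED_BYTES = {
    "dmnd": 130 * 10**9,   # eggNOG diamond database setup
    "hmm": 1500 * 10**9,   # legacy eggNOG HMM setup
}


def has_enough_space(path, nog_ref="dmnd"):
    """Return (enough, free_bytes) for the filesystem that contains path."""
    free = shutil.disk_usage(path).free
    return free >= REQUIRED_BYTES[nog_ref], free


enough, free = has_enough_space(".", nog_ref="dmnd")
print(f"free: {free / 10**9:.1f} GB, enough for the diamond setup: {enough}")
```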
Mantis is easy to set up; simply run:
conda install -c bioconda mantis_pfa
mantis setup
To check your installation run:
mantis check
Keep in mind the installation will take a while as a lot of data is downloaded. If NOG's HMMs are not used it can finish within a couple of hours (by default a NOG diamond database is generated), otherwise it may take a few days.
To customize your installation (setting installation paths or removing certain HMMs) please refer to configuration.