A pipeline for identifying the olfactory receptor (OR) gene family
- Install

      git clone [email protected]:jianzuoyi/orfam.git
      cd orfam
      make
- Run the example script

      cd example
      ./run_orfam
- Python 2.7 (https://www.python.org)
- Biopython
System paths to orfam's component software are specified in the [orfam.config](bin/orfam.config) file, which should reside in the same directory as the orfam executable (for alternate locations use the -K flag). Upon installation, orfam attempts to automatically generate this file, but manual editing may be necessary.
- Bioawk (https://github.com/lh3/bioawk)
- bedtools (https://github.com/arq5x/bedtools2)
- tblastn (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)
- Exonerate (http://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate)
- MAFFT (http://mafft.cbrc.jp/alignment/software/)
- MEGACC (http://www.megasoftware.net/)
If any components already exist on the system, their paths should be manually specified by editing orfam.config.
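Before editing orfam.config by hand, it can help to see which components are already on the PATH. The following is a minimal sketch, not part of orfam: it is written for Python 3 (orfam itself targets Python 2.7), and the executable names in `COMPONENTS` are assumptions that may differ on your system.

```python
import shutil

# Assumed executable names for the components listed above; adjust as needed.
COMPONENTS = ["bioawk", "bedtools", "tblastn", "exonerate", "mafft", "megacc"]


def locate_components(names=COMPONENTS):
    """Return {name: absolute path, or None if not found on PATH}."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Print one line per component so missing tools are easy to spot."""
    for name, path in found.items():
        print("{}\t{}".format(name, path or "NOT FOUND"))
```

Calling `report(locate_components())` lists the paths that could then be copied into orfam.config.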
orfam is a modular framework with three components:
- orfam align - Search the target genome using known OR protein sequences as queries and produce an alignment result file that can be processed by the other orfam modules.
- orfam func - Identification of intact OR genes.
- orfam pseudo - Identification of truncated OR genes and pseudogenes.
orfam align
Search the target genome using known OR protein sequences as queries and produce an alignment result file that can be processed by the other orfam modules.
Internally, orfam align runs the following steps to produce an output file (BLAST format 6):
- Discard query sequences shorter than 250 amino acids
- Align the remaining queries against the subject genome with TBLASTN
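The two steps above can be sketched in Python 3 as follows. This is an illustration only and not orfam's actual implementation: the helper names are hypothetical, and the TBLASTN call assumes the genome has already been formatted as a BLAST database (for example with makeblastdb), which may not be how orfam invokes it.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=250):
    """Discard records whose sequence is shorter than min_len residues."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def tblastn_command(query, db, out, evalue="1e-10", threads=1):
    """Build (but do not run) a TBLASTN command producing tabular output.

    Assumption: `db` names an existing BLAST database for the genome.
    """
    return [
        "tblastn",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-out", out,
    ]
```

The command is returned as an argument list so that it can be inspected first and then passed to `subprocess.run` without shell quoting issues.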
    usage: orfam align [options]
        -q FILE    olfactory receptor proteins (FASTA)
        -s FILE    subject genome (FASTA)
        -o STR     output file [.align]
        -T DIR     temp directory [./tmpXXXXXXXX]
        -e FLOAT   evalue for hits
        -t INT     threads [1]
        -K FILE    path to orfam.config file (default: same directory as orfam)
        -v         verbose
        -h         show this message
orfam align produces a single output file (BLAST format 6):
outprefix.tblastn
- The alignment result file. This file serves as input for orfam func.
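For readers who want to inspect the alignment file directly, here is a minimal Python 3 sketch of a parser for BLAST tabular output. It assumes the default twelve columns of format 6; if orfam requests custom columns, the field list would need adjusting. The names `Hit`, `parse_hit` and `strand` are illustrative only.

```python
from collections import namedtuple

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
Hit = namedtuple("Hit", FIELDS)


def parse_hit(line):
    """Convert one tab-separated line into a typed Hit record."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(FIELDS):
        raise ValueError("expected at least 12 tab-separated columns")
    return Hit(
        cols[0],
        cols[1],
        float(cols[2]),
        *[int(c) for c in cols[3:10]],
        float(cols[10]),
        float(cols[11])
    )


def strand(hit):
    """TBLASTN reports minus-strand hits with sstart greater than send."""
    return "+" if hit.sstart <= hit.send else "-"
```

A file can be read with `[parse_hit(l) for l in open("outprefix.tblastn") if l.strip()]`.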
orfam func identifies intact OR genes in the target genome.
    usage: orfam func [options]
        -R FILE    reference file (fasta) (required)
        -r FILE    reference olfactory receptor (fasta) (required)
        -B FILE    BED file represents the regions of reference olfactory receptor (required)
        -A FILE    tblastn output (tabular) (required)
        -O FILE    olfactory receptor for outgroup (fasta) (required)
        -S FILE    MAO file, setting used to the construction of phylogenetic tree (generated by megaproto) (required)
        -o STR     output prefix [required]
        -t INT     threads [1]
        -T DIR     temp directory [./tmpXXXXXXXX]
        -k         keep temporary files
        -K FILE    path to orfam.config file (default: same directory as orfam)
        -v         verbose
        -h         show this message
orfam func produces two output files:
outprefix_best_hit.gff
- This GFF file contains all OR candidate sequences, which are classified into three types: intact OR genes, truncated OR genes, and OR pseudogenes.
outprefix_intact.fa
- This FASTA file contains all intact OR gene sequences.
orfam pseudo identifies truncated OR genes and OR pseudogenes.
    usage: orfam pseudo [options]
        -s FILE    subject genome (fasta) (required)
        -q FILE    query olfactory receptor proteins (fasta) (required)
        -b FILE    best hits (gff) (required)
        -i FILE    intact olfactory receptor (fasta) (required)
        -o STR     output prefix
        -T DIR     temp directory [./tmpXXXXXXXX]
        -k         keep temporary files
        -K FILE    path to orfam.config file (default: same directory as orfam)
        -v         verbose
        -h         show this message
orfam pseudo produces five output files:
outprefix_truncated.gff
- This GFF file contains truncated OR genes.
outprefix_pseudo.gff
- This GFF file contains OR pseudogenes.
outprefix_pseudo_nonsense.fa
- This FASTA file contains olfactory receptors with nonsense mutations.
outprefix_pseudo_frameshift.fa
- This FASTA file contains olfactory receptors with frameshift mutations.
outprefix_pseudo_others.fa
- This FASTA file contains olfactory receptors with other mutations.
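A quick way to summarize the pseudogene FASTA outputs listed above is to count the records in each file. The sketch below is Python 3 and is not part of orfam; the function names are hypothetical, and it relies only on the file suffixes given above and on FASTA headers starting with `>`.

```python
import os

# Suffixes of the FASTA files that orfam pseudo writes, as listed above.
PSEUDO_FASTA_SUFFIXES = [
    "_pseudo_nonsense.fa",
    "_pseudo_frameshift.fa",
    "_pseudo_others.fa",
]


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_pseudo_fasta(prefix, suffixes=PSEUDO_FASTA_SUFFIXES):
    """Return {suffix: record count, or None if the file is missing}."""
    counts = {}
    for suffix in suffixes:
        path = prefix + suffix
        counts[suffix] = count_fasta_records(path) if os.path.exists(path) else None
    return counts
```

For the example run shown in this document, `summarize_pseudo_fasta("mm10")` would look for `mm10_pseudo_nonsense.fa` and the other two files in the current directory.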
- Use orfam align to produce an alignment result file.

      orfam align \
          -q data/ORs/ORs.fa \
          -s data/mm10/mm10.fa \
          -o mm10 \
          -e 1e-10 \
          -t 20 \
          -T temp \
          -v \
          -k
- Use orfam func to identify intact OR genes.

      orfam func \
          -R data/mm10/mm10.fa \
          -r data/ORs/O43749.fasta \
          -B data/ORs/O43749.bed \
          -O data/ORs/outgroup.fa \
          -S bin/infer_NJ_protein.mao \
          -A mm10.tblastn \
          -o mm10 \
          -t 20 \
          -T temp \
          -k \
          -v
- Use orfam pseudo to identify truncated OR genes and OR pseudogenes.

      orfam pseudo \
          -s data/mm10/mm10.fa \
          -q intact/mm10_intact.fa \
          -b mm10_best_hit.gff \
          -i mm10_intact.fa \
          -o mm10 \
          -T temp \
          -k \
          -v
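The three commands above can also be assembled from a small Python 3 driver, which makes the hand-off of files between modules explicit. This is a sketch under assumptions: it presumes `orfam` is on the PATH, uses only flags documented in the usage messages, and passes the original query proteins to `orfam pseudo -q` (the example above passes a different file). `build_commands` and `run_pipeline` are hypothetical helper names.

```python
import shlex
import subprocess


def build_commands(genome, queries, ref_or, ref_bed, outgroup, mao,
                   prefix, threads=1, evalue="1e-10"):
    """Return argument lists for orfam align, func and pseudo, in order."""
    align = ["orfam", "align", "-q", queries, "-s", genome,
             "-o", prefix, "-e", str(evalue), "-t", str(threads)]
    # orfam func consumes the tblastn file written by orfam align.
    func = ["orfam", "func", "-R", genome, "-r", ref_or, "-B", ref_bed,
            "-A", prefix + ".tblastn", "-O", outgroup, "-S", mao,
            "-o", prefix, "-t", str(threads)]
    # orfam pseudo consumes both outputs of orfam func.
    # Assumption: -q takes the same query proteins as orfam align.
    pseudo = ["orfam", "pseudo", "-s", genome, "-q", queries,
              "-b", prefix + "_best_hit.gff", "-i", prefix + "_intact.fa",
              "-o", prefix]
    return [align, func, pseudo]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Keeping `dry_run=True` as the default lets the commands be reviewed before anything is executed against a large genome.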