Skip to content

Code to simplify import of data into Ensembl databases

License

Notifications You must be signed in to change notification settings

genomehubs/easy-import

 
 

Repository files navigation

Easy Import

Code to make it easy to import heterogeneous data into an EnsEMBL database.

This code is part of the GenomeHubs project and the latest documentation is available at gitbook.io.

The following instruction are included for users wishing to use this repository outside of the GenomeHubs framework and have not been tested with this version*__

The instructions below will help you get an Ensembl database and website up and running in an afternoon - with four Lepidopteran genomes mirrored from Ensembl Metazoa plus a fresh import of the genome of the winter moth Operophtera brumata direct from publicly hosted .gff and .fasta files.

This is a sister project to easy-mirror (included as a submodule), which makes it possible to set up a mirror of any Ensembl or Ensembl Genomes (including Bacteria, Metazoa, Fungi, Plants and Protists) species in four simple steps that can be run in less than an hour on a fresh Ubuntu installation.

The latest and most complete documentation for both projects is available at easy-import.readme.io

Stage 1. Server setup

Step 1.1: Install dependencies

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install git
cd ~
git clone --recursive https://github.com/lepbase/easy-import ei
cd ~/ei/em
sudo ./install-dependencies.sh ../conf/setup.ini

Step 1.2: Setup database connections

cd ~/ei/em
./setup-databases.sh ../conf/setup-db.ini

Step 1.3: git clone any essential ensembl repositories

cd ~/ei/em
./update-ensembl-code.sh ../conf/setup.ini

Stage 2. Core import

Using core-import.ini will install a new core database for the winter moth Operophtera brumata

Step 2.1: (optional) Fetch/summarise assembly/annotation files

mkdir ~/import
cd ~/import
perl ../ei/core/summarise_files.pl ../ei/conf/core-import.ini

Step 2.2: create database and load sequence data

cd ~/import
perl ../ei/core/import_sequences.pl ../ei/conf/core-import.ini
perl ../ei/core/import_sequence_synonyms.pl ../ei/conf/core-import.ini

Step 2.3: Prepare the gff file for import

cd ~/import
perl ../ei/core/prepare_gff.pl ../ei/conf/core-import.ini

Step 2.4: import gff from the prepared file

cd ~/import
perl ../ei/core/import_gene_models.pl ../ei/conf/core-import.ini

Optional: Step 2.5: import additional annotations

cd ~/import
perl ../ei/core/import_blastp.pl ../ei/conf/example.ini ../ei/conf/core-import-extra.ini
perl ../ei/core/import_repeatmasker.pl ../ei/conf/example.ini ../ei/conf/core-import-extra.ini
perl ../ei/core/import_interproscan.pl ../ei/conf/example.ini ../ei/conf/core-import-extra.ini
perl ../ei/core/import_cegma_busco.pl ../ei/conf/example.ini ../ei/conf/core-import-extra.ini

Optional: Step 2.6: export files

cd ~/import
perl ../ei/core/export_sequences.pl ../ei/conf/core-import.ini
perl ../ei/core/export_json.pl ../ei/conf/core-import.ini

Optional: Step 2.7: verify import

cd ~/import
perl ../ei/core/verify_translations.pl ../ei/conf/core-import.ini

Optional: Step 2.8: generate search index

cd ~/import
perl ../ei/core/index_database.pl ../ei/conf/core-import.ini

Stage 3. Web site configuration

Step 3.1: Update Ensembl webcode

edit setup.ini to add operophtera_brumata_v1_core_31_84_1 to [DATA_SOURCE] SPECIES_DBS

cd ~/ei/em
./update-ensembl-code.sh ../conf/setup.ini

Step 3.2: Reload Ensembl website

cd ~/ei/em
./reload-ensembl-site.sh ../conf/setup.ini

About

Code to simplify import of data into Ensembl databases

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Perl 99.3%
  • TSQL 0.7%