Tools to manage the file index between Agave and Elasticsearch.
For now, installation is done by cloning this repo and installing it in development (editable) mode with pip:
$ git clone https://github.com/DesignSafe-CI/tacc_indexer.git
$ cd tacc_indexer
$ pip install -e .
There are multiple commands in this tool, and every command name is prefixed with tim (TACC Index Manager): tim, tim-create and tim-backup.
usage: tim [-h] [-v] [--config CONFIG] [-H [HOSTS [HOSTS ...]]] [-a API_SERVER] [-t TOKEN] [-r REFRESH_TOKEN] [-i INDEX] [-d DOC] root_path path_to_index_root system_id
Indexer to traverse a filesystem directly and create elasticsearch documents to index
positional arguments:
- root_path Path to start traversing for indexing.
- path_to_index_root Path to handle as '/'. This means that it will be removed from the full filepath.
- system_id System Id to use when creating the documents.
optional arguments:
- -h, --help show this help message and exit
- -v, --verbosity increases output verbosity
- --config CONFIG Specify config file. If every argument is listed in the config file then it is not necessary to specify it in the command line.
- -H [HOSTS [HOSTS ...]], --hosts [HOSTS [HOSTS ...]] One or more hosts to use to connect to ElasticSearch
- -a API_SERVER, --api_server API_SERVER Api Server URL to use with agavepy
- -t TOKEN, --token TOKEN Token to use with agavepy.
- -r REFRESH_TOKEN, --refresh_token REFRESH_TOKEN Agave OAuth refresh token to use with agavepy
- -i INDEX, --index INDEX ES Index name to use. Defaults to "testing".
- -d DOC, --doc DOC ES doc_type name to use. Defaults to "objects".
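The path_to_index_root rule ("Path to handle as '/'") can be illustrated with a short Python sketch. It assumes POSIX-style paths and a root_path that lies inside path_to_index_root. `index_path` and `iter_index_paths` are hypothetical helpers, not functions from tacc_indexer, and the sketch does not cover how tim turns each path (together with system_id) into an Elasticsearch document.

```python
import os
import posixpath
from typing import Iterator


def index_path(full_path: str, path_to_index_root: str) -> str:
    """Map an absolute filesystem path to the path stored in the index.

    Hypothetical helper: path_to_index_root is treated as '/', i.e. it is
    stripped from the front of full_path.
    """
    root = posixpath.normpath(path_to_index_root)
    full = posixpath.normpath(full_path)
    # Require a whole-component match so that "/database" is not treated
    # as being under "/data".
    prefix = root.rstrip("/") + "/"
    if full != root and not full.startswith(prefix):
        raise ValueError(f"{full_path!r} is not under {path_to_index_root!r}")
    rel = posixpath.relpath(full, root)
    return "/" if rel == "." else "/" + rel


def iter_index_paths(root_path: str, path_to_index_root: str) -> Iterator[str]:
    """Walk root_path and yield the index path of every directory and file below it."""
    for dirpath, dirnames, filenames in os.walk(root_path):
        for name in dirnames + filenames:
            yield index_path(os.path.join(dirpath, name), path_to_index_root)
```

With the paths from the sample config, `index_path("/Users/xirdneh/indexer_test/projects/a.txt", "/Users/xirdneh/indexer_test")` returns `"/projects/a.txt"`. Setting root_path to a subdirectory while keeping path_to_index_root unchanged would index only that subtree, with paths still expressed relative to the same '/'.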
usage: tim-create [-h] [-v] [--config CONFIG] [-H [HOSTS [HOSTS ...]]] [-Y] from_index index
Creates an empty index copying settings and mappings from another given index
positional arguments:
- from_index Index/Alias from where to copy settings and mappings.
- index Index to create and copy settings and mapping to. Note: It will drop and recreate the index if it already exists.
optional arguments:
- -h, --help show this help message and exit
- -v, --verbosity increases output verbosity
- -H [HOSTS [HOSTS ...]], --hosts [HOSTS [HOSTS ...]] One or more hosts to use to connect to ElasticSearch
- -Y, --yes If set every prompt will automatically be responded with Yes
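What tim-create does (copy settings and mappings, drop an existing target index, and skip the prompt when -Y is given) can be sketched roughly as follows. The input is assumed to have the shape returned by Elasticsearch's get-index API (`GET /<index>`). That response is keyed by the concrete index name even when an alias is requested. The helper names and the list of per-index settings that get dropped are assumptions for illustration, not tacc_indexer's actual code. The exact set of non-copyable settings may also vary between Elasticsearch versions.

```python
import copy
from typing import Any, Dict

# Per-index metadata that Elasticsearch generates for each index; assumed
# here not to be copyable into a new index (the list may vary by version).
NON_COPYABLE_SETTINGS = ("uuid", "version", "creation_date", "provided_name")


def build_create_body(get_index_response: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a GET /<index-or-alias> response into a body for creating a new index."""
    if len(get_index_response) != 1:
        raise ValueError("expected exactly one concrete index behind from_index")
    # The response is keyed by the concrete index name, even for an alias.
    (source,) = get_index_response.values()
    settings = copy.deepcopy(source.get("settings", {}))
    index_settings = settings.get("index", {})
    for key in NON_COPYABLE_SETTINGS:
        index_settings.pop(key, None)
    # Aliases are deliberately not copied, so the new index does not take
    # over the alias of the source index.
    return {
        "settings": settings,
        "mappings": copy.deepcopy(source.get("mappings", {})),
    }


def confirm_drop(index: str, assume_yes: bool) -> bool:
    """Ask before dropping an existing index unless -Y/--yes was given."""
    if assume_yes:
        return True
    answer = input(f"Index {index!r} already exists. Drop and recreate it? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

With the official elasticsearch-py client, the response could come from `es.indices.get(index=from_index)`. The body would then be used with `es.indices.create` after `es.indices.delete` has removed an existing target, but only when `confirm_drop` returns True.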
usage: tim-backup [-h] [-v] [--config CONFIG] [-H [HOSTS [HOSTS ...]]] from_index index doc_type [props_to_exclude [props_to_exclude ...]]
Copy all data from one given index to another
positional arguments:
- from_index Index/Alias from where to copy data.
- index Index to copy data into. This should be an existing index if you don't want ES to set things up automatically.
- doc_type Document type to copy.
- props_to_exclude List of the properties to exclude when copying the documents
optional arguments:
- -h, --help show this help message and exit
- -v, --verbosity increases output verbosity
- --config CONFIG Specify config file. If every argument is listed in the config file then it is not necessary to specify it in the command line.
- -H [HOSTS [HOSTS ...]], --hosts [HOSTS [HOSTS ...]] One or more hosts to use to connect to ElasticSearch
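The effect of props_to_exclude on the copied documents can be sketched as a conversion from search hits to bulk actions. This is a minimal illustration. It assumes that `_id` in the list (as in the sample config) means "let the target index generate new ids", while any other name is removed from the document source. The `_type` key only makes sense on Elasticsearch versions that still have mapping types. `to_bulk_actions` is a hypothetical helper, not part of tacc_indexer.

```python
from typing import Any, Dict, Iterable, Iterator


def to_bulk_actions(
    hits: Iterable[Dict[str, Any]],
    index: str,
    doc_type: str,
    props_to_exclude: Iterable[str] = (),
) -> Iterator[Dict[str, Any]]:
    """Convert search hits from the source index into bulk index actions.

    '_id' in props_to_exclude drops the original document id so that the
    target index generates a new one; any other name is removed from '_source'.
    """
    excluded = set(props_to_exclude)
    for hit in hits:
        action: Dict[str, Any] = {"_index": index, "_type": doc_type}
        if "_id" not in excluded and "_id" in hit:
            action["_id"] = hit["_id"]
        action["_source"] = {
            key: value
            for key, value in hit.get("_source", {}).items()
            if key not in excluded
        }
        yield action
```

With elasticsearch-py, the hits could be read with `elasticsearch.helpers.scan(es, index=from_index)` and the generated actions written with `elasticsearch.helpers.bulk(es, actions)`. Both helpers stream, so the whole index never has to fit in memory.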
The config file data is only used by tim and tim-backup. The property indexer configures the command tim, and the property backuper configures tim-backup.
{
    "verbosity": false,
    "hosts": ["http://designsafe-es01.tacc.utexas.edu:9200/", "http://designsafe-es01.tacc.utexas.edu:9200/"],
    "indexer": {
        "root_path": "/Users/xirdneh/indexer_test",
        "path_to_index_root": "/Users/xirdneh/indexer_test",
        "system_id": "designsafe.storage.default",
        "api_server": null,
        "token": null,
        "refresh_token": null,
        "index": "tester",
        "doc": "objects"
    },
    "backuper": {
        "from_index": "tester",
        "index": "designsafe_backup",
        "doc_type": "objects",
        "props_to_exclude": ["_id"]
    }
}
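The way a config file like this combines with command-line flags can be sketched in Python. This is a minimal illustration under stated assumptions: the top-level verbosity and hosts keys apply to every command, a command-line value overrides the config file, and None means "not given on the command line". `load_command_config`, `resolve_options` and `missing_required` are hypothetical helpers, not tacc_indexer's code.

```python
import json
from typing import Any, Dict, Iterable, List, Optional

# Keys assumed to be shared by every command rather than belonging to one section.
SHARED_KEYS = ("verbosity", "hosts")


def load_command_config(path: Optional[str], section: str) -> Dict[str, Any]:
    """Read the JSON config and return the shared keys plus one command section."""
    if path is None:
        return {}
    with open(path) as fh:
        raw = json.load(fh)
    merged = {key: raw[key] for key in SHARED_KEYS if key in raw}
    merged.update(raw.get(section, {}))
    return merged


def resolve_options(cli: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Command-line values win; None means "not given", so the config value is kept."""
    resolved = dict(config)
    for key, value in cli.items():
        if value is not None or key not in resolved:
            resolved[key] = value
    return resolved


def missing_required(options: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """List required options that neither the command line nor the config supplied."""
    return [key for key in required if options.get(key) is None]
```

For tim, `load_command_config(path, "indexer")` would be resolved against the parsed flags and checked with `missing_required(options, ("root_path", "path_to_index_root", "system_id"))`. That check is what lets every argument live in the config file instead of on the command line. With argparse, this design needs flags to default to None. A `store_true` flag such as -v defaults to False and would otherwise always override the config value.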