From 374bab58ea0f5857606b9f1f90f586e0c2b7c7ab Mon Sep 17 00:00:00 2001
From: MartinMikita
Date: Tue, 27 Sep 2016 18:12:26 +0200
Subject: [PATCH] Updated README - added section for index storage space

---
 README.md | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 31566a9..912a747 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,8 @@ This endpoint returns 20 results matching the `<query>` within a specific countr
 
 # Input data.tsv format
 
-This service accepts only TSV file named `data.tsv` with the specific order of columns:
+This service accepts only a TSV file named `data.tsv` (or a gzip-ed version named `data.tsv.gz`)
+ with this specific order of columns:
 
 ```
 name
@@ -47,6 +48,7 @@ wikidata
 wikipedia
 ```
 
+The source data should be **sorted by the column `importance`**.
 For the description of these columns, see [data format in geometalab/OSMNames repository](https://github.com/geometalab/OSMNames#data-format-of-tsv-export-of-osmnames).
 
 
@@ -56,24 +58,24 @@ This docker image consists from internaly connected and setup OSMNames Websearch
 
 Whole service can be run from command-line with one command:
 
-Run with demo data (10 lines) only
+Run with demo data (a sample of 100k lines from [geometalab/OSMNames](https://github.com/OSMNames/OSMNames/releases/tag/v1.1)) only
 
 ```
 docker run -d --name klokantech-osmnames-sphinxsearch -p 80:80 klokantech/osmnames-sphinxsearch
 ```
 
-You can attach your file `data.tsv`, which has to be located in the internal path `/data/input/data.tsv`:
+You can attach your file `data.tsv` (or `data.tsv.gz`), which has to be located in the internal path `/data/input/data.tsv` (or `/data/input/data.tsv.gz`):
 
 ```
 docker run -d --name klokantech-osmnames-sphinxsearch \
-    -v /path/to/folder/data.tsv:/data/input/ \
+    -v /path/to/folder/:/data/input/ \
     -p 80:80 \
     klokantech/osmnames-sphinxsearch
 ```
 
 This file will be indexed on the first run or if index files are missing.
 
-You can specify path for index folder as well:
+You can specify a path for the index folder as well:
 
 ```
 docker run -d --name klokantech-osmnames-sphinxsearch \
@@ -88,7 +90,7 @@ You can attach your path with the following folder structure:
 ```
 /path/to/folder/
   - input/
-    - data.tsv
+    - data.tsv (or data.tsv.gz)
   - index/
 ```
 
@@ -97,3 +99,15 @@ directly with simple command:
 ```
 docker run -d -v /path/to/folder/:/data/ -p 80:80 klokantech/osmnames-sphinxsearch
 ```
+
+# Index storage space
+
+The full-text search engine SphinxSearch requires the source data to be indexed.
+Indexing is required only on the first run, or whenever the source data has changed.
+The more lines the source data has, the longer this operation takes and the more storage space it requires.
+
+The demo sample data with 100 000 lines is a **9.3 MiB** gzip-ed source file and requires **133.8 MiB** of storage space for the index folder. Indexing takes about 15 seconds on average.
+
+The [full planet source data](https://github.com/OSMNames/OSMNames/releases/download/v1.1/planet-latest.tsv.gz) with 21 million lines requires **27.4 GiB** of storage space for the index folder. Indexing takes about 48 minutes on average.
+
+Indexing runs automatically via the prepared script `sphinx-reindex.sh` whenever an index file is missing. You can also use this script to force re-indexing: `$ time bash sphinx-reindex.sh force`.
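
A minimal sketch of how the storage notes above translate into practice. It assumes the container was started with the name `klokantech-osmnames-sphinxsearch`, that `/path/to/folder/` is mounted into the container as `/data/` as described in the README, and that `sphinx-reindex.sh` is reachable from the container's default working directory; adjust these to your setup.

```
# Check how much space the generated index occupies (measured on the host,
# since /path/to/folder/ is mounted into the container as /data/).
du -sh /path/to/folder/index/

# Force a re-index inside the running container after replacing
# data.tsv or data.tsv.gz (the script location is an assumption).
docker exec klokantech-osmnames-sphinxsearch bash -c 'time bash sphinx-reindex.sh force'
```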