This repository has been archived by the owner on Feb 28, 2024. It is now read-only.

Akshara Elasticsearch

We use Elasticsearch to provide full-text search over the documents in our system.

This folder houses all of our custom Elasticsearch setup and configuration.

Features

  • Basic search for language texts in their native scripts
  • Filtering for language-specific stop words during search
  • Search text in native scripts using their latin-transliterated equivalents

Supported languages and scripts:

Currently, the configuration is optimized for a single-node cluster and has sane defaults for the indices.

Usage

Setup

To provision a cluster with the core Elasticsearch setup defined here, run docker-compose up from the project root.

After the cluster is up, run the setup script:

elasticsearch/setup_akshara_cluster.sh

This ensures that all the features listed above are available in the cluster.
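
Elasticsearch can take a while to start accepting requests after its container starts, so it may help to wait for the cluster before running the setup script. The following is a minimal sketch that polls the cluster health API. It assumes the cluster is reachable at localhost:9200; the `ES_URL` variable and both function names are hypothetical and not part of this repository.

```shell
# Assumption: the cluster is published on localhost:9200; override ES_URL otherwise.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the cluster health URL. The request blocks server-side until the
# cluster reaches at least "yellow" status or the timeout elapses.
# $1 = optional timeout (default 10s)
es_health_url() {
  printf '%s/_cluster/health?wait_for_status=yellow&timeout=%s' "$ES_URL" "${1:-10s}"
}

# Retry the health check until it succeeds or the attempts run out.
# $1 = number of attempts (default 30)
wait_for_es() {
  tries="${1:-30}"
  while [ "$tries" -gt 0 ]; do
    if curl -sS --fail "$(es_health_url)" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}

# Usage (from the project root):
#   wait_for_es && elasticsearch/setup_akshara_cluster.sh
```

The retry loop covers the period when the port is not yet open, while `wait_for_status` covers the period when the node is up but the cluster is still initializing.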

Indexing

Follow these conventions when indexing documents into the cluster:

  • Use an index name that matches the pattern akshara_<language>*. E.g., for Nepali docs, use an index name that matches akshara_nepali*.
  • Set the type of the documents to _doc.
  • Set the pipeline param to akshara_pipeline (it processes docs during ingestion for some useful enrichment).
  • Documents should contain the fields defined in the index template.

For an actual usage example, see the script test/index_akshara.sh. The script populates the cluster with some sample documents.

test/index_akshara.sh test/sample_docs/*.json
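
The indexing conventions above map onto a single request to the Elasticsearch index API. Here is a minimal sketch, assuming the cluster is reachable at localhost:9200. The `ES_URL` variable, the function names, and the document path are hypothetical; the document itself still has to match the fields in the index template.

```shell
# Assumption: the cluster is published on localhost:9200; override ES_URL otherwise.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the indexing URL for a language-specific index.
# $1 = language (e.g. nepali), $2 = optional index suffix (e.g. _test)
akshara_doc_url() {
  printf '%s/akshara_%s%s/_doc?pipeline=akshara_pipeline' "$ES_URL" "$1" "${2:-}"
}

# Index one JSON document read from a file.
# $1 = language, $2 = index suffix (may be empty), $3 = path to a JSON file
index_akshara_doc() {
  curl -sS -XPOST "$(akshara_doc_url "$1" "$2")" \
    -H 'Content-Type: application/json' \
    --data-binary "@$3"
}

# Usage (hypothetical document path):
#   index_akshara_doc nepali _test path/to/doc.json
```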

Querying

  • Query for documents using a language-specific index pattern. E.g., akshara_nepali* for Nepali docs.
  • If you want to query over all indices, use akshara*.

For some example queries, see the script test/sample_queries.sh.
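
As a rough illustration of the index patterns above, the sketch below runs a URI search (the `q` parameter of the `_search` endpoint) against either one language or all akshara indices. It assumes the cluster is reachable at localhost:9200; `ES_URL` and the function names are hypothetical, and the queries in test/sample_queries.sh may be structured differently.

```shell
# Assumption: the cluster is published on localhost:9200; override ES_URL otherwise.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the search URL.
# $1 = language (e.g. nepali); leave empty to search all akshara indices
akshara_search_url() {
  if [ -n "${1:-}" ]; then
    printf '%s/akshara_%s*/_search' "$ES_URL" "$1"
  else
    printf '%s/akshara*/_search' "$ES_URL"
  fi
}

# Run a query-string search.
# $1 = query text, $2 = optional language
search_akshara() {
  # -G sends the url-encoded data as query parameters on a GET request,
  # which keeps non-ASCII query text (e.g. Devanagari) intact.
  curl -sS -G "$(akshara_search_url "${2:-}")" \
    --data-urlencode "q=$1" \
    --data-urlencode "pretty=true"
}

# Usage:
#   search_akshara 'नेपाल' nepali   # Nepali indices only
#   search_akshara 'nepal'          # all akshara indices
```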

Monitoring

To monitor cluster status/performance, you can use Kibana's monitoring UI (note: this is not available when using the kibana-oss image).

Elasticsearch's cat APIs are also useful for quick checks from the command line.
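
A few cat endpoints cover most day-to-day checks. This is a minimal sketch, assuming the cluster is reachable at localhost:9200; `ES_URL` and the function names are hypothetical helpers, while `_cat/health`, `_cat/nodes`, and `_cat/indices` are standard Elasticsearch endpoints.

```shell
# Assumption: the cluster is published on localhost:9200; override ES_URL otherwise.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a cat API URL; the v parameter adds column headers to the output.
# $1 = cat endpoint, e.g. health, nodes, or indices/<pattern>
es_cat_url() {
  printf '%s/_cat/%s?v' "$ES_URL" "$1"
}

# Fetch a cat endpoint as plain-text columns.
es_cat() {
  curl -sS "$(es_cat_url "$1")"
}

# Usage:
#   es_cat health               # overall cluster status
#   es_cat nodes                # per-node heap, CPU, and load
#   es_cat 'indices/akshara*'   # doc counts and sizes for akshara indices
```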

Useful Commands

These are meant to be run from the project root.

# start only the elasticsearch service
docker-compose up elasticsearch

# force a rebuild of our custom image; useful after modifying the image in any way,
# since docker-compose does not rebuild an image that already exists
docker-compose up --build elasticsearch

# remove the elasticsearch data volume (reset all the indices)
docker-compose down --volumes

Others:

# inspect the elasticsearch container
docker exec -it akshara_elasticsearch bash

# delete the test index
curl -XDELETE $HOSTNAME:9200/akshara_nepali_test

TODO

  • Add all of the decided fields to the Nepali index mapping

  • Improve stemming for Nepali Devanagari search (currently based on the Hindi stemmer)

  • Support fuzzy transliterated searches for Nepali

  • Implement size-based sharding for akshara indices

  • Improve the container logging setup