A simple implementation of semantic search into Elasticsearch using PhoBert model

thecodeorigin/elastic-semantic-search

Vietnamese Semantic search in ElasticSearch using PhoBert

This is a simple implementation of semantic search on top of Elasticsearch:

  • The demo uses PhoBert, a pre-trained language model built specifically for Vietnamese.

  • It also includes a simple Flask server so you can quickly see what the search results look like.
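
To make the idea of semantic search concrete, here is a minimal sketch of ranking documents by the cosine similarity of their embedding vectors. It is an illustration only and is not taken from this repository: the function names and the 3-dimensional toy vectors are assumptions, and in the real demo the vectors would come from PhoBert and the scoring would be done by Elasticsearch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real sentence embeddings have hundreds of dimensions.
toy_documents = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}

ranking = rank_by_similarity([1.0, 0.1, 0.0], toy_documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Documents whose vectors point in nearly the same direction as the query vector rank first, even when they share no exact keywords with the query.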

Prerequisites

If you plan to run the Flask app locally, you also need:

  • Python 3.10 or above (installing it with pyenv is recommended)
  • Poetry

Installation

Quickstart

Start all services (Elasticsearch + Kibana + Flask server)

The final image can be as large as ~7 GB, and building it can take some time.

# Without CUDA
sh cmd.run.all.sh

# With CUDA
sh cmd.run.all-cuda.sh

Stop all services and clean up resources

sh cmd.stop.all.sh

Lightweight (Without Flask server)

Start core service (Elasticsearch + Kibana)

sh cmd.run.core.sh

Install required packages

poetry install

Load the Hugging Face model locally

python src/utils/loadmodel.py

Index data to Elasticsearch (using Python)

This process can take a long time because it indexes over 100,000 entries. To speed it up, you can manually reduce the size of the data file.

python src/index_es.py
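
For orientation, the sketch below shows roughly what an indexing step of this kind can look like: an index mapping with a `dense_vector` field, plus the newline-delimited JSON body that the Elasticsearch `_bulk` API expects. It is not the content of `src/index_es.py`. The field names `title` and `title_vector`, the index name, and the helper functions are hypothetical, and the dimension of 768 assumes a PhoBERT-base checkpoint.

```python
import json

# Assumption: PhoBERT-base produces 768-dimensional vectors.
# Adjust this if a different checkpoint is used.
EMBEDDING_DIMS = 768


def build_index_mapping(dims=EMBEDDING_DIMS):
    """Index definition with a text field and a dense_vector field (hypothetical names)."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "title_vector": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_bulk_lines(index_name, records):
    """Yield NDJSON lines for the _bulk API: an action line, then a source line, per record."""
    for record_id, title, vector in records:
        yield json.dumps({"index": {"_index": index_name, "_id": record_id}})
        # ensure_ascii=False keeps Vietnamese text readable in the request body.
        yield json.dumps({"title": title, "title_vector": vector}, ensure_ascii=False)


# Hypothetical record with a shortened vector, for illustration only.
sample_records = [("1", "xin chào", [0.1, 0.2, 0.3])]

# The _bulk API requires the body to end with a newline.
bulk_body = "\n".join(build_bulk_lines("demo-index", sample_records)) + "\n"
print(bulk_body)
```

Sending records in bulk batches rather than one request per document is the usual way to keep a large indexing run manageable.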

Start Flask server

python3 -m flask --app=app run --host=0.0.0.0

Clean up resources

sh cmd.run.clean.sh

Usage

Access the site at http://127.0.0.1:5000
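
Behind a page like this, a semantic query is typically sent to Elasticsearch as a `script_score` query that scores each document by cosine similarity against the query vector. The sketch below only builds such a request body. It is an assumption about how the search could be issued, not the Flask app's actual code, and the helper name and the `title_vector` field name are hypothetical.

```python
def build_semantic_query(query_vector, vector_field="title_vector", size=10):
    """Build a script_score search body ranking documents by cosine similarity.

    vector_field is a hypothetical dense_vector field name; use the one
    defined in your own index mapping.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                # Score every document; swap in a filter query to narrow the set.
                "query": {"match_all": {}},
                "script": {
                    # Adding 1.0 keeps scores non-negative, which Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


# Hypothetical shortened query vector, for illustration only.
body = build_semantic_query([0.1, 0.2, 0.3], size=5)
print(body["query"]["script_score"]["script"]["source"])
```

The query vector would be produced by encoding the user's search text with the same model that was used at indexing time, so that both sides live in the same vector space.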

Contact

Email: [email protected]

Facebook: fb.com/tu.nguyenquang01

Linkedin: linkedin.com/in/quangtudng
