Semantic search with Elasticsearch (#988)
* create env and conf for vector elasticsearch
* management command to put to elasticsearch
* method for vector search
* view to test semantic search quickly
* use cpu only for cc deployment
* embedding with openai api instead of HuggingFace
* add package tiktoken needed by OpenAIEmbedding
* increase number limit
* integrate semantic search to other
* fix merge conflicts in poetry deps
* organize settings, elasticsearch api and siae index/meta infos
* fix deps order
* limit semantic search for admin
1 parent: 1e55b0c
commit 8d13724
Showing 18 changed files with 1,125 additions and 79 deletions.
39 changes: 39 additions & 0 deletions
lemarche/siaes/management/commands/put_siaes_in_elasticsearch_index.py
@@ -0,0 +1,39 @@
import time

from django.conf import settings
from django.db.models import TextField
from django.db.models.functions import Length
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import ElasticVectorSearch

from lemarche.siaes.models import Siae
from lemarche.utils.apis.api_elasticsearch import URL_WITH_USER
from lemarche.utils.commands import BaseCommand


class Command(BaseCommand):
    help = ""

    def handle(self, *args, **options):
        self.stdout_success("put siae to elasticsearch index started..")

        # Elasticsearch as a vector db
        embeddings = OpenAIEmbeddings()
        db = ElasticVectorSearch(
            embedding=embeddings, elasticsearch_url=URL_WITH_USER, index_name=settings.ELASTICSEARCH_INDEX_SIAES
        )

        # Siaes with completed description
        TextField.register_lookup(Length)  # at least 10 characters
        siaes = Siae.objects.filter(description__length__gt=9).all()

        for siae in siaes:
            db.from_texts(
                [siae.elasticsearch_index_text],
                metadatas=[siae.elasticsearch_index_metadata],
                embedding=embeddings,
                elasticsearch_url=URL_WITH_USER,
                index_name=settings.ELASTICSEARCH_INDEX_SIAES,
            )
            time.sleep(1)
            self.stdout_success(f"{siae.name} added !")
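The command above embeds and indexes one SIAE per loop iteration, so every record triggers its own embedding request and index write. As a minimal sketch of an alternative (not what this commit does), the records could be grouped into fixed-size batches and sent with one call per batch. The `batched` helper, the `index_in_batches` function and the `RecordingStore` class are hypothetical names introduced only for illustration; `RecordingStore` stands in for a real vector store so the example runs without Elasticsearch or an OpenAI key. Its `add_texts(texts, metadatas=...)` method is modelled on the one LangChain vector stores expose, and whether that call suits this index is an assumption to verify.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


class RecordingStore:
    """Hypothetical stand-in for a vector store: it only records its calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[List[str], List[dict]]] = []

    def add_texts(self, texts: List[str], metadatas: List[dict]) -> None:
        self.calls.append((list(texts), list(metadatas)))


def index_in_batches(store, records: Iterable[Tuple[str, dict]], size: int = 50) -> int:
    """Send (text, metadata) pairs to `store` batch by batch; return the count sent."""
    total = 0
    for batch in batched(records, size):
        texts = [text for text, _ in batch]
        metadatas = [metadata for _, metadata in batch]
        store.add_texts(texts, metadatas=metadatas)
        total += len(batch)
    return total


# Five fake records sent in batches of two: three calls in total.
store = RecordingStore()
records = [(f"description {i}", {"id": i}) for i in range(5)]
indexed = index_in_batches(store, records, size=2)
```

With a real store, each batch would translate into a single embedding request for several texts, which may also reduce the need for the one-second pause between records.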
@@ -0,0 +1,35 @@
<div class="siae-info-sticky c-box bg-light mb-3">
    <div class="si-ideas">
        <h3 class="h4">Idées reçues</h3>
        <p>
            <span>
                <i class="ri-check-fill ri-xl font-weight-bold"></i>
            </span>
            <span class="ml-2">
                Le prestataire est trop petit pour répondre à mon besoin…
                <b>Mais il est sûrement ouvert à la co-traitance.</b>
            </span>
        </p>
        <p>
            <span>
                <i class="ri-check-fill ri-xl font-weight-bold"></i>
            </span>
            <span class="ml-2">
                Son chiffre d'affaires est trop bas et je ne veux pas être
                son seul client… <b>Mais vous pouvez commencer par lui confier
                un marché de plus faible périmètre, sans prendre de risque,
                puis faire grandir ce partenariat si vous en êtes satisfait.</b>
            </span>
        </p>
        <p>
            <span>
                <i class="ri-check-fill ri-xl font-weight-bold"></i>
            </span>
            <span class="ml-2">
                L'offre ne correspond pas exactement à ce que je cherche…
                <b>Heureusement les entreprises sociales inclusives sont très
                innovantes et s'adaptent à vos besoins.</b>
            </span>
        </p>
    </div>
</div>
@@ -0,0 +1,31 @@
{% load static bootstrap4 wagtailcore_tags advert_cms %}
{% block content %}
    <div class="col-12 col-lg-8">
        <div class="c-box mb-3">
            {% if siaes %}
                {% for siae in siaes %}
                    {% include "siaes/_card_search_result.html" with siae=siae %}
                    <!-- insert to nudge tender creation -->
                    {% if forloop.counter in position_promote_tenders and page_obj.number == 1 %}
                        {% include "siaes/_card_suggest_tender.html" with current_perimeters=current_perimeters current_sectors=current_sectors %}
                    {% endif %}
                {% endfor %}
            {% else %}
                <!-- no results -->
                <p>Il y a encore de l'espoir ❤️</p>
                <p>Publiez votre besoin, et on s'occupe de vous trouver des prestataires inclusifs.</p>
                <p>Obtenez des réponses en moins de 24 heures (en moyenne).</p>
                <a href="{% url 'tenders:create' %}"
                   id="siae-search-empty-demande"
                   class="btn btn-primary d-block d-md-inline-block mb-2">
                    <i class="ri-mail-send-line ri-lg mr-2"></i>Publier un besoin d'achat
                </a>
            {% endif %}
        </div>
    </div>
    <!-- sidebar -->
    <div class="col-12 col-lg-4 siae-info mt-6 mt-sm-0">
        {% cms_advert layout="card" %}
        {% include "siaes/_si_ideas_search_result.html" %}
    </div>
{% endblock %}
@@ -0,0 +1,35 @@
from django.conf import settings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.elasticsearch import ElasticsearchStore


BASE_URL = f"{settings.ELASTICSEARCH_HOST}:{settings.ELASTICSEARCH_PORT}"
URL = f"{settings.ELASTICSEARCH_SCHEME}://{BASE_URL}"
URL_WITH_USER = (
    f"{settings.ELASTICSEARCH_SCHEME}://{settings.ELASTICSEARCH_USERNAME}:{settings.ELASTICSEARCH_PASSWORD}@{BASE_URL}"
)


def siaes_similarity_search(search_text):
    """Performs semantic search with Elasticsearch as a vector db
    Args:
        search_text (str): User search query
    Returns:
        list: list of siaes id that match the search query
    """
    db = ElasticsearchStore(
        embedding=OpenAIEmbeddings(),
        es_user=settings.ELASTICSEARCH_USERNAME,
        es_password=settings.ELASTICSEARCH_PASSWORD,
        es_url=URL,
        index_name=settings.ELASTICSEARCH_INDEX_SIAES,
    )

    similar_docs = db.similarity_search(search_text, k=10)
    siaes_id = []
    for similar_doc in similar_docs:
        siaes_id.append(similar_doc.metadata["id"])

    return siaes_id
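`URL_WITH_USER` above interpolates the username and password straight into the URL. If either value contains a reserved character such as `@`, `:` or `/`, the resulting URL becomes ambiguous to parse. A minimal sketch of one way to guard against that, using `urllib.parse.quote` from the standard library. The function name `build_es_url`, the host `es.example.com` and the credentials are hypothetical and not part of the commit.

```python
from urllib.parse import quote, unquote, urlsplit


def build_es_url(scheme: str, host: str, port: int, username: str = "", password: str = "") -> str:
    """Build an Elasticsearch URL, percent-encoding credentials when present."""
    if not username:
        return f"{scheme}://{host}:{port}"
    # safe="" also encodes "/" so it cannot be mistaken for a path separator.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


# A password with reserved characters still yields an unambiguous URL.
url = build_es_url("https", "es.example.com", 9200, "elastic", "p@ss:word/1")
parts = urlsplit(url)
recovered = unquote(parts.password)
```

Parsing the result back with `urlsplit` recovers the intended host and port, and unquoting the password field returns the original secret.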