Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more docstrings for ElasticsearchDocumentStore and ElasticsearchBM25Retriever #184

Merged
merged 4 commits into from
Jan 10, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,34 @@

@component
class ElasticsearchBM25Retriever:
"""
ElasticsearchBM25Retriever is a keyword-based retriever that uses BM25 to find the most
similar documents to a user's query.
This retriever is only compatible with ElasticsearchDocumentStore.

Usage example:
```python
from haystack import Document
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
from elasticsearch_haystack.bm25_retriever import ElasticsearchBM25Retriever

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
retriever = ElasticsearchBM25Retriever(document_store=document_store)

# Add documents to DocumentStore
documents = [
Document(text="My name is Carla and I live in Berlin"),
Document(text="My name is Paul and I live in New York"),
Document(text="My name is Silvano and I live in Matera"),
Document(text="My name is Usagi Tsukino and I live in Tokyo"),
]
document_store.write_documents(documents)

result = retriever.run(query="Who lives in Berlin?")
for doc in result["documents"]:
print(doc.text)
"""
silvanocerza marked this conversation as resolved.
Show resolved Hide resolved

def __init__(
self,
*,
Expand All @@ -20,6 +48,24 @@ def __init__(
top_k: int = 10,
scale_score: bool = False,
):
"""
Initialize ElasticsearchBM25Retriever with an instance ElasticsearchDocumentStore.

:param document_store: An instance of ElasticsearchDocumentStore.
:type document_store: ElasticsearchDocumentStore
:param filters: Filters applied to the retrieved Documents, for more info
see `ElasticsearchDocumentStore.filter_documents`, defaults to None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx should automatically generate the default value, right? (Same question everywhere else).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know it doesn't, and I can't find anything about it either to be fair. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that it has support for it having used it in previous projects, but it might be something that needs to be configured/theme-dependent.

If we currently always specify the default value in the docstring, we'll leave the defaults in this PR. But in that case, we should open an issue to rectify this globally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now we specify it, same thing in the Haystack core repo. I totally agree that we need more standardization on the documentation side.

Tagging @dfokina to keep her in the loop.

:type filters: Optional[Dict[str, Any]], optional
:param fuzziness: Fuzziness parameter passed to Elasticsearch, defaults to "AUTO".
see the official documentation for valid values:
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be following the RST syntax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. ☝️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it this should be valid RST syntax, the link will be highlighted and threated as such. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take your word for it 👍

That being said, we might want to discuss elsewhere if using hyperlinks (as opposed to bare URLs) improves readability. And on a related note, we should also add a CI workflow that automatically deploys documentation changes to a preview service

:type fuzziness: str, optional
:param top_k: Maximum number of Documents to return, defaults to 10
:type top_k: int, optional
:param scale_score: If `True` scales the Document`s scores between 0 and 1, defaults to False
:type scale_score: bool, optional
"""

if not isinstance(document_store, ElasticsearchDocumentStore):
msg = "document_store must be an instance of ElasticsearchDocumentStore"
raise ValueError(msg)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,31 @@


class ElasticsearchDocumentStore:
"""
ElasticsearchDocumentStore is a Document Store for Elasticsearch.
It can be used with Elastic Cloud or your own Elasticsearch cluster.

Simple usage with Elastic Cloud:
```python
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
document_store = ElasticsearchDocumentStore(cloud_id="YOUR_CLOUD_ID", api_key="YOUR_API_KEY")
```

One can also connect to a self-hosted Elasticsearch instance:
```python
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
```
In the above example we connect with security disabled just to show the basic usage.
We strongly recommend to enable security so that only authorized users can access your data.

For more details on how to connect to Elasticsearch and configure security,
see the official Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html

All extra keyword arguments will be passed to the Elasticsearch client.
"""

def __init__(
self,
*,
Expand All @@ -41,6 +66,11 @@ def __init__(
):
"""
Creates a new ElasticsearchDocumentStore instance.
When no index is explicitly specified, it will use the default index "default".
It will also try to create that index if it doesn't exist yet. Otherwise it will use the existing one.

One can also set the similarity function used to compare Documents embeddings. This is mostly useful
when using the `ElasticsearchDocumentStore` in a Pipeline with an `ElasticsearchEmbeddingRetriever`.

For more information on connection parameters, see the official Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html
Expand Down