Avoid database connections in document stores constructors #746

Closed
8 tasks done
masci opened this issue May 23, 2024 · 0 comments

masci commented May 23, 2024

Describe the bug
Not a bug per se, but the current behaviour of some document stores prevents pipeline validation and serialisation.

Haystack pipelines validate and serialise in a way that requires every component to initialise quickly, so any operation that loads or downloads data is deferred to the warm_up method of the component class (when needed).

This should be the case for document stores too, for example:

  • We don't necessarily want to reach the network and connect to a database only in order to create (and validate) a pipeline
  • If we're building a pipeline in order to serialise it and send it over the wire, the database might not be available at "pipeline-build time"

Contrary to regular components, a warm_up strategy wouldn't be effective, because we don't have a single point in the codebase where the pipeline can call document_store.warm_up(). In fact, a document store can be passed to a writer, to a reader, or even used standalone.

For the time being, we decided to roll out a better init strategy for document stores without introducing new abstractions or breaking changes. This means that every document store, by convention and where it makes sense, will defer the database connection until it is actually needed.
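As a rough illustration of the convention described above, here is a minimal sketch of a document store that connects on first use. All names in it (`LazyDocumentStore`, `FakeClient`, `client_factory`) are hypothetical and are not taken from any Haystack integration; the real stores may implement the deferral differently.

```python
from typing import Any, Callable, Dict, List, Optional


class FakeClient:
    """Hypothetical stand-in for a real database client; it never touches the network."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.documents: List[Dict[str, Any]] = []


class LazyDocumentStore:
    """Hypothetical document store that defers connecting until first use."""

    def __init__(
        self,
        url: str,
        client_factory: Callable[[str], FakeClient] = FakeClient,
    ) -> None:
        # The constructor only records configuration: no connection is opened here.
        self.url = url
        self._client_factory = client_factory
        self._client: Optional[FakeClient] = None

    @property
    def client(self) -> FakeClient:
        # Create the client on first access, then reuse it.
        if self._client is None:
            self._client = self._client_factory(self.url)
        return self._client

    def to_dict(self) -> Dict[str, Any]:
        # Serialisation reads configuration only, so it does not trigger a connection.
        return {"type": "LazyDocumentStore", "init_parameters": {"url": self.url}}

    def write_documents(self, documents: List[Dict[str, Any]]) -> int:
        # First real operation: this is where the connection is established.
        self.client.documents.extend(documents)
        return len(documents)

    def count_documents(self) -> int:
        return len(self.client.documents)
```

With this shape, building and serialising a pipeline only calls the constructor and `to_dict`, so the database does not need to be reachable at that point. Because the connection is made inside the store itself, it works the same whether the store is handed to a writer, a reader, or used standalone.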

This is the list of document stores affected that we need to change:

Tasks

  1. integration:astra
    masci
  2. integration:elasticsearch
    masci
  3. integration:mongodb-atlas
    masci
  4. integration:opensearch
    masci
  5. integration:pgvector
    masci
  6. Amnah199 masci
  7. integration:qdrant
    masci
  8. integration:weaviate
    masci
@masci masci added the bug Something isn't working label May 23, 2024
@masci masci self-assigned this May 24, 2024
@masci masci closed this as completed Jun 11, 2024