WebRetriver bug in Google API #6003

Closed
HGamalElDin opened this issue Oct 8, 2023 · 5 comments
Labels: 1.x · P3 (Low priority, leave it in the backlog) · wontfix (This will not be worked on)

Comments

@HGamalElDin

Hello guys! I'm trying to run a WebRetriever RAG pipeline, but the WebRetriever doesn't work with the Google API at all.

Here's the code snippet I'm trying:

pn = PromptNode(
    "gpt-3.5-turbo",
    api_key="my_Key",
    max_length=256,
    default_prompt_template="question-answering-with-document-scores",
)
web_retriever = WebRetriever(search_engine_provider="GoogleAPI", api_key="my_key", top_search_results=5, top_k=3, search_engine_kwargs={"engine_id": "engineID"})
pipeline = WebQAPipeline(retriever=web_retriever, prompt_node=pn)

I got the error: TypeError: WebRetriever.__init__() got an unexpected keyword argument 'search_engine_kwargs'

When I remove this argument and try to run the pipeline, I get Exception: Exception while running node 'Retriever': You need to provide an engine ID for the Google API. See https://developers.google.com/custom-search/v1/overview
Enable debug logging to see the data that was passed when the pipeline failed.

I tried this snippet with two Haystack versions, 1.18.1 and 1.21, and got the same results in both.
Any help would be appreciated!

@anakin87
Member

anakin87 commented Oct 9, 2023

I think this should work.
(Unfortunately, it is undocumented and convoluted.)

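from haystack.nodes import PromptNode
from haystack.nodes.retriever.web import WebRetriever
from haystack.nodes.search_engine.providers import GoogleAPI
from haystack.pipelines import WebQAPipeline

# see https://github.com/deepset-ai/haystack/blob/main/haystack/nodes/search_engine/providers.py#L305
# Build the search engine explicitly so that engine_id reaches GoogleAPI
search_engine = GoogleAPI(api_key="my_key", top_k=5, engine_id="engineID")

# Pass the search engine instance instead of the provider name
web_retriever = WebRetriever(api_key="my_key", search_engine_provider=search_engine, top_k=3)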

pn = PromptNode(
    "gpt-3.5-turbo",
    api_key="my_Key",
    max_length=256,
    default_prompt_template="question-answering-with-document-scores",
)

pipeline = WebQAPipeline(retriever=web_retriever, prompt_node=pn)

@HGamalElDin
Author

Thank you so much, @anakin87. Yes, unfortunately it's not well documented. Your solution worked for me only when I set the mode to "preprocessed_documents". If I leave it at its default value, "snippets", it throws this error, which I cannot find any reference for:
Exception: Exception while running node 'Shaper': 'score'
Enable debug logging to see the data that was passed when the pipeline failed.

@anakin87
Member

My intuition is that the prompt template question-answering-with-document-scores expects the Document scores.
If they are missing, it fails.

You can use a different prompt template.
This blog article may help you.
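
To illustrate the intuition above, here is a minimal sketch of how a template that references a score can fail when the score is absent. It uses Python's `string.Template` as a simplified stand-in; the pattern text, the `render` helper, and the sample documents are assumptions made for illustration and are not taken from Haystack's Shaper code. Note that `str(KeyError("score"))` is `'score'`, which at least resembles the message in the reported exception.

```python
from string import Template

# Hypothetical pattern modeled on a "content plus relevance score" template;
# it is NOT copied from Haystack's source.
PATTERN = Template("$content\nRelevance score: $score")


def render(doc: dict) -> str:
    """Fill the pattern from a document's fields; raises KeyError if one is missing."""
    return PATTERN.substitute(doc)


with_score = {"content": "Berlin is the capital of Germany.", "score": 0.93}
without_score = {"content": "Berlin is the capital of Germany."}

print(render(with_score))

try:
    render(without_score)
except KeyError as err:
    # The KeyError names the placeholder that could not be filled
    print(f"missing field: {err}")

# A pattern that does not reference $score renders either document
SAFE_PATTERN = Template("$content")
print(SAFE_PATTERN.substitute(without_score))
```

If this is what happens in the pipeline, any template or shaping step that still references the score would keep failing until the documents carry one.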

(@dfokina as you can understand from this issue, the example about GoogleAPI in our docs is wrong.)

@HGamalElDin
Author

Honestly, no! I tried a custom prompt template and it throws the same error. It seems the issue comes from the WebQAPipeline itself, which somehow includes a default Shaper.

The question is, what is the key difference between the WebRetriever and WebSearch classes?

@bilgeyucel
Contributor

bilgeyucel commented Nov 7, 2023

The main issue is solved on the main branch and will be released with Haystack 1.22.0 👍

However, I can reproduce the issue about Shaper on main:
Exception: Exception while running node 'Shaper': 'score'
Enable debug logging to see the data that was passed when the pipeline failed.

This happens only when the "snippets" mode is used; "raw_documents" and "preprocessed_documents" work fine. If GoogleAPI doesn't support the "snippets" mode, we need to update the documentation.

Here's the colab notebook to reproduce it: https://colab.research.google.com/drive/1otbIATmR_dh2RtTry7sYgF-JhasHS61B?usp=sharing
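
For reference, the workaround reported in this thread can be sketched as a small helper that sets the retriever mode explicitly. The function name `build_web_qa_pipeline` and its parameters are hypothetical; the constructor arguments are the ones used in the snippets above plus the `mode` values mentioned in this thread. It assumes Haystack 1.x is installed and has not been verified against every 1.x release.

```python
def build_web_qa_pipeline(
    search_api_key: str,
    engine_id: str,
    llm_api_key: str,
    mode: str = "preprocessed_documents",
):
    """Assemble the WebQAPipeline from this thread with an explicit retriever mode."""
    # Imports are local so this module can be loaded where Haystack 1.x is absent
    from haystack.nodes import PromptNode
    from haystack.nodes.retriever.web import WebRetriever
    from haystack.nodes.search_engine.providers import GoogleAPI
    from haystack.pipelines import WebQAPipeline

    # Build the search engine directly so engine_id is passed to GoogleAPI
    search_engine = GoogleAPI(api_key=search_api_key, top_k=5, engine_id=engine_id)

    # mode defaults to "preprocessed_documents", which was reported to work here;
    # the "snippets" mode is the one that triggered the Shaper error
    web_retriever = WebRetriever(
        api_key=search_api_key,
        search_engine_provider=search_engine,
        top_k=3,
        mode=mode,
    )

    prompt_node = PromptNode(
        "gpt-3.5-turbo",
        api_key=llm_api_key,
        max_length=256,
        default_prompt_template="question-answering-with-document-scores",
    )
    return WebQAPipeline(retriever=web_retriever, prompt_node=prompt_node)
```

With real keys, the returned pipeline would be run with something like `pipeline.run(query="...")`. Passing `mode="raw_documents"` should behave similarly according to the report above.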

@masci masci added 1.x P3 Low priority, leave it in the backlog labels Dec 11, 2023
@masci masci added the wontfix This will not be worked on label Feb 23, 2024
@masci masci closed this as completed Feb 23, 2024