WebRetriver bug in Google API #6003

HGamalElDin · 2023-10-08T19:45:18Z

Hello guys! I'm trying to run a WebRetriever RAG pipeline, but the WebRetriever with the Google search provider doesn't work at all.

Here's the code snippet I'm trying:

from haystack.nodes import PromptNode, WebRetriever
from haystack.pipelines import WebQAPipeline

pn = PromptNode(
    "gpt-3.5-turbo",
    api_key="my_Key",
    max_length=256,
    default_prompt_template="question-answering-with-document-scores",
)
# search_engine_kwargs is the argument rejected in the error below
web_retriever = WebRetriever(search_engine_provider="GoogleAPI", api_key="my_key", top_search_results=5, top_k=3, search_engine_kwargs={"engine_id": "engineID"})
pipeline = WebQAPipeline(retriever=web_retriever, prompt_node=pn)

I got the error: TypeError: WebRetriever.__init__() got an unexpected keyword argument 'search_engine_kwargs'

When I remove this argument and try to run the pipeline, I get Exception: Exception while running node 'Retriever': You need to provide an engine ID for the Google API. See https://developers.google.com/custom-search/v1/overview
Enable debug logging to see the data that was passed when the pipeline failed.

I tried this snippet with two Haystack versions, 1.18.1 and 1.21, and got the same result in both.
Any help would be appreciated!


anakin87 · 2023-10-09T09:37:04Z

I think this should work.
(Unfortunately, it is undocumented and convoluted.)

from haystack.nodes import PromptNode, WebRetriever
from haystack.nodes.search_engine.providers import GoogleAPI
from haystack.pipelines import WebQAPipeline

# see https://github.com/deepset-ai/haystack/blob/main/haystack/nodes/search_engine/providers.py#L305
search_engine = GoogleAPI(api_key="my_key", top_k=5, engine_id="engineID")

web_retriever = WebRetriever(api_key="my_key", search_engine_provider=search_engine, top_k=3)

pn = PromptNode(
    "gpt-3.5-turbo",
    api_key="my_Key",
    max_length=256,
    default_prompt_template="question-answering-with-document-scores",
)

pipeline = WebQAPipeline(retriever=web_retriever, prompt_node=pn)

HGamalElDin · 2023-10-09T17:06:41Z

Thank you so much @anakin87. Yes, unfortunately it's not well documented. Your solution worked for me only when I set the mode to "preprocessed_documents". If I leave it at its default value, "snippets", it throws this error, which I also cannot find any reference for:
Exception: Exception while running node 'Shaper': 'score'
Enable debug logging to see the data that was passed when the pipeline failed.

anakin87 · 2023-10-10T07:18:27Z

My intuition is that the prompt template question-answering-with-document-scores expects the Document scores.
If they are missing, it fails.
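To illustrate why the message is so terse, here is a minimal, self-contained sketch. It does not use Haystack: `Doc`, `join_with_scores`, and `join_with_optional_scores` are hypothetical names, and the assumption is only that some formatting step looks up a `score` key that the snippet documents do not carry. In Python, the string form of a `KeyError` is just the quoted key, which would match the `'score'` in the exception above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not Haystack's Document)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def join_with_scores(docs: List[Doc]) -> str:
    """Toy formatter that requires a 'score' entry in each document's meta."""
    return "\n".join(f"[score: {d.meta['score']}] {d.content}" for d in docs)


def join_with_optional_scores(docs: List[Doc]) -> str:
    """Defensive variant: falls back to 'n/a' when no score is present."""
    return "\n".join(f"[score: {d.meta.get('score', 'n/a')}] {d.content}" for d in docs)


scored = [Doc("Paris is the capital of France.", {"score": 0.9})]
unscored = [Doc("Search snippet without a score.")]

try:
    join_with_scores(unscored)
    error_message = None
except KeyError as exc:
    # str() of a KeyError is only the quoted key, hence a bare 'score' in the log
    error_message = str(exc)

print(error_message)
print(join_with_optional_scores(unscored))
```

Whether the real Shaper function reads the score this way is an assumption; the sketch only shows how a missing key produces this kind of message and how a fallback would avoid it.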

You can use a different prompt template.
This blog article may help you.

(@dfokina as you can understand from this issue, the example about GoogleAPI in our docs is wrong.)

HGamalElDin · 2023-10-11T00:09:28Z

Honestly, no! I tried a custom prompt template and it throws the same error. It seems the issue comes from the WebQAPipeline itself, which somehow includes a default Shaper.

The question is, what is the key difference between WebRetriever and WebSearch classes?

bilgeyucel · 2023-11-07T11:56:13Z

The main issue is solved on the main branch and will be released with Haystack 1.22.0 👍

However, I can reproduce the issue about Shaper on main:
Exception: Exception while running node 'Shaper': 'score'
Enable debug logging to see the data that was passed when the pipeline failed.

This is a problem when snippets mode is used. "raw_documents" and "preprocessed_documents" are working fine. If GoogleAPI doesn't support the snippets mode, we need to update the documentation.
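Until that is resolved, the configuration reported as working in this thread can be sketched roughly as follows. This is an assumption-laden sketch for Haystack 1.x, not an official recipe: `build_web_qa_pipeline` and `WORKING_MODES` are hypothetical names, the keys and engine ID are placeholders, and the `mode` argument is used as described in the comments above.

```python
# Modes reported in this thread as working with GoogleAPI (an observation, not a guarantee)
WORKING_MODES = ("raw_documents", "preprocessed_documents")


def build_web_qa_pipeline(search_api_key, engine_id, openai_api_key, mode="preprocessed_documents"):
    """Hypothetical helper: build a WebQAPipeline backed by the GoogleAPI provider."""
    if mode not in WORKING_MODES:
        # "snippets" currently fails in the Shaper node with: 'score'
        raise ValueError(f"mode must be one of {WORKING_MODES}, got {mode!r}")

    # Imported lazily so this module can be loaded without Haystack installed
    from haystack.nodes import PromptNode, WebRetriever
    from haystack.nodes.search_engine.providers import GoogleAPI
    from haystack.pipelines import WebQAPipeline

    # Pass a provider instance so that engine_id reaches the Google API client
    search_engine = GoogleAPI(api_key=search_api_key, top_k=5, engine_id=engine_id)
    retriever = WebRetriever(
        api_key=search_api_key,
        search_engine_provider=search_engine,
        top_k=3,
        mode=mode,
    )
    prompt_node = PromptNode(
        "gpt-3.5-turbo",
        api_key=openai_api_key,
        max_length=256,
        default_prompt_template="question-answering-with-document-scores",
    )
    return WebQAPipeline(retriever=retriever, prompt_node=prompt_node)
```

With real credentials, something like `build_web_qa_pipeline("my_key", "engineID", "my_Key").run(query="...")` should then exercise the pipeline; the early `ValueError` simply keeps the failing snippets mode from being selected by accident.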

Here's the colab notebook to reproduce it: https://colab.research.google.com/drive/1otbIATmR_dh2RtTry7sYgF-JhasHS61B?usp=sharing

anakin87 mentioned this issue Nov 7, 2023

WebRetriever with GoogleAPI not working #6247

Closed

1 task

masci added 1.x P3 Low priority, leave it in the backlog labels Dec 11, 2023

masci added the wontfix This will not be worked on label Feb 23, 2024

masci closed this as completed Feb 23, 2024
