[bug]: SearchClient.browse_objects only returns last page #573

Closed
connesy opened this issue Oct 22, 2024 · 1 comment · Fixed by algolia/api-clients-automation#4016

connesy commented Oct 22, 2024

### Description

When using SearchClientSync.browse_objects to retrieve all records (as suggested by the documentation: https://www.algolia.com/doc/libraries/python/v4/helpers/#browse-for-records), the BrowseResponse that is returned only contains the hits from the last page. Hits on all prior pages are discarded:

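# Import paths as laid out in algoliasearch v4; adjust if your version differs.
from algoliasearch.search.client import SearchClientSync
from algoliasearch.search.models.browse_params_object import BrowseParamsObject

client = SearchClientSync(application_id, api_key)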
records = client.browse_objects(
    index_name="my_index",
    aggregator=None,
    browse_params=BrowseParamsObject(
        query="",
        attributes_to_retrieve=["some_column"],
    ),
)

print(len(records.hits))  # 227
print(records.page)       # 38 (zero-based index of the last page)
print(records.nb_hits)    # 38940
print(records.nb_pages)   # 39

The function that is called for every request doesn't keep the records from the previous response:
[Screenshot of algoliasearch/search/client.py]

The function _func, which is passed to create_iterable_sync, doesn't use the previous response _prev, which it gets passed in retry:
[Screenshot of algoliasearch/http/helpers.py]

The result is that only the last response from self.browse(...) is actually returned. All other responses are discarded.
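
To make the described behavior easier to reason about, here is a simplified model of a pagination loop that returns only its final response. It is a sketch, not the library's actual code: `FakeBrowseResponse`, `fake_browse`, and `iterate_pages` are hypothetical names, and the assumption is that the previous response is used only to find the next cursor, never to merge hits.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeBrowseResponse:
    """Hypothetical stand-in for BrowseResponse with only the fields used here."""

    hits: List[dict]
    page: int
    cursor: Optional[str] = None  # None means there is no further page


def fake_browse(cursor: Optional[str]) -> FakeBrowseResponse:
    """Pretend API call that serves 3 pages of 2 hits each."""
    page = 0 if cursor is None else int(cursor)
    hits = [{"objectID": f"{page}-{i}"} for i in range(2)]
    next_cursor = str(page + 1) if page < 2 else None
    return FakeBrowseResponse(hits=hits, page=page, cursor=next_cursor)


def iterate_pages(
    func: Callable[[Optional[FakeBrowseResponse]], FakeBrowseResponse],
    validate: Callable[[FakeBrowseResponse], bool],
    aggregator: Optional[Callable[[FakeBrowseResponse], None]] = None,
) -> FakeBrowseResponse:
    """Call func until validate says stop; only the final response is returned."""
    previous: Optional[FakeBrowseResponse] = None
    while True:
        response = func(previous)
        if aggregator is not None:
            aggregator(response)  # the only place earlier pages are visible
        if validate(response):
            return response
        previous = response  # kept only so func can read its cursor


collected: List[dict] = []
last = iterate_pages(
    func=lambda prev: fake_browse(prev.cursor if prev else None),
    validate=lambda resp: resp.cursor is None,
    aggregator=lambda resp: collected.extend(resp.hits),
)

print(len(last.hits))  # 2: hits of the last page only
print(len(collected))  # 6: hits of all three pages
```

In this model the return value can never contain more than one page, so anything that needs every record has to go through the aggregator callback.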

I found a workaround, where I create an "aggregator" that appends the hits from each response to a non-local list, but that doesn't seem like it should be necessary:

results = []


def agg(response) -> None:
    results.extend(response.hits)


client.browse_objects(
    index_name="my_index",
    aggregator=agg,
    browse_params=BrowseParamsObject(
        query="",
        attributes_to_retrieve=["some_column"],
    ),
)

print(len(results))  # 38940

### Client

Search

### Version

4.6.2

### Relevant log output

_No response_

shortcuts (Member) commented

Hey, thanks for opening the issue! Actually, we should make it clear that aggregator is required. The workaround you found is in fact the way to leverage this browse_objects helper: we do the pagination, and you use the responses however you want.
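
For anyone who wants the "give me every hit" behavior in one call, the aggregator pattern can be wrapped in a small function. This is a minimal sketch: `browse_all_hits` is a hypothetical helper, not part of algoliasearch, and it relies only on the `browse_objects(index_name=..., aggregator=..., browse_params=...)` call shown in the issue. `StubClient` is a stand-in so the example runs without network access; in real use you would pass a `SearchClientSync` instance instead.

```python
from types import SimpleNamespace
from typing import Any, List


def browse_all_hits(client: Any, index_name: str, browse_params: Any) -> List[Any]:
    """Collect the hits of every page that client.browse_objects visits."""
    hits: List[Any] = []

    def collect(response: Any) -> None:
        # Called once per page by the helper; keep that page's hits.
        hits.extend(response.hits)

    client.browse_objects(
        index_name=index_name,
        aggregator=collect,
        browse_params=browse_params,
    )
    return hits


class StubClient:
    """Hypothetical stand-in for SearchClientSync: 3 pages of 2 hits each."""

    def browse_objects(self, index_name: str, aggregator: Any, browse_params: Any) -> Any:
        last = None
        for page in range(3):
            last = SimpleNamespace(
                hits=[f"{index_name}-{page}-{i}" for i in range(2)],
                page=page,
            )
            aggregator(last)
        return last  # like the real helper, only the final page is returned


all_hits = browse_all_hits(StubClient(), "my_index", browse_params=None)
print(len(all_hits))  # 6
```

Keeping the list inside the wrapper avoids the module-level `results` variable from the workaround, so repeated calls do not accumulate into the same list.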
