API: Limit on term retrieval #16

Open
Angatar opened this issue Nov 25, 2024 · 2 comments

Comments

@Angatar
Member

Angatar commented Nov 25, 2024

Current Problem:
The API enforces an undocumented hard limit of 1000 terms per query, which makes it difficult to fetch all terms in one go.

Resolution:
Allow a size=all parameter to retrieve all terms or significantly increase the maximum limit (e.g., 100,000).
Document this additional option and/or limit clearly in the API help pages.

@henrietteharmse

We can look into increasing this limit, maybe to 10000. However, in load tests we did last year, fetching 100000 rows across 10 concurrent requests brought Solr down. For that reason I would caution against implementing size=all.

@Angatar
Member Author

Angatar commented Nov 26, 2024

Given the challenges you mentioned with Solr under high-load scenarios, a size=all parameter might indeed be risky. However, to address the user need for retrieving all terms efficiently, here's an alternative approach:

Proposal:
Introduce a dedicated download endpoint that allows users to retrieve the full dataset in bulk. This endpoint would not rely on real-time Solr queries but would instead serve pre-aggregated or cached data as JSON.

Advantages of this approach:

Ensures efficient and complete data retrieval without overwhelming the Solr infrastructure.
Provides a solution for users requiring the entire dataset, complementing existing APIs.
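To make the proposal a bit more concrete, here is a minimal sketch of how the pre-aggregated file could be produced. Everything in it is an assumption for illustration: the function names `write_snapshot` and `read_snapshot`, the file name, and the choice of gzip-compressed JSON Lines (one JSON object per line) rather than a single JSON array. It is not a description of the existing service.

```python
import gzip
import json
import os
import tempfile
from typing import Iterable, List


def write_snapshot(terms: Iterable[dict], path: str) -> int:
    """Write all terms to a gzip-compressed JSON Lines file and return the count.

    Hypothetical helper: the data is written to a temporary file first and then
    renamed, so a download handler never serves a half-written snapshot.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    count = 0
    try:
        with gzip.open(tmp_path, "wt", encoding="utf-8") as out:
            for term in terms:
                out.write(json.dumps(term, ensure_ascii=False) + "\n")
                count += 1
        # Atomic swap when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return count


def read_snapshot(path: str) -> List[dict]:
    """Load a snapshot back into memory (what a client would do after downloading)."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        return [json.loads(line) for line in src if line.strip()]
```

A scheduled job could call `write_snapshot` after each data load, and the download endpoint would then only stream a static file, so bulk requests would never reach Solr.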

Additionally, documenting the current limit (and any changes, such as increasing it to 10,000 terms) on the API's Swagger help page would be very helpful. Clear guidance on batching strategies or alternative data-fetching mechanisms would also improve the user experience.
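As an example of the kind of batching guidance the docs could include, here is a rough client-side sketch. It assumes the API accepts a page index next to the size parameter; the `fetch_page` callable and the name `iter_all_terms` are hypothetical and stand in for whatever HTTP call the real API requires.

```python
from typing import Callable, Iterator, List


def iter_all_terms(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every term by requesting fixed-size pages until a short page appears.

    fetch_page(page, size) is a hypothetical callable returning the terms of
    one page; page_size should not exceed the server-side limit.
    """
    page = 0
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        # A page shorter than page_size (including an empty one) is the last page.
        if len(batch) < page_size:
            return
        page += 1
```

In practice `fetch_page` would wrap a single GET request, and requests are issued one after another, which keeps each query within the limit and avoids the concurrent large fetches that caused trouble in the load tests.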

What do you think about implementing such a download endpoint alongside updating the documentation?
