Improve type annotations and remove some ignored errors.
Support for new OpenAI models announced June 13th 2023.
Improved support for model fallbacks. If a request has 6k tokens and the model list is ['gpt-3.5-turbo', 'gpt-3.5-turbo-16k'], the 16k model is now used automatically, since the default 4k model cannot handle the request.
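The fallback rule can be pictured with a minimal sketch. `pick_model` and `CONTEXT_LIMITS` are hypothetical names used only for illustration, not part of the library's API; the sketch assumes the models are tried in list order and that the first one whose context window fits the request wins.

```python
# Context window sizes (in tokens) for the two models named above.
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-16k": 16384,
}


def pick_model(models: list[str], prompt_tokens: int, reply_tokens: int = 0) -> str:
    """Return the first model whose context window fits the request.

    Hypothetical helper for illustration; the library's real logic may differ.
    """
    needed = prompt_tokens + reply_tokens
    for model in models:
        if needed <= CONTEXT_LIMITS[model]:
            return model
    raise ValueError(f"no model in {models} can handle {needed} tokens")


# A 6k-token request skips the 4k model and lands on the 16k model.
print(pick_model(["gpt-3.5-turbo", "gpt-3.5-turbo-16k"], 6000))  # gpt-3.5-turbo-16k
```

Keeping the cheaper model first in the list means it is still preferred whenever the request is small enough to fit.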
0.5.0 - 2023-06-06
Restore PaginatedSchemaScraper and add documentation for pagination.
Documentation improvements.
Small quality-of-life improvements such as better pydantic schema support and more useful error messages.
0.4.4 - 2023-03-31
Deactivate HallucinationCheck by default; it is overly aggressive and needs more work to be useful without raising false positives.
Bugfix for postprocessors parameter behavior not overriding defaults.
0.4.2 - 2023-03-26
Fix type bug with JSON nudging.
Improve HallucinationCheck to handle more cases.
More tests!
0.4.1 - 2023-03-24
Fix bug with HallucinationCheck.
0.4.0 - 2023-03-24
New configurable pre- and post-processing pipelines for customizing behavior.
Addition of ScrapeResult object to hold results of scraping along with metadata.
Support for pydantic models as schemas and for validation.
"Hallucination" check to ensure that the data in the response truly exists on the page.
Use post-processing pipeline to "nudge" JSON errors to a better result.
Now fully type-annotated.
Another big refactor, separation of API calls and scraping logic.
Finally, a ghost logo reminiscent of the library's namesake.
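To illustrate the idea behind the "hallucination" check listed above, here is a rough sketch that tests whether every string in a scraped result literally appears in the page. `missing_values` is a hypothetical helper, and the plain substring test is an assumption made for illustration; the actual HallucinationCheck may work differently.

```python
def missing_values(data, page_text: str) -> list[str]:
    """Collect string values from `data` that never appear in `page_text`."""
    if isinstance(data, str):
        return [] if data in page_text else [data]
    if isinstance(data, dict):
        children = list(data.values())
    elif isinstance(data, list):
        children = data
    else:
        return []  # numbers, booleans, and None are not checked in this sketch
    missing = []
    for child in children:
        missing.extend(missing_values(child, page_text))
    return missing


html = "<h1>Ada Lovelace</h1><p>Born 1815 in London</p>"
scraped = {"name": "Ada Lovelace", "city": "London", "job": "Mathematician"}
print(missing_values(scraped, html))  # ['Mathematician']
```

A literal test like this also flags any value the model normalized, such as a reformatted date, so a practical check needs some tolerance to avoid false positives.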
0.3.0 - 2023-03-20
Add tests, docs, and complete examples!
Add preprocessors to SchemaScraper to allow for uniform interface for cleaning & selecting HTML.
Use tiktoken for accurate token counts.
New cost_estimate utility function.
Cost is now tracked on a per-scraper basis (see the total_cost attribute on SchemaScraper objects).
SchemaScraper now takes a max_cost parameter to limit the total cost of a scraper.
Prompt improvements, list mode simplification.
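The per-scraper cost tracking and max_cost limit described above can be sketched roughly as follows. `CostTracker`, its `record` method, and the flat per-token price are hypothetical and exist only for this illustration; only the `total_cost` and `max_cost` names come from the notes above, and whether the real limit is enforced before or after a request is sent is an assumption here.

```python
class CostTracker:
    """Hypothetical stand-in for per-scraper cost bookkeeping."""

    def __init__(self, max_cost: float, price_per_1k_tokens: float):
        self.max_cost = max_cost
        self.price_per_1k_tokens = price_per_1k_tokens  # assumed flat rate
        self.total_cost = 0.0

    def record(self, tokens: int) -> float:
        """Add the cost of one request, refusing it if it would exceed max_cost."""
        cost = tokens / 1000 * self.price_per_1k_tokens
        if self.total_cost + cost > self.max_cost:
            raise RuntimeError(
                f"request would bring total cost to {self.total_cost + cost:.4f}, "
                f"above max_cost {self.max_cost:.4f}"
            )
        self.total_cost += cost
        return cost


tracker = CostTracker(max_cost=0.01, price_per_1k_tokens=0.002)  # illustrative price
tracker.record(2000)  # accepted, running total is now about 0.004
tracker.record(2000)  # accepted, running total is now about 0.008
try:
    tracker.record(2000)  # a third request would push the total past 0.01
except RuntimeError as exc:
    print("stopped:", exc)
```

Checking the budget before adding the cost keeps `total_cost` at or below `max_cost` at all times.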
0.2.0 - 2023-03-18
Add list mode, auto-splitting, and pagination support.
Improve xpath and css handling.
Improve prompt for GPT 3.5.
Make it possible to alter parameters when calling scrape.