Add Azure Document Intelligence Read Tool #36
Merged
This PR introduces a new parsing tool, `azure_di_read`, which uses Azure's Document Intelligence service to extract text from various document types, including PDFs, images, and scanned documents.

Features
Usage
To use this new tool, users need to provide `DOCUMENTINTELLIGENCE_API_KEY` and `DOCUMENTINTELLIGENCE_ENDPOINT`.
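As a rough illustration of the requirement above, the sketch below checks for both values before any call to Azure is attempted. It assumes the two names are read as environment variables, which the description implies but does not state. `AzureDISettings` and `load_azure_di_settings` are hypothetical names invented for this example and are not part of the PR.

```python
import os
from typing import Mapping, NamedTuple


class AzureDISettings(NamedTuple):
    """Hypothetical container for the two values named in this PR."""
    api_key: str
    endpoint: str


def load_azure_di_settings(env: Mapping[str, str] = os.environ) -> AzureDISettings:
    """Read both settings from `env`, failing early if either is missing or empty.

    Assumption: the values are supplied as environment variables.
    """
    required = ("DOCUMENTINTELLIGENCE_API_KEY", "DOCUMENTINTELLIGENCE_ENDPOINT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return AzureDISettings(
        api_key=env["DOCUMENTINTELLIGENCE_API_KEY"],
        # Drop any trailing slash so the endpoint has a single canonical form.
        endpoint=env["DOCUMENTINTELLIGENCE_ENDPOINT"].rstrip("/"),
    )
```

Failing before the first request tends to produce a clearer error than an authentication failure surfacing from inside the parsing call.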
Example configuration:
Testing
A new test case has been added to verify the functionality of the `azure_di_read` tool.

Documentation
The `azure_di_read` function is documented with a docstring explaining its parameters and usage. Additional documentation has been added to the parsing tools section of the project documentation.

Note that this requires the user to have set up Azure Document Intelligence. This is not great; we should explore an off-the-shelf OCR option as discussed in #3