✨ add support for workflow execution file send & workflow response loading #126

Merged: 10 commits, merged on Nov 26, 2024
1 change: 1 addition & 0 deletions .github/workflows/integration-test.yml
@@ -51,5 +51,6 @@ jobs:
- name: Run Rspec for integration tests
env:
MINDEE_API_KEY: ${{ secrets.MINDEE_API_KEY_SE_TESTS }}
WORKFLOW_ID: ${{ secrets.WORKFLOW_ID_SE_TESTS }}
run: |
bundle exec rake integration
29 changes: 29 additions & 0 deletions docs/code_samples/workflow_execution.txt
@@ -0,0 +1,29 @@
require 'mindee'

workflow_id = 'workflow-id'

# Init a new client
mindee_client = Mindee::Client.new(api_key: 'my-api-key')

# Load a file from disk
input_source = mindee_client.source_from_path('/path/to/the/file.ext')

# Send the file to the workflow
result = mindee_client.execute_workflow(
input_source,
workflow_id
)

# Alternatively, set an alias & a priority for the execution.
# result = mindee_client.execute_workflow(
# input_source,
# workflow_id,
# document_alias: "my-alias",
# priority: Mindee::Parsing::Common::ExecutionPriority::LOW
# )

# Print the execution's ID to make sure it worked
puts result.execution.id

# Print the inference, if present
# puts result.document.inference
40 changes: 40 additions & 0 deletions lib/mindee/client.rb
@@ -195,6 +195,46 @@ def enqueue_and_parse(

# rubocop:enable Metrics/ParameterLists

# Sends a document to a workflow.
#
# @param input_source [Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource]
# @param workflow_id [String] ID of the workflow to send the document to.
# @param document_alias [String, nil] Alias to give to the document.
# @param priority [Symbol, nil] Priority to give to the document.
# @param full_text [Boolean] Whether to include the full OCR text response in compatible APIs.
# This performs a full OCR operation on the server and may increase response time.
# @param public_url [String, nil] A unique, encrypted URL for accessing the document validation interface without
# requiring authentication.
# @param page_options [Hash, nil] Page cutting/merge options, only applied to local PDF sources:
#
# * `:page_indexes` Zero-based list of page indexes.
# * `:operation` Operation to apply on the document, given the `:page_indexes` specified:
# * `:KEEP_ONLY` - keep only the specified pages, and remove all others.
# * `:REMOVE` - remove the specified pages, and keep all others.
# * `:on_min_pages` Apply the operation only if the document has at least this many pages.
#
# @return [Mindee::Parsing::Common::WorkflowResponse]
def execute_workflow(
input_source,
workflow_id,
document_alias: nil,
priority: nil,
full_text: false,
public_url: nil,
page_options: nil
)
if input_source.is_a?(Mindee::Input::Source::LocalInputSource) && !page_options.nil? && input_source.pdf?
input_source.process_pdf(page_options)
end

workflow_endpoint = Mindee::HTTP::WorkflowEndpoint.new(workflow_id, api_key: @api_key)
prediction, raw_http = workflow_endpoint.execute_workflow(input_source, full_text, document_alias, priority,
public_url)
Mindee::Parsing::Common::WorkflowResponse.new(Product::Generated::GeneratedV1,
prediction, raw_http)
end

# Load a prediction.
#
# @param product_class [Mindee::Inference] class of the product
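The `page_options` semantics documented for `execute_workflow` can be illustrated on plain page indexes. The sketch below is not the gem's `process_pdf` implementation: `pages_after_options` is a hypothetical helper that only computes which zero-based pages would remain, assuming non-negative indexes and the `:KEEP_ONLY`/`:REMOVE` symbols listed in the doc comment. In the diff, `execute_workflow` only applies these options to local PDF sources.

```ruby
# Hypothetical helper: applies the documented page_options rules to a list of
# zero-based page indexes. It does not read or modify any PDF data.
def pages_after_options(page_count, page_options)
  all_pages = (0...page_count).to_a
  # Leave the document untouched when it is shorter than :on_min_pages.
  return all_pages if page_count < page_options.fetch(:on_min_pages, 0)

  indexes = page_options.fetch(:page_indexes)
  case page_options.fetch(:operation)
  when :KEEP_ONLY
    all_pages.select { |i| indexes.include?(i) } # keep only the listed pages
  when :REMOVE
    all_pages.reject { |i| indexes.include?(i) } # drop the listed pages
  else
    raise ArgumentError, "unknown operation: #{page_options[:operation].inspect}"
  end
end

options = { page_indexes: [0, 2], operation: :KEEP_ONLY, on_min_pages: 2 }
p pages_after_options(5, options)                           # => [0, 2]
p pages_after_options(1, options)                           # => [0] (too short, left untouched)
p pages_after_options(5, options.merge(operation: :REMOVE)) # => [1, 3, 4]
```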
1 change: 1 addition & 0 deletions lib/mindee/http.rb
@@ -2,3 +2,4 @@

require_relative 'http/endpoint'
require_relative 'http/error'
require_relative 'http/workflow_endpoint'
90 changes: 90 additions & 0 deletions lib/mindee/http/workflow_endpoint.rb
@@ -0,0 +1,90 @@
# frozen_string_literal: true

require 'json'
require 'net/http'
require_relative 'error'

module Mindee
module HTTP
# Handles the routing for workflow calls.
class WorkflowEndpoint
# @return [String]
attr_reader :api_key
# @return [Integer]
attr_reader :request_timeout
# @return [String]
attr_reader :url

def initialize(workflow_id, api_key: '')
@request_timeout = ENV.fetch(REQUEST_TIMEOUT_ENV_NAME, TIMEOUT_DEFAULT).to_i
@api_key = api_key.nil? || api_key.empty? ? ENV.fetch(API_KEY_ENV_NAME, API_KEY_DEFAULT) : api_key
base_url = ENV.fetch(BASE_URL_ENV_NAME, BASE_URL_DEFAULT)
@url = "#{base_url.chomp('/')}/workflows/#{workflow_id}/executions"
end

# Sends a document to the workflow.
# @param input_source [Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource]
# @param document_alias [String, nil] Alias to give to the document.
# @param priority [Symbol, nil] Priority to give to the document.
# @param full_text [Boolean] Whether to include the full OCR text response in compatible APIs.
# @param public_url [String, nil] A unique, encrypted URL for accessing the document validation interface without
# requiring authentication.
# @return [Array]
def execute_workflow(input_source, full_text, document_alias, priority, public_url)
check_api_key
response = workflow_execution_req_post(input_source, document_alias, priority, full_text, public_url)
hashed_response = JSON.parse(response.body, object_class: Hash)
return [hashed_response, response.body] if ResponseValidation.valid_async_response?(response)

ResponseValidation.clean_request!(response)
error = Error.handle_error(@url, response)
raise error
end

# @param input_source [Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::UrlInputSource]
# @param document_alias [String, nil] Alias to give to the document.
# @param priority [Symbol, nil] Priority to give to the document.
# @param full_text [Boolean] Whether to include the full OCR text response in compatible APIs.
# @param public_url [String, nil] A unique, encrypted URL for accessing the document validation interface without
# requiring authentication.
# @return [Net::HTTPResponse, nil]
def workflow_execution_req_post(input_source, document_alias, priority, full_text, public_url)
uri = URI(@url)
params = {}
params[:full_text_ocr] = 'true' if full_text
uri.query = URI.encode_www_form(params)

headers = {
'Authorization' => "Token #{@api_key}",
'User-Agent' => USER_AGENT,
}
req = Net::HTTP::Post.new(uri, headers)
form_data = if input_source.is_a?(Mindee::Input::Source::UrlInputSource)
[['document', input_source.url]]
else
[input_source.read_document]
end
form_data.push ['alias', document_alias] if document_alias
form_data.push ['public_url', public_url] if public_url
form_data.push ['priority', priority.to_s] if priority

req.set_form(form_data, 'multipart/form-data')

response = nil
Net::HTTP.start(uri.hostname, uri.port, use_ssl: true, read_timeout: @request_timeout) do |http|
response = http.request(req)
end
response
end

# Checks API key
def check_api_key
return unless @api_key.nil? || @api_key.empty?

raise "Missing API key. Check your Client Configuration.\n" \
'You can set this using the ' \
"'#{HTTP::API_KEY_ENV_NAME}' environment variable."
end
end
end
end
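The request assembled by `WorkflowEndpoint` comes down to a URL built from the base URL and the workflow ID, plus a list of multipart fields. The standalone sketch below mirrors that logic with hypothetical helpers (`workflow_uri`, `workflow_form_fields`); the base URL, workflow ID, document URL and API key are placeholders, not real Mindee values. Unlike the diff, which always assigns `uri.query`, the sketch sets the query only when `full_text` is requested, so no empty `?` is appended.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper mirroring WorkflowEndpoint#initialize: strips a trailing
# slash from the base URL, then appends the workflow executions route.
def workflow_uri(base_url, workflow_id, full_text: false)
  uri = URI("#{base_url.chomp('/')}/workflows/#{workflow_id}/executions")
  # Only add the query string when full-text OCR is requested.
  uri.query = URI.encode_www_form({ full_text_ocr: 'true' }) if full_text
  uri
end

# Hypothetical helper mirroring the form fields built in
# #workflow_execution_req_post. `document_field` stands in for either
# ['document', url] or the array returned by read_document.
def workflow_form_fields(document_field, document_alias: nil, priority: nil, public_url: nil)
  fields = [document_field]
  fields.push(['alias', document_alias]) if document_alias
  fields.push(['public_url', public_url]) if public_url
  fields.push(['priority', priority.to_s]) if priority # symbols are sent as strings
  fields
end

uri = workflow_uri('https://api.example.com/v1/', 'my-workflow-id', full_text: true)
puts uri # => https://api.example.com/v1/workflows/my-workflow-id/executions?full_text_ocr=true

fields = workflow_form_fields(['document', 'https://example.com/invoice.pdf'],
                              document_alias: 'my-alias', priority: :low)
req = Net::HTTP::Post.new(uri, 'Authorization' => 'Token my-api-key')
req.set_form(fields, 'multipart/form-data')
puts req.content_type # => multipart/form-data
```

No request is sent here: `set_form` only records the fields, and `Net::HTTP` generates the multipart boundary when the request is actually transmitted.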
3 changes: 3 additions & 0 deletions lib/mindee/parsing/common.rb
@@ -2,6 +2,9 @@

require_relative 'common/api_response'
require_relative 'common/document'
require_relative 'common/execution'
require_relative 'common/execution_file'
require_relative 'common/execution_priority'
require_relative 'common/inference'
require_relative 'common/ocr'
require_relative 'common/prediction'
23 changes: 22 additions & 1 deletion lib/mindee/parsing/common/api_response.rb
@@ -32,7 +32,7 @@ class Job
attr_reader :id
# @return [Mindee::Parsing::Standard::DateField]
attr_reader :issued_at
-# @return [Mindee::Parsing::Standard::DateField, nil]
+# @return [Time, nil]
attr_reader :available_at
# @return [JobStatus, Symbol]
attr_reader :status
@@ -121,6 +121,27 @@ def initialize(product_class, http_response, raw_http)
@job = Mindee::Parsing::Common::Job.new(http_response['job']) if http_response.key?('job')
end
end

# Represents the server response after a document is sent to a workflow.
class WorkflowResponse
# Execution data returned by the workflow.
# @return [Mindee::Parsing::Common::Execution]
attr_reader :execution
# @return [Mindee::Parsing::Common::ApiRequest]
attr_reader :api_request
# @return [String]
attr_reader :raw_http

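# @param product_class [Mindee::Inference, nil] Product class used to deserialize the inference;
# defaults to GeneratedV1.
# @param http_response [Hash]
# @param raw_http [String]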
def initialize(product_class, http_response, raw_http)
@raw_http = raw_http.to_s
@api_request = Mindee::Parsing::Common::ApiRequest.new(http_response['api_request'])
product_class = (product_class || Product::Generated::GeneratedV1)
@execution = Mindee::Parsing::Common::Execution.new(product_class, http_response['execution'])
end
end
end
end
end
73 changes: 73 additions & 0 deletions lib/mindee/parsing/common/execution.rb
@@ -0,0 +1,73 @@
# frozen_string_literal: true

module Mindee
module Parsing
module Common
# Representation of a workflow execution.
class Execution
# Identifier for the batch to which the execution belongs.
# @return [String]
attr_reader :batch_name
# The time at which the execution started.
# @return [Time, nil]
attr_reader :created_at
# File representation within a workflow execution.
# @return [ExecutionFile]
attr_reader :file
# Identifier for the execution.
# @return [String]
attr_reader :id
# Deserialized inference object.
# @return [Mindee::Inference]
attr_reader :inference
# Priority of the execution.
# @return [Symbol, nil]
attr_reader :priority
# The time at which the file was tagged as reviewed.
# @return [Time, nil]
attr_reader :reviewed_at
# The time at which the execution became available.
# @return [Time, nil]
attr_reader :available_at
# Reviewed fields and values.
# @return [Mindee::Product::Generated::GeneratedV1Document]
attr_reader :reviewed_prediction
# Execution status.
# @return [String]
attr_reader :status
# Execution type.
# @return [String]
attr_reader :type
# The time at which the file was uploaded to a workflow.
# @return [Time, nil]
attr_reader :uploaded_at
# Identifier for the workflow.
# @return [String]
attr_reader :workflow_id

# rubocop:disable Metrics/CyclomaticComplexity

# @param product_class [Mindee::Inference]
# @param http_response [Hash]
def initialize(product_class, http_response)
@batch_name = http_response['batch_name']
@created_at = Time.iso8601(http_response['created_at']) if http_response['created_at']
@file = ExecutionFile.new(http_response['file']) if http_response['file']
@id = http_response['id']
@inference = product_class.new(http_response['inference']) if http_response['inference']
@priority = Mindee::Parsing::Common::ExecutionPriority.to_priority(http_response['priority'])
@reviewed_at = Time.iso8601(http_response['reviewed_at']) if http_response['reviewed_at']
@available_at = Time.iso8601(http_response['available_at']) if http_response['available_at']
if http_response['reviewed_prediction']
@reviewed_prediction = Mindee::Product::Generated::GeneratedV1Document.new(http_response['reviewed_prediction'])
end
@status = http_response['status']
@type = http_response['type']
@uploaded_at = Time.iso8601(http_response['uploaded_at']) if http_response['uploaded_at']
@workflow_id = http_response['workflow_id']
end
# rubocop:enable Metrics/CyclomaticComplexity
end
end
end
end
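`Execution#initialize` treats most fields as optional: timestamps are parsed with `Time.iso8601` only when present, and nested objects such as `file` are built only when their key is set. The sketch below isolates that pattern in a hypothetical `ExecutionSketch` struct so it can run without the gem; the payload is illustrative and its values are invented, not taken from a real API response.

```ruby
require 'time'

# Hypothetical, trimmed-down stand-in for Mindee::Parsing::Common::Execution.
ExecutionSketch = Struct.new(:id, :status, :created_at, :reviewed_at, :file_alias)

def parse_execution(payload)
  ExecutionSketch.new(
    payload['id'],
    payload['status'],
    # Optional timestamps become Time objects only when present, else nil.
    (Time.iso8601(payload['created_at']) if payload['created_at']),
    (Time.iso8601(payload['reviewed_at']) if payload['reviewed_at']),
    # Nested objects are only read when their key is set.
    (payload['file']['alias'] if payload['file'])
  )
end

# Illustrative payload; all values are made up for this example.
payload = {
  'id' => 'exec-123',
  'status' => 'processing',
  'created_at' => '2024-11-26T10:15:30Z',
  'reviewed_at' => nil,
  'file' => { 'name' => 'invoice.pdf', 'alias' => 'my-alias' }
}

execution = parse_execution(payload)
p execution.created_at == Time.utc(2024, 11, 26, 10, 15, 30) # => true
p execution.reviewed_at                                      # => nil
p execution.file_alias                                       # => "my-alias"
```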
24 changes: 24 additions & 0 deletions lib/mindee/parsing/common/execution_file.rb
@@ -0,0 +1,24 @@
# frozen_string_literal: true

module Mindee
module Parsing
module Common
# Representation of a workflow execution's file data.
class ExecutionFile
# File name.
# @return [String]
attr_reader :name

# Optional alias for the file.
# @return [String, nil]
attr_reader :alias

# @param http_response [Hash]
def initialize(http_response)
@name = http_response['name']
@alias = http_response['alias']
end
end
end
end
end
30 changes: 30 additions & 0 deletions lib/mindee/parsing/common/execution_priority.rb
@@ -0,0 +1,30 @@
# frozen_string_literal: true

module Mindee
module Parsing
module Common
# Execution policy priority values.
module ExecutionPriority
LOW = :low
MEDIUM = :medium
HIGH = :high

# Converts a priority string to one of the priority symbols.
# Returns nil for a nil input; unrecognized values default to :medium.
# @param [String, nil] priority_str
# @return [Symbol, nil]
def self.to_priority(priority_str)
return nil if priority_str.nil?

case priority_str.downcase
when 'low'
:low
when 'high'
:high
else
:medium
end
end
end
end
end
end
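Priorities travel as strings on the wire: the endpoint sends `priority.to_s`, and `to_priority` maps the returned string back to a symbol. The standalone copy below (a hypothetical `PrioritySketch` module, duplicated from `ExecutionPriority` so it runs without the gem) shows the round trip and the fallback behavior.

```ruby
# Hypothetical stand-in copying the mapping from ExecutionPriority.to_priority.
module PrioritySketch
  LOW = :low
  MEDIUM = :medium
  HIGH = :high

  def self.to_priority(priority_str)
    return nil if priority_str.nil?

    case priority_str.downcase
    when 'low' then LOW
    when 'high' then HIGH
    else MEDIUM # any other string falls back to medium
    end
  end
end

# The client sends the symbol as a string; the response string maps back.
sent = PrioritySketch::LOW.to_s        # => "low"
p PrioritySketch.to_priority(sent)     # => :low
p PrioritySketch.to_priority('HIGH')   # => :high
p PrioritySketch.to_priority('urgent') # => :medium
p PrioritySketch.to_priority(nil)      # => nil
```

Because unrecognized strings fall back to `:medium` instead of raising, only a missing value yields `nil`.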
2 changes: 1 addition & 1 deletion spec/test_code_samples.sh
@@ -9,7 +9,7 @@ API_KEY=$3
if [ -z "${ACCOUNT}" ]; then echo "ACCOUNT is required"; exit 1; fi
if [ -z "${ENDPOINT}" ]; then echo "ENDPOINT is required"; exit 1; fi

-for f in $(find ./docs/code_samples -maxdepth 1 -name "*.txt" | sort -h)
+for f in $(find ./docs/code_samples -maxdepth 1 -name "*.txt" -not -name "workflow_execution.txt" | sort -h)
do
echo
echo "###############################################"
31 changes: 31 additions & 0 deletions spec/workflow/workflow_integration.rb
@@ -0,0 +1,31 @@
# frozen_string_literal: true

require 'json'
require 'mindee'
require_relative '../data'

describe Mindee::Client do
describe 'execute_workflow call to API' do
let(:product_data_dir) { File.join(DATA_DIR, 'products') }
it 'should return a valid response' do
client = Mindee::Client.new
invoice_splitter_input = Mindee::Input::Source::PathInputSource.new(
File.join(product_data_dir, 'invoice_splitter', 'default_sample.pdf')
)

current_date_time = Time.now.strftime('%Y-%m-%d-%H:%M:%S')
document_alias = "ruby-#{current_date_time}"
priority = Mindee::Parsing::Common::ExecutionPriority::LOW

response = client.execute_workflow(
invoice_splitter_input,
ENV.fetch('WORKFLOW_ID'),
document_alias: document_alias,
priority: priority
)

expect(response.execution.file.alias).to eq(document_alias)
expect(response.execution.priority).to eq(priority)
end
end
end