Add support for `DonutSwin` models (Closes #318) #320

xenova · 2023-09-19T17:20:44Z

TODO

~~Step-by-step Document Image Classification~~ Postponed due to bug in optimum
Step-by-step Document Parsing
Step-by-step Document Visual Question Answering (DocVQA)
Make it usable via pipeline API

Example usage

Examples adapted from https://huggingface.co/docs/transformers/model_doc/donut

Document parsing

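import { AutoProcessor, AutoTokenizer, AutoModelForVision2Seq, RawImage } from '@xenova/transformers';

let model_id = 'Xenova/donut-base-finetuned-cord-v2';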

// Prepare image inputs
let processor = await AutoProcessor.from_pretrained(model_id);
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/receipt.png';
let image = await RawImage.read(url);
let image_inputs = await processor(image);

// Prepare decoder inputs
let task_prompt = '<s_cord-v2>';
let tokenizer = await AutoTokenizer.from_pretrained(model_id);
let decoder_input_ids = tokenizer(task_prompt, {
    add_special_tokens: false,
}).input_ids;

// Create the model
let model = await AutoModelForVision2Seq.from_pretrained(model_id);

// Run inference
let output = await model.generate(image_inputs.pixel_values, {
    decoder_input_ids,
    max_length: model.config.decoder.max_position_embeddings,
});

// Decode output
let decoded = tokenizer.batch_decode(output)[0];
console.log(decoded);
// <s_cord-v2><s_menu><s_nm> CINNAMON SUGAR</s_nm><s_unitprice> 17,000</s_unitprice><s_cnt> 1 x</s_cnt><s_price> 17,000</s_price></s_menu><s_sub_total><s_subtotal_price> 17,000</s_subtotal_price></s_sub_total><s_total><s_total_price> 17,000</s_total_price><s_cashprice> 20,000</s_cashprice><s_changeprice> 3,000</s_changeprice></s_total></s>
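
The decoded string above is a flat sequence of `<s_key>…</s_key>` tags. As a rough sketch of how it could be turned into a nested object, the helper below walks the tags recursively. `donutTokensToJson` is a hypothetical name and is not part of this PR or of transformers.js; it assumes tags are well-formed, skips unclosed tags such as the task prompt, and does not handle the `<sep/>` list separator or same-named tags nested inside each other.

```js
// Hypothetical helper (not part of transformers.js): converts a Donut tag
// sequence into a nested plain object. Simplified sketch only.
function donutTokensToJson(sequence) {
    const result = {};
    let rest = sequence;
    while (rest.length > 0) {
        // Find the next opening tag, e.g. <s_menu>
        const open = rest.match(/<s_([^>]+)>/);
        if (!open) break;
        const key = open[1];
        const endTag = `</s_${key}>`;
        const contentStart = open.index + open[0].length;
        const endIndex = rest.indexOf(endTag, contentStart);
        if (endIndex === -1) {
            // Unclosed tag (e.g. the task prompt <s_cord-v2>): skip it
            rest = rest.slice(contentStart);
            continue;
        }
        const content = rest.slice(contentStart, endIndex).trim();
        // Recurse when the content itself contains tags; otherwise keep the text
        const value = /<s_[^>]+>/.test(content) ? donutTokensToJson(content) : content;
        if (key in result) {
            // Repeated keys are collected into an array
            result[key] = [].concat(result[key], [value]);
        } else {
            result[key] = value;
        }
        rest = rest.slice(endIndex + endTag.length);
    }
    return result;
}

// Usage with a shortened version of the output shown above
const parsed = donutTokensToJson(
    '<s_cord-v2><s_menu><s_nm> CINNAMON SUGAR</s_nm><s_price> 17,000</s_price></s_menu>' +
    '<s_total><s_total_price> 17,000</s_total_price></s_total></s>'
);
console.log(JSON.stringify(parsed));
```

With the shortened input, `parsed.menu.nm` is `'CINNAMON SUGAR'` and `parsed.total.total_price` is `'17,000'`.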

Document Visual Question Answering (DocVQA)

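import { AutoProcessor, AutoTokenizer, AutoModelForVision2Seq, RawImage } from '@xenova/transformers';

let model_id = 'Xenova/donut-base-finetuned-docvqa';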

// Prepare image inputs
let processor = await AutoProcessor.from_pretrained(model_id);
let url = 'https://i.imgur.com/i3asmW8.png';
let image = await RawImage.read(url);
let image_inputs = await processor(image);

// Prepare decoder inputs
let question = 'What is the invoice number?';
let task_prompt = `<s_docvqa><s_question>${question}</s_question><s_answer>`;
let tokenizer = await AutoTokenizer.from_pretrained(model_id);
let decoder_input_ids = tokenizer(task_prompt, {
    add_special_tokens: false,
}).input_ids;

// Create the model
let model = await AutoModelForVision2Seq.from_pretrained(model_id);

// Run inference
let output = await model.generate(image_inputs.pixel_values, {
    decoder_input_ids,
    max_length: model.config.decoder.max_position_embeddings,
});

// Decode output
let decoded = tokenizer.batch_decode(output)[0];
console.log(decoded);
// <s_docvqa><s_question> What is the invoice number?</s_question><s_answer> us-001</s_answer></s>
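
To get just the answer text out of a decoded DocVQA sequence like the one above, a small regular expression is probably enough. The sketch below uses a hypothetical helper name, `extractDocvqaAnswer`, which is not part of this PR or of transformers.js, and assumes the model emitted a closing `</s_answer>` tag.

```js
// Hypothetical helper (not part of transformers.js): pulls the text between
// <s_answer> and </s_answer>, or returns null when no closed answer tag exists.
function extractDocvqaAnswer(sequence) {
    const match = sequence.match(/<s_answer>([\s\S]*?)<\/s_answer>/);
    return match ? match[1].trim() : null;
}

const answer = extractDocvqaAnswer(
    '<s_docvqa><s_question> What is the invoice number?</s_question><s_answer> us-001</s_answer></s>'
);
console.log(answer);
// us-001
```

Returning `null` for a missing or unclosed tag lets callers distinguish "no answer produced" from an empty answer string.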

HuggingFaceDocBuilderDev · 2023-09-19T17:25:58Z

The documentation is not available anymore as the PR was closed or merged.

xenova · 2023-09-24T00:36:14Z

Adding Document Image Classification might be delayed due to huggingface/optimum#1412

xenova added 6 commits September 19, 2023 18:44

Add add_special_tokens option to tokenizers

75dd1c6

Improve error messages for loading processors

479a934

Add DonutFeatureExtractor

4555b97

Add DonutSwinModel and MBartForCausalLM models

841cdb8

Fix addPastKeyValues for VisionEncoderDecoder models

b0cb176

Add Donut to list of supported models

a76d41c

xenova linked an issue Sep 19, 2023 that may be closed by this pull request

[Model request] Add support for donut models #318

Closed

xenova mentioned this pull request Sep 19, 2023

[Model request] Add support for donut models #318

Closed

xenova added 9 commits September 19, 2023 21:32

Make encode parameters optional

7883c0b

Support batched decoder input ids

fb6fc24

Remove unused import

a2a5aa0

Add do_thumbnail for donut image processing

f614c3e

Fix TypeError: decoder_input_ids[i].map is not a function

14968ac

Only pad if width and height specified in size

5824968

Only pad if pad_size is defined

e74844f

Merge branch 'main' into add-donut-support

279cb7c

Only cut decoder_input_ids if past model output

4fa3b7d

xenova added 5 commits September 24, 2023 03:08

Add donut model

f4de261

Add example usage to JSDoc for DonutSwinModel

9810d82

Add support for DocumentQuestionAnsweringPipeline

dc6fc56

Add simple document question answering unit test

5f2ab62

Add listed support for document QA pipeline

437b6b0

xenova merged commit d307f27 into main Sep 26, 2023
4 checks passed

xenova changed the title ~~Add support for DonutSwim models (Closes #318)~~ Add support for DonutSwin models (Closes #318) Sep 26, 2023

xenova mentioned this pull request Oct 30, 2023

[Document Understanding] Can we support a new task on document understanding? #218

Closed

xenova deleted the add-donut-support branch December 13, 2023 00:38
