Add text extraction based on ToUnicode cmap (#314)
* Add ToUnicode CMap text decoding

* add new dictionary test

* Temporarily switch default parser to pom

* Refactor common cmap parse structures for nom implementation

* Add unicode tests

* Add nom parser for ToUnicode font key

* Try to use ToUnicode for text extraction without encoding.

* Fix clippy passing unit type warnings in nom parser

* Add load unicode async test

* Remove Option from encode/decode functions, delegating error handling to the user

---------

Co-authored-by: Marinus Enzinger
dkaluza and Marinus Enzinger authored Aug 23, 2024
1 parent 22c5153 commit 5859443
Showing 17 changed files with 2,131 additions and 101 deletions.
1 change: 1 addition & 0 deletions Cargo.toml
@@ -34,6 +34,7 @@ serde = { version = "1.0", features = ["derive"], optional = true }
 time = { version = "0.3", features = ["formatting", "parsing"] }
 tokio = { version = "1", features = ["fs", "io-util"], optional = true }
 weezl = "0.1"
+rangemap = "1.5"

 [dev-dependencies]
 clap = { version = "4.0", features = ["derive"] }
Binary file added assets/unicode.pdf
Binary file not shown.
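
For readers unfamiliar with ToUnicode CMaps, the following is a minimal sketch of the general idea behind this change: character codes from a PDF string are looked up in a table of code ranges and translated to Unicode code points. It is not the code from this commit. The `ToUnicodeMap` type and its methods are hypothetical names, a standard-library `BTreeMap` stands in for the `rangemap` crate added above, and the sketch assumes fixed 2-byte codes that each map to a single code point (real CMaps may use other code lengths and multi-character destinations).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed ToUnicode CMap.
/// Key: first source code of a range. Value: (inclusive last code, first destination code point).
#[derive(Default)]
struct ToUnicodeMap {
    ranges: BTreeMap<u32, (u32, u32)>,
}

impl ToUnicodeMap {
    /// Register a `bfrange`-style entry: codes `start..=end` map to
    /// consecutive code points beginning at `dst_start`.
    fn insert_range(&mut self, start: u32, end: u32, dst_start: u32) {
        self.ranges.insert(start, (end, dst_start));
    }

    /// Register a `bfchar`-style entry: one code maps to one code point.
    fn insert_char(&mut self, code: u32, dst: u32) {
        self.insert_range(code, code, dst);
    }

    /// Find the range whose start is the greatest one not exceeding `code`,
    /// then check that `code` actually falls inside that range.
    fn lookup(&self, code: u32) -> Option<char> {
        let (&start, &(end, dst_start)) = self.ranges.range(..=code).next_back()?;
        if code > end {
            return None;
        }
        dst_start.checked_add(code - start).and_then(char::from_u32)
    }

    /// Decode a string of 2-byte big-endian codes (an assumption of this sketch).
    /// Unmapped codes become U+FFFD; a trailing odd byte is ignored.
    fn decode_2byte(&self, bytes: &[u8]) -> String {
        bytes
            .chunks_exact(2)
            .map(|pair| {
                let code = u32::from(u16::from_be_bytes([pair[0], pair[1]]));
                self.lookup(code).unwrap_or('\u{FFFD}')
            })
            .collect()
    }
}
```

With a range `0x0041..=0x0043` mapped to start at `0x0061` and a single code `0x0100` mapped to `0x20AC`, `lookup(0x0042)` yields `'b'` and `lookup(0x0044)` yields `None`. Returning `Option` from the lookup while substituting U+FFFD only at the string level keeps the decision about unmapped codes with the caller.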