Commit

Additional header required to get style tags for visual processing.
jamesvillarrubia authored and ansukla committed Jun 12, 2024
1 parent 21bc8fe commit 436d0fc
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions nlm_ingestor/file_parser/tika_parser.py
@@ -23,6 +23,7 @@ def parse_to_html(self, filepath, do_ocr=False):
"X-Tika-OCRskipOcr": "false",
"X-Tika-OCRoutputType": "hocr",
"X-Tika-Timeout-Millis": str(100 * timeout),
"X-Tika-PDFOcrStrategy": "ocr_only",
"X-Tika-OCRtimeoutSeconds": str(timeout),
}
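
For readers who want to try the header set outside the parser class, here is a minimal sketch. It assumes the dict in the hunk is sent as HTTP request headers to a Tika server when OCR is requested. The helper name `build_ocr_headers` and its `timeout` parameter are hypothetical and do not appear in the commit. The `100 * timeout` multiplier is copied unchanged from the diff.

```python
def build_ocr_headers(timeout: int) -> dict[str, str]:
    """Hypothetical helper: rebuild the OCR header dict shown in the hunk.

    `timeout` plays the role of the `timeout` variable used in the diff;
    its actual value is defined outside the visible lines.
    """
    return {
        # Do not skip OCR.
        "X-Tika-OCRskipOcr": "false",
        # Request hOCR output from the OCR step.
        "X-Tika-OCRoutputType": "hocr",
        # Multiplier kept exactly as it appears in the diff.
        "X-Tika-Timeout-Millis": str(100 * timeout),
        # PDF OCR strategy value as set in the diff.
        "X-Tika-PDFOcrStrategy": "ocr_only",
        "X-Tika-OCRtimeoutSeconds": str(timeout),
    }


if __name__ == "__main__":
    # Print the headers for an illustrative timeout value.
    for name, value in build_ocr_headers(30).items():
        print(f"{name}: {value}")
```

All values are strings because HTTP header values are sent as text. How the dict is handed to the Tika client is outside the lines shown in this hunk.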
