Is there a way to get paragraphs instead of lines from ocropus-gpageseg?
If that's not possible: from what I can see, only the .pseg file contains the paragraph information, encoded as different colors. Is there a utility to extract that information?
Philipp Zumstein (Jun 26, 2017): No, the page segmentation always outputs lines. During segmentation, however, it tries to identify the columns (I am not sure whether paragraphs are also identified in this step), and this information is saved in the .pseg files, as you already noted. See the wiki for more information about the file format: https://github.com/tmbdev/ocropy/wiki/OCRopus-File-Formats#physical-layout
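A minimal sketch of reading that information back, assuming the physical-layout image is RGB with one channel holding the column number, one the paragraph number and one the line number. The channel order (`COLUMN`, `PARAGRAPH`, `LINE`) and the background colors below are assumptions to check against the wiki and your own files, and `paragraph_boxes` is a hypothetical helper, not part of ocropy. It groups non-background pixels by (column, paragraph) and returns a bounding box plus the line numbers found in each group:

```python
from collections import defaultdict

# ASSUMPTION: which channel of an (r, g, b) pixel holds which number.
# Verify this against the OCRopus wiki before relying on it.
COLUMN, PARAGRAPH, LINE = 0, 1, 2
# ASSUMPTION: colors treated as background (not part of any region).
BACKGROUND = {(0, 0, 0), (255, 255, 255)}


def paragraph_boxes(pixels):
    """Group segmented pixels into (column, paragraph) regions.

    pixels: a 2D sequence (rows) of (r, g, b) tuples.
    Returns {(column, paragraph): {"bbox": (x0, y0, x1, y1), "lines": [...]}}
    where bbox is inclusive and lines are the sorted line numbers seen.
    """
    boxes = {}
    lines = defaultdict(set)
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            rgb = tuple(rgb[:3])  # drop an alpha channel if present
            if rgb in BACKGROUND:
                continue
            key = (rgb[COLUMN], rgb[PARAGRAPH])
            lines[key].add(rgb[LINE])
            if key in boxes:
                # Grow the existing bounding box to include this pixel.
                x0, y0, x1, y1 = boxes[key]
                boxes[key] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
            else:
                boxes[key] = (x, y, x, y)
    return {k: {"bbox": boxes[k], "lines": sorted(lines[k])} for k in boxes}
```

To run it on a real file, the pixels could be loaded with Pillow, e.g. `img = Image.open(path).convert("RGB")`, then `list(img.getdata())` split into rows of `img.size[0]` pixels. The per-pixel loop is slow on full pages; with NumPy the same grouping can be done on whole arrays.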
Maybe also try going one step further and create an hOCR file with ocropus-hocr, which should contain some paragraph splits as <p />.
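If the hOCR route works for you, the paragraphs can be recovered by splitting the line text at those `<p />` markers. A minimal sketch using only Python's `html.parser`, assuming each text line is a `span` with class `ocr_line` (the standard hOCR line class) and that `<p />` appears between lines; `HocrParagraphs` and `hocr_paragraphs` are hypothetical names, not part of ocropy:

```python
from html.parser import HTMLParser


class HocrParagraphs(HTMLParser):
    """Collect ocr_line text, starting a new paragraph at each <p> tag."""

    def __init__(self):
        super().__init__()
        self.paragraphs = [[]]  # each paragraph is a list of line strings
        self._span_stack = []   # True for spans that are ocr_line spans
        self._current = None    # text pieces of the line being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Paragraph marker: open a new group unless the current one is empty.
            if self.paragraphs[-1]:
                self.paragraphs.append([])
        elif tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            is_line = "ocr_line" in classes
            self._span_stack.append(is_line)
            if is_line:
                self._current = []

    def handle_endtag(self, tag):
        if tag == "span" and self._span_stack:
            # Closing an ocr_line span finishes the current line.
            if self._span_stack.pop() and self._current is not None:
                text = "".join(self._current).strip()
                if text:
                    self.paragraphs[-1].append(text)
                self._current = None

    def handle_data(self, data):
        # Only keep text that sits inside an ocr_line span (including nested spans).
        if self._current is not None:
            self._current.append(data)


def hocr_paragraphs(html):
    """Return one string per paragraph, joining its lines with spaces."""
    parser = HocrParagraphs()
    parser.feed(html)
    parser.close()
    return [" ".join(lines) for lines in parser.paragraphs if lines]
```

`HTMLParser` reports a self-closing `<p />` through `handle_startendtag`, which by default calls `handle_starttag`, so both `<p>` and `<p />` start a new paragraph. Calling `hocr_paragraphs` on the text of the file written by ocropus-hocr would then give one string per paragraph.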