Commit

fix: page_number for pdfs with pages not containing any text
tstadel committed Feb 29, 2024
1 parent 76ff8e2 commit ae20bae
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions haystack/nodes/preprocessor/preprocessor.py
@@ -641,6 +641,9 @@ def _concatenate_units(
                     else:
                         num_page_breaks = len(processed_units)
                     cur_page += num_page_breaks
+            else:
+                if self.add_page_number and split_at == "\f":
+                    cur_page += 1

         return text_splits, splits_pages, splits_start_idxs
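
To make the effect of the added `else` branch concrete, here is a minimal standalone sketch. It is not the Haystack implementation: the helper `split_pages_with_numbers` and its `count_empty_pages` flag are hypothetical names introduced only for illustration, and the sketch assumes the simplest case of one split per page, with pages separated by the form-feed character `"\f"`.

```python
from typing import List, Tuple


def split_pages_with_numbers(text: str, count_empty_pages: bool = True) -> Tuple[List[str], List[int]]:
    """Split text on form feeds and record the page number of each non-empty split.

    Hypothetical helper for illustration only; it mirrors the idea of the diff,
    not the actual body of PreProcessor._concatenate_units.
    """
    splits: List[str] = []
    pages: List[int] = []
    cur_page = 1
    for page_text in text.split("\f"):
        if len(page_text) > 0:
            splits.append(page_text)
            pages.append(cur_page)
            cur_page += 1  # one page with text was consumed
        else:
            # A page without text yields no split, but it is still a page.
            # Skipping this increment shifts every later page number down by one.
            if count_empty_pages:
                cur_page += 1
    return splits, pages


# Three pages, the second one empty (for example, a scanned image without text).
fixed_splits, fixed_pages = split_pages_with_numbers("first\f\fthird")
old_splits, old_pages = split_pages_with_numbers("first\f\fthird", count_empty_pages=False)
print(fixed_pages)  # page numbers with empty pages counted
print(old_pages)    # page numbers when empty pages are ignored
```

With the empty page counted, the split `"third"` is reported on page 3; without it, the same split would be reported on page 2, which is the kind of off-by-one the commit title describes.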
