Commit

Fix indexer error that double counted data in a certain edge case.
Jonathan Diamond committed Mar 6, 2024
1 parent aca6e3b commit 3f76f17
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions python/fusion_engine_client/parsers/fast_indexer.py
@@ -49,9 +49,17 @@ def _search_blocks_for_fe(input_path: str, block_starts: List[int]):
 if len(data) == _READ_SIZE_BYTES + _MAX_FE_MSG_SIZE_BYTES:
     word_count = int(_READ_SIZE_BYTES / 2)
 # The last read on the last thread will run out of data, so read
-# whatever is left.
-else:
+# whatever is left. If the amount left is less than the overlap
+# space (and this wasn't the first thread), this data will already
+# have been processed by another thread with the `elif len(data) >=
+# _MAX_FE_MSG_SIZE_BYTES` branch.
+elif block_offset == 0 or len(data) >= _MAX_FE_MSG_SIZE_BYTES:
     word_count = int(len(data) / 2) - 1
+# If the amount left is less than the overlap space, this data will
+# already have been processed by another thread with the `elif
+# len(data) >= _MAX_FE_MSG_SIZE_BYTES` branch.
+else:
+    break

 # This is a fairly optimized search for preamble matches.
 # Allocate space for all the message offsets to check.
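
The edge case can be reproduced with a small standalone sketch. It is not the code from `fast_indexer.py`: the constants `READ_SIZE` and `MAX_MSG_SIZE`, the two-byte `MARKER`, and the function `find_markers` are hypothetical stand-ins, and the scan is a plain byte-by-byte search rather than the optimized word search used in the real file. It assumes blocks start every `READ_SIZE` bytes and that each read extends `MAX_MSG_SIZE` bytes into the next block, as the comments in the diff describe.

```python
from typing import List

READ_SIZE = 8       # hypothetical stand-in for _READ_SIZE_BYTES
MAX_MSG_SIZE = 4    # hypothetical stand-in for _MAX_FE_MSG_SIZE_BYTES (the overlap)
MARKER = b"FE"      # placeholder two-byte marker, not the real preamble


def find_markers(stream: bytes, skip_redundant_tail: bool) -> List[int]:
    """Scan `stream` block by block and return the offsets where MARKER starts.

    With skip_redundant_tail=False the short-read branch is a bare `else`
    (the behavior before the commit); with True it mirrors the new
    `elif ... else: break` structure.
    """
    found = []
    for block_offset in range(0, len(stream), READ_SIZE):
        data = stream[block_offset:block_offset + READ_SIZE + MAX_MSG_SIZE]
        if len(data) == READ_SIZE + MAX_MSG_SIZE:
            # Full read: scan only the first READ_SIZE bytes. The overlap is
            # there so a marker near the end of the block can be read whole.
            scan_len = READ_SIZE
        elif (not skip_redundant_tail or block_offset == 0
              or len(data) >= MAX_MSG_SIZE):
            # Short read: the input ran out, so scan whatever is left.
            scan_len = len(data)
        else:
            # Fewer than MAX_MSG_SIZE bytes remain, so the previous block's
            # read was also short and it already scanned these bytes.
            break
        for i in range(scan_len):
            if data[i:i + len(MARKER)] == MARKER:
                found.append(block_offset + i)
    return found


# 18 bytes: markers at offsets 3 and 16; the final block holds only 2 bytes.
stream = b"xxx" + MARKER + b"x" * 11 + MARKER
print(find_markers(stream, skip_redundant_tail=False))
print(find_markers(stream, skip_redundant_tail=True))
```

In this toy input the block starting at offset 8 gets a short read and therefore scans through the end of the data, so the marker at offset 16 is reported once by that block and, without the guard, a second time by the 2-byte block starting at offset 16. The `block_offset == 0` check keeps inputs shorter than the overlap from being skipped entirely.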