Fix indexer error that double counted data in a certain edge case. (#298)

# Bug Fixes
 - Indexer could repeat the last few messages in certain edge cases
axlan authored Mar 6, 2024
2 parents aca6e3b + 3f76f17 commit a143a31
Showing 1 changed file with 10 additions and 2 deletions.
python/fusion_engine_client/parsers/fast_indexer.py (10 additions, 2 deletions)
```diff
@@ -49,9 +49,17 @@ def _search_blocks_for_fe(input_path: str, block_starts: List[int]):
     if len(data) == _READ_SIZE_BYTES + _MAX_FE_MSG_SIZE_BYTES:
         word_count = int(_READ_SIZE_BYTES / 2)
     # The last read on the last thread will run out of data, so read
-    # whatever is left.
-    else:
+    # whatever is left. If the amount left is less then the overlap
+    # space (and this wasn't the first thread), this data will already
+    # have been processed by another thread with the `elif len(data) >=
+    # _MAX_FE_MSG_SIZE_BYTES` branch.
+    elif block_offset == 0 or len(data) >= _MAX_FE_MSG_SIZE_BYTES:
         word_count = int(len(data) / 2) - 1
+    # If the amount left is less then the overlap space, this data will
+    # already have been processed by another thread with the `elif
+    # len(data) >= _MAX_FE_MSG_SIZE_BYTES` branch.
+    else:
+        break
 
     # This is a fairly optimized search for preamble matches.
     # Allocate space for all the message offsets to check.
```
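A small model of the block overlap makes the edge case easier to see. This is a simplified, hypothetical reconstruction, not the indexer's code. It assumes block starts are spaced exactly `READ_SIZE` bytes apart, uses made-up sizes (`READ_SIZE`, `MAX_MSG`, `scanned_range`, and `search_counts` are illustrative names), counts bytes rather than the 2-byte words the real code uses, and ignores the `- 1` word adjustment. Each block reads up to `READ_SIZE + MAX_MSG` bytes. A full read searches only its own `READ_SIZE` window. A short read searches everything left. Before the fix, if the final block held fewer than `MAX_MSG` bytes, the previous block's short read had already searched that tail, so those bytes were searched twice.

```python
from typing import List, Optional, Tuple

# Hypothetical sizes chosen for illustration only; the real values of
# _READ_SIZE_BYTES and _MAX_FE_MSG_SIZE_BYTES are not shown in this diff.
READ_SIZE = 100
MAX_MSG = 30


def scanned_range(block_offset: int, file_size: int,
                  fixed: bool) -> Optional[Tuple[int, int]]:
    """Return the [start, end) byte range a block searches for message
    starts, or None if the block is skipped (simplified model)."""
    # Each block reads its own window plus MAX_MSG bytes of overlap so a
    # message starting near the window's end can still be decoded.
    data_len = min(READ_SIZE + MAX_MSG, file_size - block_offset)
    if data_len == READ_SIZE + MAX_MSG:
        # Full read: search only this block's READ_SIZE window.
        return (block_offset, block_offset + READ_SIZE)
    if not fixed or block_offset == 0 or data_len >= MAX_MSG:
        # Short read near the end of the file: search everything left.
        return (block_offset, block_offset + data_len)
    # Fixed behavior: the tail is shorter than the overlap and this is not
    # the first block, so the previous block's short read already covered it.
    return None


def search_counts(file_size: int, fixed: bool) -> List[int]:
    """Count how many blocks search each byte of a file of file_size bytes."""
    counts = [0] * file_size
    for block_offset in range(0, file_size, READ_SIZE):
        rng = scanned_range(block_offset, file_size, fixed)
        if rng is not None:
            for i in range(*rng):
                counts[i] += 1
    return counts


if __name__ == "__main__":
    # Blocks start at 0, 100, and 200; the last block holds only 10 bytes.
    print(max(search_counts(210, fixed=False)))  # 2: bytes 200..209 searched twice
    print(max(search_counts(210, fixed=True)))   # 1: every byte searched once
```

With a 210-byte file, the block at offset 100 gets only 110 bytes, so it takes the short-read branch and searches through byte 209. The block at offset 200 then searches bytes 200..209 again. The `block_offset == 0` check still matters: a file smaller than `MAX_MSG` has only one block, and that block must be searched.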