Hi all,
Is there a way to use blingfire to get byte offsets of tokens or sentence boundaries relative to the original input bytes, rather than to the constructed output byte array? I'd like to store my text content and my token boundaries non-destructively, so that I can pre-compute sentence breaks and tokens offline and then slice my content at runtime. That way I wouldn't have to store both the original content (which I may want to process further) and the output of blingfire.
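To make the storage scheme concrete, here is a minimal sketch of what is being asked for. It does not call blingfire at all: `token_byte_spans` is a hypothetical stand-in that splits on ASCII whitespace, used only to show (start, end) byte offsets into the original UTF-8 bytes being stored once and reused for slicing later.

```python
import re

def token_byte_spans(text):
    """Hypothetical stand-in tokenizer (not a blingfire API).

    Returns (start, end) byte offsets into text.encode("utf-8"),
    one pair per whitespace-delimited token.
    """
    data = text.encode("utf-8")
    # A bytes pattern treats only ASCII whitespace as \s, so multi-byte
    # UTF-8 characters are never split in the middle.
    return [m.span() for m in re.finditer(rb"\S+", data)]

# Offline step: keep only the original text plus the offsets.
original = "Héllo wörld. Second sentence."
spans = token_byte_spans(original)

# Runtime step: slice the original bytes, no tokenizer output needed.
data = original.encode("utf-8")
tokens = [data[start:end].decode("utf-8") for start, end in spans]
print(tokens)
```

Because the offsets index into the unmodified input, the original content stays the single source of truth and can still be processed further.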
-- Eric