Feature Description
Currently, the dictionary cannot handle duplicate entries. It would be useful if duplicates were supported, possibly through a flag that lets the user allow multiple entries with the same token.
Use Case
When using code-switched tokenizers (like the 'AggregateTokenizer' in NeMo), the same token may appear twice in the vocabulary, for example "Is" in Dutch and "Is" in English. Generally, we observe better word error rates (WER) with code-switched (aggregate) tokenizers than with single tokenizers.
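A minimal sketch of what such a flag could look like, written in Python for brevity. The class `TokenDictionary`, its `allow_duplicates` flag, and the method names are hypothetical and only illustrate the requested behaviour; they are not Flashlight's existing dictionary API, and the token lists are made up for the example.

```python
from collections import defaultdict


class TokenDictionary:
    """Hypothetical token dictionary; NOT Flashlight's actual Dictionary API."""

    def __init__(self, allow_duplicates=False):
        # Assumed flag: when False, adding a token twice is an error.
        self.allow_duplicates = allow_duplicates
        self._entries = []                 # index -> token string
        self._indices = defaultdict(list)  # token string -> every index it occupies

    def add_entry(self, token):
        """Append a token and return its index."""
        if token in self._indices and not self.allow_duplicates:
            raise ValueError(f"duplicate entry: {token!r}")
        index = len(self._entries)
        self._entries.append(token)
        self._indices[token].append(index)
        return index

    def get_entry(self, index):
        """Map an index back to its token string."""
        return self._entries[index]

    def get_indices(self, token):
        """Return all indices that share this token string."""
        return list(self._indices.get(token, []))

    def __len__(self):
        return len(self._entries)


# Aggregate vocabulary: Dutch tokens followed by English tokens (illustrative only).
dutch_tokens = ["Is", "het"]
english_tokens = ["Is", "the"]

d = TokenDictionary(allow_duplicates=True)
for tok in dutch_tokens + english_tokens:
    d.add_entry(tok)

print(d.get_indices("Is"))  # [0, 2]
```

With the flag left at its default, the second `add_entry("Is")` raises a `ValueError`, so existing single-tokenizer setups would keep today's strict behaviour. Keeping one index per entry, rather than merging duplicates, means each index still maps back to exactly one position in the aggregate vocabulary.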
Additional Context
I would be happy to implement this feature if this is something the Flashlight team would be open to!