
v2.2.0

@tsproisl tsproisl released this 18 Jan 09:50
  • New feature: Prune XML tags and their contents from the input before tokenization, either via the command line option --prune TAGNAME1 --prune TAGNAME2 … or by passing prune_tags=["TAGNAME1", "TAGNAME2", …] to tokenize_xml or tokenize_xml_file. This can be useful when processing HTML files, e.g. for removing <script> and <style> elements, together with everything inside them, from the input.
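
To make the effect of pruning concrete, here is a minimal standard-library sketch of what "removing tags and their contents" means for well-formed XML. It is an illustration only, not the tokenizer's own implementation: the function name `prune_elements` and the sample document are hypothetical, and the sketch assumes the root element itself is not one of the pruned tags.

```python
import xml.etree.ElementTree as ET


def prune_elements(xml_string, prune_tags):
    """Return xml_string with every element named in prune_tags removed,
    including its contents. Text that follows a pruned element is kept.

    Hypothetical helper for illustration; assumes well-formed XML and
    that the root element is not itself pruned.
    """
    root = ET.fromstring(xml_string)
    unwanted = set(prune_tags)
    # Iterate over a snapshot of all elements so removals are safe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in unwanted:
                continue
            # The tail is the text directly after the closing tag;
            # it belongs to the surrounding element and must survive.
            tail = child.tail or ""
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + tail
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hallo Welt!</p><script>alert(1)</script></body></html>"
)
cleaned = prune_elements(page, ["script", "style"])
print(cleaned)
```

With the tokenizer itself, the same outcome would be requested by passing the tag names via `prune_tags` to `tokenize_xml` or `tokenize_xml_file`, or via repeated `--prune` options on the command line, as described in the release note.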