Huggingface Transformers Tokenizer in C++ A tokenizer is in charge of preparing the inputs for a model. The tokenizer can tokenize Chinese-English bilingual in Linux. This project mainly solves some Chinese character encoding problems. Requirements Boost C++ unicode support http://github.com/ufal/unilib