From e98369f310fd0ad84cc0cdf3f019ff0b419530f6 Mon Sep 17 00:00:00 2001
From: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>
Date: Wed, 27 Nov 2024 10:47:51 -0800
Subject: [PATCH] [Grammar][Fix] Pass in stop tokens to xgrammar TokenizerInfo
 (#642)

Prior to this PR, models such as SmolLM, which use `<|endoftext|>` as an
unk token and `<|im_end|>` as a stop token, run into issues with
XGrammar. XGrammar has a built-in set of stop tokens that includes
`<|endoftext|>` but not `<|im_end|>`. As a result, at the end of a
structured generation, `<|endoftext|>` is forced to be generated (since
it is the only stop token XGrammar recognizes), but because it is not an
actual stop token for the model, generation does not stop.

This PR explicitly passes the stop tokens (read from
`mlc-chat-config.json`) to `createTokenizerInfo()` so that the built-in
set of stop tokens is not used. In the case above, `<|im_end|>` becomes
the only stop token recognized by XGrammar, fixing the issue.
---
 src/llm_chat.ts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/llm_chat.ts b/src/llm_chat.ts
index 0451da3a..7050e00b 100644
--- a/src/llm_chat.ts
+++ b/src/llm_chat.ts
@@ -554,6 +554,7 @@ export class LLMChatPipeline {
         this.token_postproc_method,
         this.prepend_space_in_encode,
         this.fullVocabSize,
+        this.stopTokens,
       );
       this.grammarCompiler = await xgr.GrammarCompiler.createGrammarCompiler(
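
For reference, a minimal sketch of the full call site after this patch. Only the lines inside the hunk are taken from the diff; the `this.tokenizer` first argument, the `xgr` import alias, and the local variable name are assumptions based on surrounding context, not part of the patch:

```ts
// Sketch of LLMChatPipeline's grammar setup after this PR.
// `xgr` is assumed to be the XGrammar JS binding imported by llm_chat.ts.
const xgTokenizerInfo = await xgr.TokenizerInfo.createTokenizerInfo(
  this.tokenizer,                // assumed; not visible in the hunk
  this.token_postproc_method,    // tokenizer post-processing method from the chat config
  this.prepend_space_in_encode,
  this.fullVocabSize,
  this.stopTokens,               // new: stop token IDs from mlc-chat-config.json,
                                 // replacing XGrammar's built-in stop token set
);
this.grammarCompiler = await xgr.GrammarCompiler.createGrammarCompiler(
  xgTokenizerInfo,
);
```

With `this.stopTokens` passed explicitly, a model like SmolLM ends structured generation on `<|im_end|>` rather than being forced to emit `<|endoftext|>`.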