diff --git a/docs/docs/language/decoding.md b/docs/docs/language/decoding.md
index 769baa0d..76e5e270 100644
--- a/docs/docs/language/decoding.md
+++ b/docs/docs/language/decoding.md
@@ -78,19 +78,45 @@ An experimental implementation of a beam search procedure that groups by current
 ## Inspecting Decoding Trees
 
-LMQL also provides a way to inspect the decoding trees generated by the decoders. For this, make sure to execute the query in the Playground IDE and click on the `Advanced Mode` button, in the top right corner of the Playground. This will open a new pane, where you can navigate and inspect the LMQL decoding tree.
+LMQL also provides a way to inspect the decoding trees generated by the decoders. For this, execute the query in the Playground IDE and click the `Advanced Mode` button in the top right corner of the Playground. This opens a new pane in which you can navigate and inspect the LMQL decoding tree:
 
-Among other things, this view allows you to track the decoding process, active hypotheses and interpreter state, including the current evaluation result of the `where` clause. For an example, consider the [translation example](https://lmql.ai/playground/#translation) as included in the Playground IDE (make sure to enable `Advanced Mode`).
+*Figure: A decoding tree as visualized in the LMQL Playground.*
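+
+For instance, running a small branching query along the following lines with `Advanced Mode` enabled produces a compact tree to explore (a minimal sketch; the model identifier and constraint are only illustrative):
+
+```lmql
+beam(n=2)
+    "Q: What is the capital of France?\n"
+    "A: [ANSWER]"
+from
+    "openai/text-ada-001"
+where
+    len(TOKENS(ANSWER)) < 10
+```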
+
+This view allows you to track the decoding process, active hypotheses, and interpreter state, including the current evaluation result of the `where` clause. For an example, take a look at the [translation example](https://lmql.ai/playground/#translation) in the Playground (with Advanced Mode enabled).
 
 ## Writing Custom Decoders
 
 LMQL also includes `dclib`, a library for array-based decoding, which can be used to implement custom decoders. More information on this will be provided in the future. The implementation of the available decoding procedures is located in `src/lmql/runtime/dclib/decoders.py` of the LMQL repository.
 
-## Other Decoding Parameters
+## Additional Decoding Parameters
+
+In addition to the decoding algorithm, LMQL supports a number of further decoding parameters that affect sampling behavior and token scoring:
 
 | | |
 | --- | --- |
 | `max_len: int` | The maximum length of the generated sequence. If not specified, the default value of `max_len` is `2048`. Note that if the maximum length is reached, the LMQL runtime throws an error if the query has not yet produced a valid result according to the provided `where` clause. |
+| `top_k: int` | Restricts the number of tokens to sample from in each step of the decoding process, based on [Fan et al. (2018)](https://arxiv.org/pdf/1805.04833.pdf) (only applicable to sampling decoders). |
+| `top_p: float` | Top-p (nucleus) sampling, based on [Holtzman et al. (2019)](https://arxiv.org/pdf/1904.09751.pdf) (only applicable to sampling decoders). |
+| `repetition_penalty: float` | Repetition penalty, based on [Keskar et al. (2019)](https://arxiv.org/pdf/1909.05858.pdf). A value of `1.0` means no penalty; the more often a token already occurs in the generated sequence, the more its probability is penalized. |
+| `frequency_penalty: float` | `frequency_penalty` as documented as part of the [OpenAI API](https://platform.openai.com/docs/guides/gpt/parameter-details). |
+| `presence_penalty: float` | `presence_penalty` as documented as part of the [OpenAI API](https://platform.openai.com/docs/guides/gpt/parameter-details). |
+
+::: tip
+
+Note that the concrete implementation and availability of these additional decoding parameters may vary across inference backends. For reference, please see the API documentation of the respective inference interface, e.g. the [HuggingFace `generate()`](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationMixin.generate) function or the [OpenAI API](https://platform.openai.com/docs/api-reference/chat/create).
+
+:::
+
+## Runtime Parameters
+
+Lastly, a number of runtime parameters are available to control auxiliary aspects of the decoding process:
+
+| | |
+| --- | --- |
 | `chunksize: int` | The chunk size used for `max_tokens` in OpenAI API requests or in speculative inference with local models. If not specified, the default value of `chunksize` is `32`. See also the description of this parameter in the [Models](../models/openai.md#monitoring-openai-api-use) chapter. |
 | `verbose: bool` | Enables verbose console logging for individual LLM inference calls (local generation calls or OpenAI API request payloads). |
 | `cache: Union[bool,str]` | `True` or `False` to enable in-memory token caching. If not specified, the default value of `cache` is `True`, indicating that in-memory caching is enabled. <br/><br/> Setting `cache` to a string value specifies a local file to use for disk-based caching, which enables caching across multiple query executions and sessions. |
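+
+For illustration, here is a sketch of how such parameters can be combined in the decoder clause of a query (the model identifier and parameter values are purely illustrative):
+
+```lmql
+sample(temperature=0.8, top_p=0.9, max_len=512, chunksize=64, cache="decoding.tokens")
+    "Write a one-sentence story about a robot:[STORY]"
+from
+    "openai/text-davinci-003"
+where
+    STOPS_AT(STORY, ".")
+```
+
+Here, `temperature`, `top_p` and `max_len` affect the decoding process itself, while `chunksize` and `cache` only control runtime behavior such as chunk-wise generation and token caching.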