
## Inspecting Decoding Trees

LMQL also provides a way to inspect the decoding trees generated by the decoders. For this, make sure to execute the query in the Playground IDE and click the `Advanced Mode` button in the top right corner of the Playground. This will open a new pane where you can navigate and inspect the LMQL decoding tree:

<figure align="center" style="width: 80%; margin: auto;" alt="A decoding tree as visualized in the LMQL Playground.">
<img style="min-height: 100pt" src="https://github.com/eth-sri/lmql/assets/17903049/55952f22-f739-416d-9c58-77690524ee50" alt="A decoding tree as visualized in the LMQL Playground."/>
<br/>
<figcaption>A decoding tree as visualized in the <a href="https://lmql.ai/playground">LMQL Playground</a>.</figcaption>
</figure>

This view allows you to track the decoding process, active hypotheses and interpreter state, including the current evaluation result of the `where` clause. For an example, take a look at the [translation example](https://lmql.ai/playground/#translation) in the Playground (with Advanced Mode enabled).

## Writing Custom Decoders

LMQL also includes `dclib`, a library for array-based decoding that can be used to implement custom decoders. More information on this will be provided in the future. The implementation of the available decoding procedures is located in `src/lmql/runtime/dclib/decoders.py` in the LMQL repository.

## Additional Decoding Parameters

In addition to the decoding algorithm, LMQL supports a number of further decoding parameters that affect sampling behavior and token scoring (see the example sketch after the table):

| Parameter | Description |
| --- | --- |
| `max_len: int` | The maximum length of the generated sequence. Defaults to `2048` if not specified. Note that if the maximum length is reached before the query has produced a valid result according to the provided `where` clause, the LMQL runtime throws an error. |
| `top_k: int` | Restricts the number of tokens to sample from in each step of the decoding process, based on [Fan et al. (2018)](https://arxiv.org/pdf/1805.04833.pdf) (only applicable to sampling decoders). |
| `top_p: float` | Top-p (nucleus) sampling, based on [Holtzman et al. (2019)](https://arxiv.org/pdf/1904.09751.pdf) (only applicable to sampling decoders). |
| `repetition_penalty: float` | Repetition penalty, based on [Keskar et al. (2019)](https://arxiv.org/pdf/1909.05858.pdf); `1.0` means no penalty. The more often a token already appears in the generated sequence, the more its probability is penalized. |
| `frequency_penalty: float` | `frequency_penalty` as documented in the [OpenAI API](https://platform.openai.com/docs/guides/gpt/parameter-details). |
| `presence_penalty: float` | `presence_penalty` as documented in the [OpenAI API](https://platform.openai.com/docs/guides/gpt/parameter-details). |
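
For illustration, the sketch below shows how such parameters might be passed as keyword arguments of the decoder clause of a query. The prompt, variable name, parameter values, and model identifier are placeholders, and concrete parameter support depends on the inference backend in use:

```lmql
# a minimal sketch: decoding parameters passed alongside the decoder keyword
# (parameter values, prompt and model are illustrative only)
sample(temperature=0.8, top_k=50, top_p=0.95, max_len=512)
    "Write a one-sentence product description: [DESCRIPTION]"
from
    "openai/text-davinci-003"
where
    len(DESCRIPTION) < 200
```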

::: tip

Note that the concrete implementation and availability of additional decoding parameters may vary across different inference backends. For reference, please see the API documentation of the respective inference interface, e.g. the [HuggingFace `generate()`](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationMixin.generate) function or the [OpenAI API](https://platform.openai.com/docs/api-reference/chat/create).

:::

## Runtime Parameters

Lastly, a number of additional runtime parameters are available that control auxiliary aspects of the decoding process (see the example sketch after the table):

| Parameter | Description |
| --- | --- |
| `chunksize: int` | The chunksize parameter used for `max_tokens` in OpenAI API requests or in speculative inference with local models. Defaults to `32` if not specified. See also the description of this parameter in the [Models](../models/openai.md#monitoring-openai-api-use) chapter. |
| `verbose: bool` | Enables verbose console logging for individual LLM inference calls (local generation calls or OpenAI API request payloads). |
| `cache: Union[bool,str]` | `True` or `False` to enable in-memory token caching. Defaults to `True`, i.e. in-memory caching is enabled. <br/><br/> Setting `cache` to a string value specifies a local file to use for disk-based caching, enabling caching across multiple query executions and sessions. |
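
As a sketch under the same assumptions as above, runtime parameters can be specified in the same place as decoding parameters; the cache file name, prompt, and model below are illustrative only:

```lmql
# a minimal sketch: runtime parameters passed alongside the decoder keyword
# (cache file name, prompt and model are illustrative only)
argmax(chunksize=64, verbose=True, cache="example-cache.tokens")
    "Q: What is the tallest mountain on Earth?\nA: [ANSWER]"
from
    "openai/text-davinci-003"
where
    STOPS_AT(ANSWER, "\n")
```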