LMQL supports various decoding algorithms, which are used to generate text from the token distribution of a language model. The decoding algorithm in use is specified right at the beginning of a query, e.g. argmax. Here, we provide a brief overview of the currently supported decoders.

LMQL also includes dclib, a library for array-based decoding, which can be used to implement custom decoders. More information on this will be provided in the future. The implementation of the available decoding procedures is located in src/lmql/runtime/dclib/ of the LMQL repository.

In general, all LMQL decoding algorithms are model-agnostic and can be used with any LMQL-supported inference backend. For more information on the supported inference backends, see the Models chapter.

Specifying The Decoding Algorithm

Depending on the context, LMQL offers two ways to specify the decoding algorithm to use.

Queries with Decoding Clause: The first option is to specify the decoding algorithm and its parameters as part of the query itself. This is particularly useful if your choice of decoder is relevant and should be part of your program.

argmax
    "This is a query with a specified decoder: [RESPONSE]"

Specifying the Decoding Algorithm Externally: The second option is to specify the decoding algorithm and its parameters externally, i.e. separately from the actual program code:

import lmql

@lmql.query(model="openai/text-davinci-003", decoder="sample", temperature=1.8)
def tell_a_joke():
    '''lmql
    """A list of good dad jokes. A indicates the punchline:
    Q:[JOKE]
    A:[PUNCHLINE]""" where STOPS_AT(JOKE, "?") and STOPS_AT(PUNCHLINE, "\n")
    '''

tell_a_joke() # uses the decoder specified in @lmql.query(...)
tell_a_joke(decoder="beam", n=2) # uses a beam search decoder with n=2

This is only possible when using LMQL from a Python program. For more information, see also the chapter on how to specify the model to use for decoding.

Supported Decoding Algorithms

In general, the very first keyword of an LMQL query specifies the decoding algorithm to use. The following decoder keywords are available:


argmax

The argmax decoder is the simplest decoder available in LMQL. It greedily selects the most likely token at each step of the decoding process and has no additional parameters. Since argmax decoding is deterministic, it can only generate a single sequence at a time.
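The greedy selection described above can be illustrated with a minimal, self-contained sketch. This is not LMQL's implementation: `TOY_MODEL`, its probabilities, and the `argmax_decode` helper are hypothetical and exist only to show the idea on a tiny next-token table.

```python
# Hypothetical toy "model": maps the last token to a next-token distribution.
# The tokens and probabilities are made up for illustration.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"dog": 0.7, "cat": 0.2, "</s>": 0.1},
    "cat": {"</s>": 0.9, "the": 0.1},
    "dog": {"</s>": 0.8, "a": 0.2},
}


def argmax_decode(model, max_len=10):
    """Greedily pick the most likely next token until </s> or max_len tokens."""
    tokens = ["<s>"]
    while len(tokens) - 1 < max_len:
        dist = model[tokens[-1]]
        best = max(dist, key=dist.get)  # most likely next token
        if best == "</s>":
            break
        tokens.append(best)
    return tokens[1:]  # drop the start marker


print(argmax_decode(TOY_MODEL))  # ['the', 'cat']
```

Because every step takes the single most likely token, repeated calls always return the same sequence.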

sample(n: int, temperature: float)

The sample decoder samples n sequences in parallel from the model. The temperature parameter controls the randomness of the sampling process. Higher values of temperature lead to more random samples, while lower values concentrate the samples on more likely tokens. A temperature value of 0.0 is equivalent to the argmax decoder.
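To make the effect of temperature concrete, here is a small sketch of temperature-scaled sampling over raw logits. It is an illustration under assumptions, not LMQL's code: the helper names `apply_temperature` and `sample_token` and the example logits are hypothetical, and the 0.0 case is handled as the argmax limit described above.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature == 0.0:
        # Limit case: all probability mass goes to the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # made-up scores for three tokens
cold = apply_temperature(logits, 0.5)  # sharper, closer to argmax
hot = apply_temperature(logits, 2.0)   # flatter, more random
print(cold[0] > hot[0])  # True
```

Lower temperatures put more mass on the top-scoring token, which is why the samples become less varied.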

beam(n: int)

A simple beam search decoder. The n parameter controls the beam size. The beam search decoder is deterministic, so it will generate the same n sequences every time. The result of a beam query is a list of n sequences, sorted by their likelihood.
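As a rough illustration of how a beam of size n is maintained, the following sketch runs a textbook beam search over a tiny hypothetical next-token table. `TOY_MODEL` and `beam_search` are assumptions made for this example; LMQL's actual decoder lives in dclib and may differ in detail.

```python
import math

# Hypothetical toy "model": maps the last token to a next-token distribution.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"dog": 0.7, "cat": 0.2, "</s>": 0.1},
    "cat": {"</s>": 0.9, "the": 0.1},
    "dog": {"</s>": 0.8, "a": 0.2},
}


def beam_search(model, n, max_len=10):
    """Keep the n most likely hypotheses at every step."""
    # Each hypothesis is (log-probability, tokens, finished).
    beams = [(0.0, ["<s>"], False)]
    for _ in range(max_len):
        candidates = []
        for score, tokens, done in beams:
            if done:
                candidates.append((score, tokens, True))
                continue
            for tok, prob in model[tokens[-1]].items():
                candidates.append(
                    (score + math.log(prob), tokens + [tok], tok == "</s>")
                )
        # Deterministic pruning: sort by score and keep the top n.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:n]
        if all(done for _, _, done in beams):
            break
    # Strip start/end markers; results stay sorted by likelihood.
    return [
        ([t for t in tokens[1:] if t != "</s>"], score)
        for score, tokens, _ in beams
    ]


for sequence, score in beam_search(TOY_MODEL, n=2):
    print(sequence, round(math.exp(score), 3))
```

Since pruning always keeps the top n candidates, the same n sequences are returned on every run, ordered from most to least likely.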

beam_sample(n: int, temperature: float)

A beam search decoder that samples from the beam at each step. The n parameter controls the beam size, while the temperature parameter controls the randomness of the sampling process. The result of a beam_sample query is a list of n sequences, sorted by their likelihood.
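The description above leaves the exact sampling scheme open. One plausible reading, sketched here purely as an assumption, is to replace the top-n pruning of beam search with sampling n candidates without replacement, weighted by their temperature-scaled scores. `TOY_MODEL` and this `beam_sample` function are hypothetical and are not taken from LMQL's implementation; the sketch also assumes temperature > 0.

```python
import math
import random

# Hypothetical toy "model": maps the last token to a next-token distribution.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"dog": 0.7, "cat": 0.2, "</s>": 0.1},
    "cat": {"</s>": 0.9, "the": 0.1},
    "dog": {"</s>": 0.8, "a": 0.2},
}


def beam_sample(model, n, temperature, rng, max_len=10):
    """Beam search that samples the next beam instead of taking the top n."""
    beams = [(0.0, ["<s>"], False)]  # (log-probability, tokens, finished)
    for _ in range(max_len):
        pool = []
        for score, tokens, done in beams:
            if done:
                pool.append((score, tokens, True))
                continue
            for tok, prob in model[tokens[-1]].items():
                pool.append(
                    (score + math.log(prob), tokens + [tok], tok == "</s>")
                )
        # Sample up to n candidates without replacement (assumed scheme).
        beams = []
        while pool and len(beams) < n:
            top = max(score for score, _, _ in pool)
            weights = [math.exp((s - top) / temperature) for s, _, _ in pool]
            idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
            beams.append(pool.pop(idx))
        if all(done for _, _, done in beams):
            break
    beams.sort(key=lambda b: b[0], reverse=True)  # most likely first
    return [
        ([t for t in tokens[1:] if t != "</s>"], score)
        for score, tokens, _ in beams
    ]


results = beam_sample(TOY_MODEL, n=2, temperature=1.0, rng=random.Random(0))
print(len(results))  # 2
```

In this sketch, a very small temperature makes the sampling step behave almost like the deterministic top-n pruning of plain beam search, while larger values let less likely hypotheses survive.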

Novel Decoders

LMQL also implements a number of novel decoders. These decoders are experimental and may not work as expected. They are also not guaranteed to be stable across different LMQL versions. More documentation on these decoders will be provided in the future.

var(b: int, n: int)

An experimental implementation of variable-level beam search.

beam_var(n: int)

An experimental implementation of a beam search procedure that groups by currently-decoded variable and applies adjusted length penalties.

Inspecting Decoding Trees

LMQL also provides a way to inspect the decoding trees generated by the decoders. To do so, execute the query in the Playground IDE and click the Advanced Mode button in the top right corner of the Playground. This opens a new pane where you can navigate and inspect the LMQL decoding tree.

Among other things, this view allows you to track the decoding process, active hypotheses and interpreter state, including the current evaluation result of the where clause. For an example, consider the translation example as included in the Playground IDE (make sure to enable Advanced Mode).

Other Decoding Parameters

  • max_len: int - The maximum length of the generated sequence. If not specified, the default value of max_len is 512. Note that if the maximum length is reached, the LMQL runtime will throw an error if the query has not yet come to a valid result according to the provided where clause.

  • openai_chunksize: int - The chunksize parameter for OpenAI’s Completion API. If not specified, the default value of openai_chunksize is 32. See also the description of this parameter in the Models chapter.