
Commit c7d5d4a
fix code format
Signed-off-by: youkaichao <[email protected]>
youkaichao committed Nov 24, 2024
1 parent da41521 commit c7d5d4a
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions docs/source/models/adding_model.rst
@@ -43,7 +43,10 @@ For instance, vLLM's `OPT model <https://github.com/vllm-project/vllm/blob/main/
 
 To ensure compatibility with vLLM, your model must meet the following requirements:
 
-1. Initialization Code: All vLLM modules within the model must include a ``prefix`` argument in their constructor. This ``prefix`` is typically the full name of the module in the model's state dictionary and is crucial for:
+Initialization Code
+^^^^^^^^^^^^^^^^^^^
+
+All vLLM modules within the model must include a ``prefix`` argument in their constructor. This ``prefix`` is typically the full name of the module in the model's state dictionary and is crucial for:
 
 * Runtime support: vLLM's attention operators are registered in a model's state by their full names. Each attention operator must have a unique prefix as its layer name to avoid conflicts.
 * Non-uniform quantization support: A quantized checkpoint can selectively quantize certain layers while keeping others in full precision. By providing the ``prefix`` during initialization, vLLM can match the current layer's ``prefix`` with the quantization configuration to determine if the layer should be initialized in quantized mode.
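
The hunk above says every vLLM module constructor must accept and propagate a ``prefix``. As a minimal sketch of what that threading can look like (``MyAttention`` and ``MyDecoderLayer`` are hypothetical names for illustration, not part of this commit):

.. code-block:: python

    import torch.nn as nn

    class MyAttention(nn.Module):
        def __init__(self, vllm_config, prefix: str = ""):
            super().__init__()
            # The full dotted name (e.g. "model.layers.0.self_attn") keeps each
            # attention operator's registration unique and lets vLLM match this
            # layer against a per-layer quantization config.
            self.prefix = prefix

    class MyDecoderLayer(nn.Module):
        def __init__(self, vllm_config, prefix: str = ""):
            super().__init__()
            # Pass the child module its own full name in the state dictionary.
            self.self_attn = MyAttention(vllm_config, prefix=f"{prefix}.self_attn")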
@@ -78,7 +81,10 @@ The initialization code should look like this:
             super().__init__()
             self.model = MyModel(vllm_config, prefix=f"{prefix}.model")
 
-2. Computation Code: Rewrite the :meth:`~torch.nn.Module.forward` method of your model to remove any unnecessary code, such as training-specific code. Modify the input parameters to treat `input_ids` and `positions` as flattened tensors with a single batch size dimension, without a max-sequence length dimension.
+Computation Code
+^^^^^^^^^^^^^^^^
+
+Rewrite the :meth:`~torch.nn.Module.forward` method of your model to remove any unnecessary code, such as training-specific code. Modify the input parameters to treat `input_ids` and `positions` as flattened tensors with a single batch size dimension, without a max-sequence length dimension.
 
 .. code-block:: python
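
The body of that final code block is collapsed in this diff view. As a hedged sketch only (``MyModel`` here is illustrative, not the commit's code), a forward pass over flattened inputs could look like:

.. code-block:: python

    import torch
    import torch.nn as nn

    class MyModel(nn.Module):
        def __init__(self, vocab_size: int = 100, hidden_size: int = 16):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)

        def forward(self, input_ids: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
            # input_ids and positions are flattened to shape [num_tokens]:
            # all sequences in the batch are concatenated, so there is no
            # [batch, max_seq_len] padding dimension.
            return self.embed(input_ids)

For example, two prompts of lengths 3 and 2 would arrive as ``input_ids`` of shape ``[5]`` with ``positions = tensor([0, 1, 2, 0, 1])``.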
