Commit

docs updates and typing fix
cnewell committed Dec 5, 2024
1 parent eebef05 commit 4bd03d2
Showing 2 changed files with 46 additions and 4 deletions.
42 changes: 42 additions & 0 deletions README.md
@@ -154,6 +154,48 @@ The quantized model is sensitive to input types and CUDA handling. To avoid pote

Additional tools for evaluating OLMo models are available at the [OLMo Eval](https://github.com/allenai/OLMo-eval) repo.

## Hosting

An example script for hosting an OLMo 2 model on Modal.com using the OpenAI API is provided in `./scripts/olmo2_modal_openai.py`.
To run it:
1. Follow the instructions under Getting Started in [the Modal.com Guide](https://modal.com/docs/guide) to install
   the Modal library and command line tools.
2. Follow the instructions under [Secrets](https://modal.com/docs/guide/secrets) in the Modal.com Guide to create a Modal secret named "example-secret-token"
   that defines a value for the variable MODAL_TOKEN for your server (one way to do this from the command line is sketched just after this list).
3. Then run
   ```bash
   modal deploy ./scripts/olmo2_modal_openai.py
   ```
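
If you prefer the command line for step 2, a minimal sketch using the Modal CLI is shown below; the exact subcommand syntax can vary between Modal releases, so check `modal secret --help` if it has changed. The token value is a placeholder you choose yourself:

```bash
# Create a Modal secret named "example-secret-token" that exposes MODAL_TOKEN
# to the deployed app. Replace <your-token-value> with a value you choose.
modal secret create example-secret-token MODAL_TOKEN=<your-token-value>
```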

You can check your endpoint with a `curl` command similar to the following:
```bash
curl -X POST \
  -H "Authorization: Bearer [the secret token from above]" \
  -H "Content-Type: application/json" \
  -d @body.json \
  https://[the web endpoint modal creates above]/v1/chat/completions
```

where `body.json` is of the form:
```json
{
    "model": "OLMo-2-1124-13B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "Who was Alan Turing?"
        }
    ],
    "max_tokens": 100,
    "temperature": 0.9,
    "stream": true
}
```
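
Since the server exposes an OpenAI-compatible API, you can also query it from the official `openai` Python client. This is a minimal sketch, assuming `openai>=1.0` is installed; the endpoint URL and token below are placeholders:

```python
from openai import OpenAI

# Both values below are placeholders: substitute the web endpoint URL that
# Modal prints when you deploy and the secret token you configured above.
client = OpenAI(
    base_url="https://[the web endpoint modal creates above]/v1",
    api_key="[the secret token from above]",
)

stream = client.chat.completions.create(
    model="OLMo-2-1124-13B-Instruct",
    messages=[{"role": "user", "content": "Who was Alan Turing?"}],
    max_tokens=100,
    temperature=0.9,
    stream=True,
)

# Print streamed tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```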


## Citing

```bibtex
8 changes: 4 additions & 4 deletions scripts/olmo2_modal_openai.py
Expand Up @@ -50,7 +50,7 @@ def download_model_to_image(model_dir, model_name, model_revision):
# Our first order of business is to define the environment our server will run in:
# the container image. (See https://modal.com/docs/guide/custom-container)

# This differs from vllm_interface.py in two major ways: first, as of the time this
# This differs from vllm_inference.py in two major ways: first, as of the time this
# is being written, the OLMo 2 model architecture requires vLLM and transformers from
# github commits that are too recent to have tagged versions, so they must be built
# from source. The nvidia/cuda base image, git and build-essential apt_install, and the
@@ -145,7 +145,7 @@ def serve():

# This example uses a token defined in the Modal secret linked above,
# as described here: https://modal.com/docs/guide/secrets
async def is_authenticated(api_key: str = fastapi.Security(http_bearer)):
async def is_authenticated(api_key = fastapi.Security(http_bearer)):
if api_key.credentials != os.getenv("MODAL_TOKEN"):
raise fastapi.HTTPException(
status_code=fastapi.status.HTTP_401_UNAUTHORIZED,
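
For context on the typing change above: `fastapi.Security(http_bearer)` injects an `HTTPAuthorizationCredentials` object (hence the `.credentials` access), not a `str`, so the incorrect `str` annotation is removed. A standalone sketch of the same pattern with an explicit annotation, not part of this commit, might look like:

```python
import os

import fastapi
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

http_bearer = HTTPBearer()


async def is_authenticated(
    # fastapi.Security(http_bearer) injects an HTTPAuthorizationCredentials
    # object; its .credentials field holds the raw bearer token string.
    api_key: HTTPAuthorizationCredentials = fastapi.Security(http_bearer),
):
    if api_key.credentials != os.getenv("MODAL_TOKEN"):
        # The detail message is illustrative; the rest of the handler body
        # is not shown in this diff.
        raise fastapi.HTTPException(
            status_code=fastapi.status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authentication token",
        )
```
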
@@ -207,7 +207,7 @@ async def is_authenticated(api_key: str = fastapi.Security(http_bearer)):

# To deploy the API on Modal, just run
# ```bash
# modal deploy vllm_inference.py
# modal deploy olmo2_modal_openai.py
# ```

# This will create a new app on Modal, build the container image for it, and deploy.
@@ -218,7 +218,7 @@ async def is_authenticated(api_key: str = fastapi.Security(http_bearer)):
# something like `https://your-workspace-name--olmo-2-1124-instruct-openai-serve.modal.run`.

# You can find [interactive Swagger UI docs](https://swagger.io/tools/swagger-ui/)
# at the `/docs` route of that URL, i.e. `https://your-workspace-name--example-vllm-openai-compatible-serve.modal.run/docs`.
# at the `/docs` route of that URL, i.e. `https://your-workspace-name--olmo-2-1124-instruct-openai-serve.modal.run/docs`.
# These docs describe each route and indicate the expected input and output
# and translate requests into `curl` commands. They also demonstrate authentication.

