docs: NVIDIA generators (#917)
Add docs for NVIDIA generators
jmartin-tech authored Sep 23, 2024
2 parents 109337d + 2b9f682 commit b60bd9d
Showing 6 changed files with 140 additions and 11 deletions.
18 changes: 18 additions & 0 deletions docs/source/garak.generators.guardrails.rst
@@ -1,6 +1,24 @@
garak.generators.guardrails
===========================

This is a generator for wrapping a NeMo Guardrails configuration. Using this
garak generator enables security testing of a Guardrails config.

The ``guardrails`` generator expects a path to a valid Guardrails configuration
to be passed as its name. For example,

.. code-block::

   garak -m guardrails -n sample_abc/config

This generator requires installation of the `nemoguardrails <https://pypi.org/project/nemoguardrails/>`_
Python package.

When invoked, garak sends prompts sequentially to the Guardrails setup using
``rails.generate`` and waits for each response. The generator does not support
parallelisation, so it's recommended to run smaller probes, or set ``generations``
to a low value, to reduce garak run time.
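
Since prompts go out one call at a time, total run time grows with the number of prompts multiplied by ``generations``. The minimal sketch below illustrates that serial pattern; ``generate_serially`` and ``generate_fn`` are hypothetical names, with ``generate_fn`` standing in for ``rails.generate``. It is not garak's actual implementation.

```python
from typing import Callable, List


def generate_serially(
    generate_fn: Callable[[str], str],  # hypothetical stand-in for rails.generate
    prompts: List[str],
    generations: int,
) -> List[List[str]]:
    """Send each prompt `generations` times, strictly one call at a time."""
    results: List[List[str]] = []
    for prompt in prompts:
        outputs: List[str] = []
        for _ in range(generations):
            # Each call blocks until the Guardrails setup responds, so wall time
            # is roughly len(prompts) * generations * per-call latency.
            outputs.append(generate_fn(prompt))
        results.append(outputs)
    return results
```

For example, 10 prompts with ``generations`` set to 5 means 50 sequential calls, which is why lowering either value shortens the run.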

.. automodule:: garak.generators.guardrails
   :members:
   :undoc-members:
20 changes: 20 additions & 0 deletions docs/source/garak.generators.nemo.rst
@@ -1,6 +1,26 @@
garak.generators.nemo
=====================

Wrapper for `nemollm <https://pypi.org/project/nemollm/>`_.

Expects an NGC API key in the environment variable ``NGC_API_KEY`` and the
organisation ID in the environment variable ``ORG_ID``.

Configurable values:

* temperature: 0.9
* top_p: 1.0
* top_k: 2
* repetition_penalty: 1.1 - between 1 and 2 inclusive, or None
* beam_search_diversity_rate: 0.0
* beam_width: 1
* length_penalty: 1
* guardrail: None - present in the API but not implemented in the library
* api_uri: "https://api.llm.ngc.nvidia.com/v1" - endpoint URI
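
The listed values behave as defaults, consistent with the ``DEFAULT_PARAMS`` excerpt from ``garak/generators/nemo.py`` in this commit. The minimal sketch below layers user overrides on top of them with the same dict-union pattern (``Generator.DEFAULT_PARAMS | {...}``) seen in ``garak/generators/nvcf.py``, and enforces the documented ``repetition_penalty`` range. ``NEMO_DEFAULTS`` and ``resolve_params`` are hypothetical names, not part of garak.

```python
from typing import Any, Dict, Optional

# Values as listed above for the nemo generator (treated here as defaults).
NEMO_DEFAULTS: Dict[str, Any] = {
    "temperature": 0.9,
    "top_p": 1.0,
    "top_k": 2,
    "repetition_penalty": 1.1,
    "beam_search_diversity_rate": 0.0,
    "beam_width": 1,
    "length_penalty": 1,
    "guardrail": None,  # present in the API but not implemented in the library
    "api_uri": "https://api.llm.ngc.nvidia.com/v1",
}


def resolve_params(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge user overrides over the defaults (hypothetical helper)."""
    # Dict union (Python 3.9+): keys from the right-hand side win.
    params = NEMO_DEFAULTS | (overrides or {})
    penalty = params["repetition_penalty"]
    if penalty is not None and not 1 <= penalty <= 2:
        raise ValueError(f"repetition_penalty must be in [1, 2] or None, got {penalty}")
    return params
```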

.. automodule:: garak.generators.nemo
   :members:
   :undoc-members:
92 changes: 92 additions & 0 deletions docs/source/garak.generators.nvcf.rst
@@ -1,6 +1,98 @@
garak.generators.nvcf
=====================

This garak generator is a connector to NVIDIA Cloud Functions (NVCF). It permits fast
and flexible generation.

An NVCF function is invoked by sending a request to an invocation endpoint and then
polling a status endpoint until the response is ready. The cloud function is identified
by a UUID, which is passed to garak as the ``model_name``. The API key should be placed
in the environment variable ``NVCF_API_KEY`` or set in a garak config. For example:

.. code-block::

   export NVCF_API_KEY="example-api-key-xyz"
   garak -m nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1

Configuration
-------------

Configurable values:

* temperature - Temperature for generation. Passed as a value to the endpoint.
* top_p - Nucleus sampling threshold: sample only from the smallest set of tokens whose cumulative probability reaches this value. Passed as a value to the endpoint.
* invoke_uri_base - Base URL for the NVCF endpoint (default is for NVIDIA-hosted functions).
* status_uri_base - URL to check for request status updates (default is for NVIDIA-hosted functions).
* timeout - Read timeout for HTTP requests (note: this is a network timeout, distinct from the inference timeout).
* version_id - API version ID, appended to the invocation URL as ``/versions/<version_id>`` if supplied.
* stop_on_404 - Give up on endpoints returning 404 (i.e. nonexistent ones)
* extra_params - Dictionary of optional extra values to pass to the endpoint. Default ``{"stream": False}``.
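
The invoke-then-poll flow and the URI settings above fit together roughly as follows. This is a minimal sketch, not garak's implementation: ``session`` is assumed to behave like a ``requests.Session``, and ``invoke_and_poll``, ``build_invoke_uri``, the ``TimeoutError`` on expiry, and the fixed ``poll_interval`` pause are assumptions. The HTTP 202 check, the ``NVCF-REQID`` header, the default URIs, and the ``/versions/`` suffix follow the ``garak/generators/nvcf.py`` excerpt in this commit.

```python
import time
from typing import Any, Callable, Dict, Optional

# Defaults copied from DEFAULT_PARAMS in garak/generators/nvcf.py (this commit).
INVOKE_URI_BASE = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/"
STATUS_URI_BASE = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/"


def build_invoke_uri(
    function_id: str, version_id: Optional[str] = None, base: str = INVOKE_URI_BASE
) -> str:
    """Append the function UUID, plus /versions/<version_id> if one is given."""
    uri = base + function_id
    if version_id is not None:
        uri += f"/versions/{version_id}"
    return uri


def invoke_and_poll(
    session: Any,  # assumed to expose requests.Session-style post()/get()
    function_id: str,
    payload: Dict[str, Any],
    headers: Dict[str, str],
    timeout: float = 60,
    version_id: Optional[str] = None,
    invoke_uri_base: str = INVOKE_URI_BASE,
    status_uri_base: str = STATUS_URI_BASE,
    poll_interval: float = 0.5,  # assumed pause between status checks
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """POST to the invocation endpoint, then poll the status endpoint while HTTP 202."""
    request_time = time.time()
    uri = build_invoke_uri(function_id, version_id, invoke_uri_base)
    response = session.post(uri, headers=headers, json=payload)
    while response.status_code == 202:
        if time.time() > request_time + timeout:
            raise TimeoutError("NVCF request did not finish within timeout")
        request_id = response.headers.get("NVCF-REQID")
        if request_id is None:
            raise AttributeError("Got HTTP 202 but no NVCF-REQID was returned")
        sleep(poll_interval)
        response = session.get(status_uri_base + request_id, headers=headers)
    return response
```

Injecting ``session`` and ``sleep`` keeps the polling logic testable without network access.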

Some NVCF instances require custom parameters, for example a ``model`` field in the
request body. These can be set through ``extra_params`` in the garak config. For
example, this cURL request maps to the following garak YAML:

.. code-block::

   curl -s -X POST 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/341da0d0-aa68-4c4f-89b5-fc39286de6a1' \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer example-api-key-xyz' \
     -d '{
       "messages": [{"role": "user", "content": "How many letters are in the word strawberry?"}],
       "model": "prefix/obsidianorder/terer-nor",
       "max_tokens": 1024,
       "stream": false
     }'

.. code-block:: yaml

   ---
   plugins:
     generators:
       nvcf:
         NvcfChat:
           api_key: example-api-key-xyz
           max_tokens: 1024
           extra_params:
             stream: false
             model: prefix/obsidianorder/terer-nor
     model_type: nvcf.NvcfChat
     model_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1

The ``nvcf`` generator uses the standard garak generator mechanism for
``max_tokens``, which is why this value is set at generator-level rather than
as a key-value pair in ``extra_params``.
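
Reading the cURL example and the YAML side by side, the request appears to be assembled from a ``messages`` list, the generator-level ``max_tokens``, and the ``extra_params`` entries merged into the JSON body. A minimal sketch under that assumption; ``build_nvcf_request`` is a hypothetical helper, ``temperature`` and ``top_p`` are omitted to match the cURL body, and garak's own payload construction may differ.

```python
import json
from typing import Any, Dict, Optional, Tuple


def build_nvcf_request(
    prompt: str,
    api_key: str,
    max_tokens: int,
    extra_params: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (headers, JSON body) mirroring the cURL example (hypothetical helper)."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload: Dict[str, Any] = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # set at generator level, not in extra_params
    }
    # Assumption: extra_params entries are merged into the top level of the body.
    payload.update(extra_params or {})
    return headers, payload


headers, payload = build_nvcf_request(
    "How many letters are in the word strawberry?",
    api_key="example-api-key-xyz",
    max_tokens=1024,
    extra_params={"stream": False, "model": "prefix/obsidianorder/terer-nor"},
)
print(json.dumps(payload, indent=2))  # Python False serialises as JSON false
```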

Scaling
-------

The NVCF generator supports parallelisation, and using it is recommended: invoke
garak with ``--parallel_attempts`` set to a value greater than one.
If the NVCF function times out due to insufficient capacity, garak will note this,
back off, and retry the request later.
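
A minimal sketch of parallel attempts with retry on capacity timeouts, assuming exponential backoff with jitter. ``CapacityTimeout``, ``with_backoff``, ``run_parallel`` and the delay schedule are hypothetical, and a thread pool stands in for garak's own ``parallel_attempts`` mechanism.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


class CapacityTimeout(Exception):
    """Hypothetical error for a request that timed out for lack of capacity."""


def with_backoff(
    fn: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on CapacityTimeout with exponential backoff and jitter."""
    attempt = 0
    while True:
        try:
            return fn()
        except CapacityTimeout:
            attempt += 1
            if attempt >= max_tries:
                raise  # give up after max_tries calls
            # Wait base_delay * 2**(attempt - 1) seconds plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))


def run_parallel(tasks: List[Callable[[], T]], parallel_attempts: int = 32) -> List[T]:
    """Run tasks concurrently, each wrapped in with_backoff; results keep input order."""
    with ThreadPoolExecutor(max_workers=parallel_attempts) as pool:
        return list(pool.map(with_backoff, tasks))
```

Doubling the delay on each failure spreads retries out while the function is saturated, and the jitter keeps parallel workers from retrying in lockstep.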

.. code-block::

   garak -m nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1 --parallel_attempts 32

Or, as YAML config:

.. code-block:: yaml

   ---
   system:
     parallel_attempts: 32
   plugins:
     model_type: nvcf.NvcfChat
     model_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1

.. automodule:: garak.generators.nvcf
   :members:
   :undoc-members:
4 changes: 2 additions & 2 deletions garak/generators/nemo.py
@@ -32,7 +32,7 @@ class NeMoGenerator(Generator):
"beam_width": 1,
"length_penalty": 1,
"guardrail": None, # NotImplemented in library
-"api_host": "https://api.llm.ngc.nvidia.com/v1",
+"api_uri": "https://api.llm.ngc.nvidia.com/v1",
}

supports_multiple_generations = False
@@ -48,7 +48,7 @@ def __init__(self, name=None, config_root=_config):
super().__init__(self.name, config_root=config_root)

self.nemo = nemollm.api.NemoLLM(
-api_host=self.api_host, api_key=self.api_key, org_id=self.org_id
+api_host=self.api_uri, api_key=self.api_key, org_id=self.org_id
)

if self.name is None:
15 changes: 7 additions & 8 deletions garak/generators/nvcf.py
@@ -23,9 +23,8 @@ class NvcfChat(Generator):
DEFAULT_PARAMS = Generator.DEFAULT_PARAMS | {
"temperature": 0.2,
"top_p": 0.7,
-"fetch_url_format": "https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/",
-"invoke_url_base": "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/",
-"extra_nvcf_logging": False,
+"status_uri_base": "https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/",
+"invoke_uri_base": "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/",
"timeout": 60,
"version_id": None, # string
"stop_on_404": True,
@@ -50,10 +49,10 @@ def __init__(self, name=None, config_root=_config):
"Please specify a function identifier in model name (-n)"
)

-self.invoke_url = self.invoke_url_base + self.name
+self.invoke_uri = self.invoke_uri_base + self.name

if self.version_id is not None:
-self.invoke_url += f"/versions/{self.version_id}"
+self.invoke_uri += f"/versions/{self.version_id}"

super().__init__(self.name, config_root=config_root)

@@ -110,7 +109,7 @@ def _call_model(

request_time = time.time()
logging.debug("nvcf : payload %s", repr(payload))
-response = session.post(self.invoke_url, headers=self.headers, json=payload)
+response = session.post(self.invoke_uri, headers=self.headers, json=payload)

while response.status_code == 202:
if time.time() > request_time + self.timeout:
@@ -120,8 +119,8 @@ def _call_model(
msg = "Got HTTP 202 but no NVCF-REQID was returned"
logging.info("nvcf : %s", msg)
raise AttributeError(msg)
-fetch_url = self.fetch_url_format + request_id
-response = session.get(fetch_url, headers=self.headers)
+status_uri = self.status_uri_base + request_id
+response = session.get(status_uri, headers=self.headers)

if 400 <= response.status_code < 600:
logging.warning("nvcf : returned error code %s", response.status_code)
2 changes: 1 addition & 1 deletion tests/generators/test_nvcf.py
@@ -31,7 +31,7 @@ def test_version_endpoint(klassname):
_config.plugins.generators["nvcf"][klassname]["api_key"] = "placeholder key"
_config.plugins.generators["nvcf"][klassname]["version_id"] = version
g = _plugins.load_plugin(f"generators.nvcf.{klassname}")
-assert g.invoke_url == f"{g.invoke_url_base}{name}/versions/{version}"
+assert g.invoke_uri == f"{g.invoke_uri_base}{name}/versions/{version}"


@pytest.mark.parametrize("klassname", PLUGINS)