Model features: native async #110

Open
wants to merge 14 commits into base: main

Conversation

raspawar (Collaborator)

Native Async

Validated the current implementation and added test cases.

from langchain_core.messages import HumanMessage
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA()
message = HumanMessage(content="Hello")
response = await llm.agenerate([[message]])

cc: @sumitkbh

@raspawar raspawar requested a review from mattf October 18, 2024 06:39
mattf (Collaborator) commented Oct 18, 2024

@efriis we have async implementations for the generation functions. what are the criteria for "native async" (see https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/#model-features)?

@raspawar raspawar requested a review from mattf October 25, 2024 14:52
mattf (Collaborator) left a comment


per discussion w/ Erick, we need to implement _agenerate, which will entail a move from requests to httpx
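
for reference, a rough sketch of what the httpx-based async path could look like — the endpoint URL, payload shape, and the _apost helper name here are placeholders for illustration, not the actual client code:

import httpx

class _NVIDIAClient:
    # hypothetical sketch: base_url and header layout are assumptions
    base_url: str = "https://integrate.api.nvidia.com/v1"
    api_key: str = ""

    async def _apost(self, path: str, payload: dict) -> dict:
        # non-blocking replacement for the requests-based call
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.base_url}{path}",
                json=payload,
                headers={"Authorization": f"Bearer {self.api_key}"},
            )
            response.raise_for_status()
            return response.json()

_agenerate would then await a helper like this instead of wrapping the blocking requests call, so concurrent generations can actually overlap.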

@raspawar raspawar requested a review from mattf November 21, 2024 13:20
mattf (Collaborator) left a comment


looking good. please add a test that confirms httpx requests are interleaved.



def test_generate(chat_model: str, mode: dict) -> None:
    """Test generate method of anthropic."""

anthropic?

@@ -100,7 +103,7 @@ class _NVIDIAClient(BaseModel):
     last_inputs: Optional[dict] = Field(
         default={}, description="Last inputs sent over to the server"
     )
-    last_response: Response = Field(
+    last_response: Optional[Response] = Field(

why make this optional?

assert chat_messages == messages_copy


# @pytest.mark.scheduled

should this be commented or not?



# @pytest.mark.scheduled
async def test_async_generate(chat_model: str, mode: dict) -> None:

this will pass even if agenerate() is implemented without truly async communication w/ the server.

add a unit test that checks that async generation requests are interleaved. for inspiration...

import asyncio
import time

import httpx

# illustrative test wrapper: assumes the pytest-httpx `httpx_mock` fixture
# and an async-capable pytest setup (e.g. pytest-asyncio)
async def test_async_requests_are_interleaved(httpx_mock) -> None:
    async def afetch_data(url: str) -> str:
        async with httpx.AsyncClient() as client:
            return (await client.get(url)).text

    async def amock_response(request):
        # every mocked response takes ~1 second to arrive
        await asyncio.sleep(1)
        return httpx.Response(200, text="Hello world!")

    httpx_mock.add_callback(amock_response, is_reusable=True)

    start_time = time.time()
    task1, task2 = afetch_data("http://example.com"), afetch_data("http://example.com")
    _, _ = await asyncio.gather(task1, task2)

    # two ~1 second requests finishing in under 2 seconds means they overlapped
    assert (time.time() - start_time) < 2, "Tasks did not run concurrently"

@raspawar raspawar requested a review from mattf December 12, 2024 19:49