partners/ollama: Enabled Token Level Streaming when Using Bind Tools for ChatOllama #27689

Conversation

ElhamBadri2411 (Contributor) commented:

Description: This PR addresses unexpected behavior when using the bind_tools method of LangChain's ChatOllama. When no tools are bound, llm.stream() works as expected, returning incremental chunks of content, which is crucial for real-time applications such as conversational agents and live feedback systems. When bind_tools([]) is used, however, the output is delivered as one full chunk rather than incrementally, which breaks the real-time nature of the streaming mechanism and degrades the user experience.
Issue: #26971
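
A minimal reproduction of the reported behavior looks roughly like this (the model name and prompt are illustrative placeholders, not taken from the PR):

```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")  # example model; any tool-capable model works

# Without bound tools, chunks arrive incrementally as they are generated.
for chunk in llm.stream("Tell me a short story"):
    print(chunk.content, end="", flush=True)

# With bind_tools([]), before this fix the response arrived as one
# large chunk instead of streaming token by token.
llm_with_tools = llm.bind_tools([])
for chunk in llm_with_tools.stream("Tell me a short story"):
    print(chunk.content, end="", flush=True)
```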


ccurme (Collaborator) left a comment:

Does tool calling still work following this change?

I'm finding issues with tool calling integration tests when running locally (e.g., this test).

Unfortunately we don't run these in CI for Ollama. You can run them locally via:

```bash
cd libs/partners/ollama
poetry run python -m pytest tests/integration_tests/test_chat_models.py
```

ccurme self-assigned this on Oct 30, 2024
ElhamBadri2411 (Contributor, Author) commented:

Hey, the tool-calling tests pass now; we'll bring the out-of-date branch up to date soon!

ccurme (Collaborator) left a review:

Thanks for your patience with the review. The only blocking question is about the async streaming case.

```diff
@@ -476,15 +476,16 @@ async def _acreate_chat_stream(
             params["options"]["stop"] = stop
         if "tools" in kwargs:
-            yield await self._async_client.chat(
+            async for part in await self._async_client.chat(
```
ccurme (Collaborator) commented on this change:

Do we need to also fix the async streaming case? We unfortunately don't cover async streaming with tools in our standard tests.

ElhamBadri2411 (Contributor, Author) replied:

Yeah, I believe we did. When testing manually, the async streaming case was not returning incremental chunks; it was just returning one big chunk.
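
Roughly, the fix boils down to iterating over the async stream and yielding each part instead of awaiting the whole response at once. A simplified sketch of the pattern (variable names like ollama_messages are approximations; the real method builds params and handles more options):

```python
# Inside _acreate_chat_stream (simplified sketch, not the exact merged code).
if "tools" in kwargs:
    # Before: `yield await self._async_client.chat(...)` awaited the full
    # response and yielded it as a single chunk.
    # After: stream parts as they arrive, even when tools are bound.
    async for part in await self._async_client.chat(
        model=params["model"],
        messages=ollama_messages,  # converted messages; name is illustrative
        stream=True,
        format=params["format"],
        tools=kwargs["tools"],
    ):
        yield part
```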

ccurme (Collaborator) replied:

I added a standard test for async tool calling this morning in #28133. You can run it against Ollama via:

```bash
python -m pytest tests/integration_tests/test_chat_models.py::TestChatOllama::test_tool_calling_async
```

That test appeared broken here because stream was changed to True in the async tool-calling case. I pushed an update to fix it and confirmed no new failures across the standard tests.

```python
                format=params["format"],
                tools=kwargs["tools"],
            )
            if len(kwargs["tools"]) == 0:
```
ccurme (Collaborator) commented on this change:

(nit) Wondering if there are any options for simplifying this:

- change `if "tools" in kwargs` to `if kwargs.get("tools")`, which is falsey for an empty list; or
- there's a method `_should_stream`, called by both `.stream` and `.astream`, that controls whether we delegate to `.invoke`. We could potentially override this on `ChatOllama` (a sketch of such an override follows below). See the example here:

```python
def _should_stream(
    self,
    *,
    async_api: bool,
    run_manager: Optional[
        Union[CallbackManagerForLLMRun, AsyncCallbackManagerForLLMRun]
    ] = None,
    response_format: Optional[Union[dict, type]] = None,
    **kwargs: Any,
) -> bool:
    if isinstance(response_format, type) and is_basemodel_subclass(response_format):
        # TODO: Add support for streaming with Pydantic response_format.
        warnings.warn("Streaming with Pydantic response_format not yet supported.")
        return False
    if self.model_name is not None and self.model_name.startswith("o1"):
        # TODO: Add support for streaming with o1 once supported.
        return False
    return super()._should_stream(
        async_api=async_api, run_manager=run_manager, **kwargs
    )
```
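
For illustration, one plausible shape of that override, assuming the intent is to fall back to non-streaming invocation only when tools are genuinely bound. This is a hedged sketch, not the approach merged in this PR:

```python
from typing import Any, Optional, Union

from langchain_core.callbacks import (
    AsyncCallbackManagerForLLMRun,
    CallbackManagerForLLMRun,
)
from langchain_core.language_models.chat_models import BaseChatModel


class ChatOllama(BaseChatModel):  # simplified; the real class defines much more
    def _should_stream(
        self,
        *,
        async_api: bool,
        run_manager: Optional[
            Union[CallbackManagerForLLMRun, AsyncCallbackManagerForLLMRun]
        ] = None,
        **kwargs: Any,
    ) -> bool:
        # A non-empty tools list falls back to .invoke; an empty list
        # (e.g., from bind_tools([])) keeps token-level streaming.
        if kwargs.get("tools"):
            return False
        return super()._should_stream(
            async_api=async_api, run_manager=run_manager, **kwargs
        )
```

With this hook, `llm.bind_tools([]).stream(...)` would still stream token by token, while binding real tools would delegate to a single `.invoke` call.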

ccurme (Collaborator) left a review:

Updated the async case. Let me know if you see any issues.

Thanks!

ccurme merged commit d696728 into langchain-ai:master on Nov 15, 2024
19 checks passed