partners/ollama: Enabled Token Level Streaming when Using Bind Tools for ChatOllama #27689

Conversation

ElhamBadri2411 (Contributor) commented:

Description: This PR addresses unexpected behavior when using the bind_tools method of LangChain's ChatOllama. When no tools are bound, llm.stream() works as expected, returning incremental chunks of content, which is crucial for real-time applications such as conversational agents and live feedback systems. When bind_tools([]) is used, however, the output is delivered as one full chunk rather than incrementally, which breaks the real-time nature of the streaming mechanism and degrades the user experience.
Issue: #26971
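
A minimal reproduction of the reported behavior looks roughly like this (the model name and prompt are illustrative placeholders, not taken from the PR):

```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")  # example model; any tool-capable model works

# Without bound tools, chunks arrive incrementally as they are generated.
for chunk in llm.stream("Tell me a short story"):
    print(chunk.content, end="", flush=True)

# With bind_tools([]), before this fix the response arrived as one
# large chunk instead of streaming token by token.
llm_with_tools = llm.bind_tools([])
for chunk in llm_with_tools.stream("Tell me a short story"):
    print(chunk.content, end="", flush=True)
```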


ccurme (Collaborator) left a comment:

Does tool calling still work following this change?

I'm finding issues with tool calling integration tests when running locally (e.g., this test).

Unfortunately we don't run these in CI for Ollama. You can run them locally via:

```bash
cd libs/partners/ollama
poetry run python -m pytest tests/integration_tests/test_chat_models.py
```

ccurme self-assigned this on Oct 30, 2024
ElhamBadri2411 (Contributor, Author) commented:

Hey, the tool-calling tests pass now; we'll bring the out-of-date branch up to date soon!

ccurme (Collaborator) left a review:

Thanks for your patience with the review. The only blocking question is about the async streaming case.

```diff
@@ -476,15 +476,16 @@ async def _acreate_chat_stream(
             params["options"]["stop"] = stop
         if "tools" in kwargs:
-            yield await self._async_client.chat(
+            async for part in await self._async_client.chat(
```
ccurme (Collaborator) commented on this change:

Do we need to also fix the async streaming case? We unfortunately don't cover async streaming with tools in our standard tests.

ElhamBadri2411 (Contributor, Author) replied:

Yeah, I believe we did. When testing manually, the async streaming case was not returning incremental chunks; it was just returning one big chunk.
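
Roughly, the fix boils down to iterating over the async stream and yielding each part instead of awaiting the whole response at once. A simplified sketch of the pattern (variable names like ollama_messages are approximations; the real method builds params and handles more options):

```python
# Inside _acreate_chat_stream (simplified sketch, not the exact merged code).
if "tools" in kwargs:
    # Before: `yield await self._async_client.chat(...)` awaited the full
    # response and yielded it as a single chunk.
    # After: stream parts as they arrive, even when tools are bound.
    async for part in await self._async_client.chat(
        model=params["model"],
        messages=ollama_messages,  # converted messages; name is illustrative
        stream=True,
        format=params["format"],
        tools=kwargs["tools"],
    ):
        yield part
```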

ccurme (Collaborator) replied:

I added a standard test for async tool calling this morning in #28133. You can run it against Ollama via:

```bash
python -m pytest tests/integration_tests/test_chat_models.py::TestChatOllama::test_tool_calling_async
```

That test appeared broken here because stream was changed to True in the async tool-calling case. I pushed an update to fix it and confirmed no new failures across the standard tests.

```python
                format=params["format"],
                tools=kwargs["tools"],
            )
            if len(kwargs["tools"]) == 0:
```
ccurme (Collaborator) commented on this change:

(nit) Wondering if there are any options for simplifying this:

- change `if "tools" in kwargs` to `if kwargs.get("tools")`, which is falsey for an empty list; or
- there's a method `_should_stream`, called by both `.stream` and `.astream`, that controls whether we delegate to `.invoke`. We could potentially override this on `ChatOllama` (a sketch of such an override follows below). See the example here:

```python
def _should_stream(
    self,
    *,
    async_api: bool,
    run_manager: Optional[
        Union[CallbackManagerForLLMRun, AsyncCallbackManagerForLLMRun]
    ] = None,
    response_format: Optional[Union[dict, type]] = None,
    **kwargs: Any,
) -> bool:
    if isinstance(response_format, type) and is_basemodel_subclass(response_format):
        # TODO: Add support for streaming with Pydantic response_format.
        warnings.warn("Streaming with Pydantic response_format not yet supported.")
        return False
    if self.model_name is not None and self.model_name.startswith("o1"):
        # TODO: Add support for streaming with o1 once supported.
        return False
    return super()._should_stream(
        async_api=async_api, run_manager=run_manager, **kwargs
    )
```
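
For illustration, one plausible shape of that override, assuming the intent is to fall back to non-streaming invocation only when tools are genuinely bound. This is a hedged sketch, not the approach merged in this PR:

```python
from typing import Any, Optional, Union

from langchain_core.callbacks import (
    AsyncCallbackManagerForLLMRun,
    CallbackManagerForLLMRun,
)
from langchain_core.language_models.chat_models import BaseChatModel


class ChatOllama(BaseChatModel):  # simplified; the real class defines much more
    def _should_stream(
        self,
        *,
        async_api: bool,
        run_manager: Optional[
            Union[CallbackManagerForLLMRun, AsyncCallbackManagerForLLMRun]
        ] = None,
        **kwargs: Any,
    ) -> bool:
        # A non-empty tools list falls back to .invoke; an empty list
        # (e.g., from bind_tools([])) keeps token-level streaming.
        if kwargs.get("tools"):
            return False
        return super()._should_stream(
            async_api=async_api, run_manager=run_manager, **kwargs
        )
```

With this hook, `llm.bind_tools([]).stream(...)` would still stream token by token, while binding real tools would delegate to a single `.invoke` call.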

ccurme (Collaborator) left a review:

Updated the async case. Let me know if you see any issues.

Thanks!

ccurme merged commit d696728 into langchain-ai:master on Nov 15, 2024
19 checks passed