LangChain Integration #60

Open: falquaddoomi wants to merge 11 commits into main
Conversation

falquaddoomi (Collaborator):

This (work-in-progress) PR changes GPT3CompletionModel from using the openai package directly to communicate with the OpenAI API to using langchain-openai, which wraps the openai package.

Tests have been updated and should work with LangChain. Executing pytest --runcost will actually query the OpenAI API, so it serves as a good end-to-end check that the changeover to the new package is working.

There are still many things missing (e.g., mapping all of the openai params to their LangChain equivalents), which is why this PR is a draft, but I thought I'd push it early to get comments as we incrementally make the change to LangChain.
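For a sense of the changeover, here is a minimal sketch of the before and after (the model name and prompt are illustrative, not the project's actual code):

```python
# before: the old code called the openai package directly, e.g.
#   completions = openai.Completion.create(**params)
# after: the same request goes through langchain-openai instead
from langchain_openai import ChatOpenAI

# assumes OPENAI_API_KEY is set in the environment
client = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
response = client.invoke("Revise this paragraph for clarity: ...")
print(response.content)
```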

falquaddoomi changed the title from "Langchain Integration" to "LangChain Integration" on Sep 25, 2024
miltondp (Collaborator) left a comment:

I left some comments. In case it's useful, here is some code I used once when working with the LangChain API.

Comment on lines 258 to 264:

```python
if self.endpoint == "edits":
    # FIXME: what's the "edits" equivalent in langchain?
    client_cls = OpenAI
elif self.endpoint == "chat":
    client_cls = ChatOpenAI
else:
    client_cls = OpenAI
```
Collaborator:
I don't think we need to handle this anymore. Before, there were separate "completion" and "edits" endpoints, but now I believe there is only a "chat" endpoint. Let's research it a little, but I think we only need the ChatOpenAI class here.
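If that's the case, the conditional above could collapse to just ChatOpenAI; what the old "edits" endpoint expressed as an (instruction, input) pair maps naturally onto chat messages. A hedged sketch of that mapping (the message layout is my assumption, not necessarily what the PR ends up doing):

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

client = ChatOpenAI(model="gpt-3.5-turbo")

# the old "edits" endpoint's (instruction, input) pair becomes a system
# message plus a human message on the chat endpoint
response = client.invoke([
    SystemMessage(content="Proofread the user's paragraph and fix any errors."),
    HumanMessage(content="Teh quick brown fox jumpd over the lazy dog."),
])
print(response.content)
```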

Comment on lines 547 to 550:

```python
# FIXME: 'params' contains a lot of fields that we're not
# currently passing to the langchain client. i need to figure
# out where they're supposed to be given, e.g. in the client
# init or with each request.
```
Collaborator:
What are those fields in params?

Collaborator Author:

Looking at it again, "a lot" is an overstatement, sorry. On top of the model_parameters dict that gets merged into it, and aside from prompt (or the other variants, depending on whether it's a "chat" or "edits" model), GPT3CompletionModel.get_params() introduces just:

  • n: I assume this is the number of responses you want the API to generate.
    • It seems it's always 1, and LangChain's invoke() returns a single response anyway, so I assume we can ignore this one.
  • stop: this is always None, so it's probably not necessary to include in invoke().
    • Still, it's easy to integrate, since invoke() takes stop as an argument; I'll just go ahead and add it.
  • max_tokens: it seems this is taken at client initialization in LangChain.
    • I'll see if there's a way to provide it for each invoke() call, or to change its value prior to the call.

Correct me if I'm wrong, but since model_parameters is already used to initialize the client, and since AFAICT it's not changed after that, I don't think we need to include its contents in invoke().

I'll go ahead and make the other changes, though.

Collaborator:

If I'm not forgetting what the code does, the only field that should go into each request/invoke (instead of being used to initialize the client) is max_tokens, because for each paragraph we restrict the model to generating up to twice (or so) the number of tokens in the input paragraph. So that should go into each request, not the client (or we'd have to update the client before each request?).

Collaborator Author:

Right, after I made the comment above I discovered that invoke() does take max_tokens as well as stop; I've added both in my most recent commits. I assume we still don't need to change n from 1, which AFAICT is the default for invoke() as well, so I left it out of the call to invoke().
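Putting the thread together, a sketch of what the per-call usage might look like, with the per-paragraph cap described above (the whitespace-based token estimate and the prompt wording are illustrative assumptions, not the project's actual code):

```python
from langchain_openai import ChatOpenAI

client = ChatOpenAI(model="gpt-3.5-turbo")

paragraph = "Some input paragraph to be revised..."
# cap output at roughly twice the input's token count; splitting on
# whitespace is a stand-in for however the real code counts tokens
max_tokens = 2 * len(paragraph.split())

response = client.invoke(
    f"Revise this paragraph: {paragraph}",
    stop=None,              # invoke() accepts stop directly
    max_tokens=max_tokens,  # extra kwargs are forwarded to the API call
)
print(response.content)
```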

setup.py: review comment (outdated, resolved)
d33bs (Collaborator) left a comment:

Nice work! I left a few comments where I thought improvements could be made. I'm less familiar with how this might operate in the context of the rest of the project, so my comments might miss the mark. If there's a more specific area you'd like me to focus on, just let me know; happy to give things another look.

In addition to the individual comments, I wondered: how does pytest --runcost work? (It's mentioned in the PR description.) Consider adding this to the documentation somewhere, perhaps in the readme or another spot.
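For readers with the same question: custom pytest flags like --runcost are usually wired up in conftest.py. The snippet below is the common pattern, not necessarily this repo's actual implementation (the "cost" marker name is an assumption):

```python
# conftest.py: a typical opt-in flag for tests that cost real money
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--runcost",
        action="store_true",
        default=False,
        help="run tests that issue real (billable) OpenAI API calls",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runcost"):
        return  # flag given: run everything, including costly tests
    skip_cost = pytest.mark.skip(reason="needs --runcost to query the OpenAI API")
    for item in items:
        if "cost" in item.keywords:
            item.add_marker(skip_cost)
```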

@@ -253,6 +255,22 @@ def __init__(

```python
self.several_spaces_pattern = re.compile(r"\s+")

if self.endpoint == "edits":
    # FIXME: what's the "edits" equivalent in langchain?
```
Collaborator:

Consider moving this FIXME to a GitHub issue (if it isn't one already), which is more actionable and less prone to being forgotten. This comment also applies to the other locations where this pattern appears.

falquaddoomi (Collaborator Author), Oct 23, 2024:

Good point, and it's fair that adding "FIXME"s at all runs the risk of them being introduced into merged code. My intent here was to get this FIXME figured out within the scope of this PR, which is why I didn't create an issue for it, but I'll think more about not adding FIXMEs and instead communicating questions some other way (review comments, perhaps?).

@@ -253,6 +255,22 @@ def __init__(

```python
self.several_spaces_pattern = re.compile(r"\s+")

if self.endpoint == "edits":
```
Collaborator:

Consider documenting class attributes in the class docstring to help identify what functionality they're associated with. As I read through this I wondered "what does self.endpoint do; how might it matter later?" and couldn't find much human-readable documentation on the topic. It could be that I'm missing fundamental common knowledge about how this works; if so, please don't hesitate to link to the appropriate location.
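For example, an Attributes section in the class docstring would answer that question at the definition site. A sketch (the description of endpoint is my reading of the diff, not the project's wording):

```python
class GPT3CompletionModel:
    """Revises manuscript text using the OpenAI API (via langchain-openai).

    Attributes:
        endpoint: Which API style to use ("chat", "edits", or a
            completion-style default); selects the client class in __init__.
        several_spaces_pattern: Compiled regex used to collapse runs of
            whitespace in text passed through the model.
    """
```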

Collaborator Author:

Hey, thanks for pointing this out; I've created an issue to address filling these gaps in the documentation, #68.

```diff
-        completions = openai.Completion.create(**params)
+        # map the prompt to langchain's prompt types, based on what
+        # kind of endpoint we're using
+        if "messages" in params:
```
Collaborator:

A bit outside the PR scope, but I'm adding it since this is a fresh read of the code and I'm less familiar with how params are used: I noticed the docstring doesn't match the method parameters. Consider updating this when there's a chance.

Collaborator Author:

I'm thinking we'll do a comprehensive review of the docstrings for the PR that addresses issue #68, but in this PR I've attempted to add some documentation to the GPT3CompletionModel.get_params() method to address this gap.

```python
# based on the 'role' field
prompt = [
    HumanMessage(content=msg["content"])
    if msg["role"] == "user" else
```
d33bs (Collaborator), Oct 15, 2024:

This might need formatting corrections applied via Black (I tested using the existing .pre-commit-config.yaml).
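For illustration, a recent Black release would lay out a multi-line conditional inside a comprehension roughly as below (the else branch and the loop source are guesses at the truncated lines, included only to make the example self-contained):

```python
from langchain_core.messages import HumanMessage, SystemMessage

params = {"messages": [{"role": "user", "content": "Fix this paragraph."}]}

# hypothetical completion of the truncated snippet above, run through Black
prompt = [
    (
        HumanMessage(content=msg["content"])
        if msg["role"] == "user"
        else SystemMessage(content=msg["content"])
    )
    for msg in params["messages"]
]
```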

libs/manubot_ai_editor/models.py: review comment (resolved)