Skip to content

Commit

Permalink
Strip code block fences and extra test from xml when doing streaming … (
Browse files Browse the repository at this point in the history
#15293)

…parse

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
  • Loading branch information
nfcampos authored Dec 29, 2023
1 parent ec72225 commit 36ceffd
Show file tree
Hide file tree
Showing 2 changed files with 36 additions and 4 deletions.
23 changes: 20 additions & 3 deletions libs/core/langchain_core/output_parsers/xml.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,18 +55,32 @@ def parse(self, text: str) -> Dict[str, List[Any]]:
def _transform(
self, input: Iterator[Union[str, BaseMessage]]
) -> Iterator[AddableDict]:
xml_start_re = re.compile(r"<[a-zA-Z:_]")
parser = ET.XMLPullParser(["start", "end"])
xml_started = False
current_path: List[str] = []
current_path_has_children = False
buffer = ""
for chunk in input:
if isinstance(chunk, BaseMessage):
# extract text
chunk_content = chunk.content
if not isinstance(chunk_content, str):
continue
chunk = chunk_content
# pass chunk to parser
parser.feed(chunk)
# add chunk to buffer of unprocessed text
buffer += chunk
# if xml string hasn't started yet, continue to next chunk
if not xml_started:
if match := xml_start_re.search(buffer):
# if xml string has started, remove all text before it
buffer = buffer[match.start() :]
xml_started = True
else:
continue
# feed buffer to parser
parser.feed(buffer)
buffer = ""
# yield all events
for event, elem in parser.read_events():
if event == "start":
Expand All @@ -80,7 +94,10 @@ def _transform(
if not current_path_has_children:
yield nested_element(current_path, elem)
# prevent yielding of parent element
current_path_has_children = True
if current_path:
current_path_has_children = True
else:
xml_started = False
# close parser
parser.close()

Expand Down
17 changes: 16 additions & 1 deletion libs/core/tests/unit_tests/output_parsers/test_xml_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,22 @@

@pytest.mark.parametrize(
"result",
[DEF_RESULT_ENCODING, DEF_RESULT_ENCODING[DEF_RESULT_ENCODING.find("\n") :]],
[
DEF_RESULT_ENCODING,
DEF_RESULT_ENCODING[DEF_RESULT_ENCODING.find("\n") :],
f"""
```xml
{DEF_RESULT_ENCODING}
```
""",
f"""
Some random text
```xml
{DEF_RESULT_ENCODING}
```
More random text
""",
],
)
def test_xml_output_parser(result: str) -> None:
"""Test XMLOutputParser."""
Expand Down

0 comments on commit 36ceffd

Please sign in to comment.