Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add StringJoiner as a convenience component #8353

Closed
sjrl opened this issue Sep 11, 2024 · 7 comments · Fixed by #8357
Closed

feat: Add StringJoiner as a convenience component #8353

sjrl opened this issue Sep 11, 2024 · 7 comments · Fixed by #8357
Assignees
Labels
P1 High priority, add to the next sprint

Comments

@sjrl
Copy link
Contributor

sjrl commented Sep 11, 2024

Is your feature request related to a problem? Please describe.
As @ju-gu and I have been building pipelines with many branches that branch into many different Prompt Builders we ran into a need for a component like StringJoiner which would concatenate strings into a list of strings. In our specific use case our workflow looks like (except imagine six branches):

ConditionalRouter --> PromptBuilder 1 --> Generator --> AnswerBuilder --> AnswerJoiner
|-> PromptBuilder 2 --> Generator --> AnswerBuilder -|

Using the new StringJoiner we would reduce the number of Generator + AnswerBuilder Components we would need to:

ConditionalRouter --> PromptBuilder 1 --> StringJoiner --> OutputAdpater --> Generator --> AnswerBuilder
|-> PromptBuilder 2 -|

Describe the solution you'd like
Check with the team that adding this component would be okay. If so I'll go ahead and make a PR.

from typing import Dict, List

from haystack import component, logging
from haystack.core.component.types import Variadic

logger = logging.getLogger(__name__)


@component
class StringJoiner:
    """
    Component to join strings from different components to a list of strings
    """
    @component.output_types(strings=List[str])
    def run(self, strings: Variadic[str]) -> Dict[str, List[str]]:
        """
        Joins strings into a list of strings
        :param strings: strings from different components

        """
        strings = list(strings)
        return {"strings": strings}
@sjrl
Copy link
Contributor Author

sjrl commented Sep 12, 2024

Hey @silvanocerza I made a draft PR with the component here. However, I noticed in testing that the component doesn't actually work in a Pipeline. Specifically it fails this test

    @pytest.mark.integration
    def test_with_pipeline(self):
        string_1 = "What's Natural Language Processing?"
        string_2 = "What's is life?"

        pipe = Pipeline()
        pipe.add_component("prompt_builder_1", PromptBuilder("Builder 1: {{query}}"))
        pipe.add_component("prompt_builder_2", PromptBuilder("Builder 2: {{query}}"))
        pipe.add_component("joiner", StringJoiner())

        pipe.connect("prompt_builder_1.prompt", "joiner.strings")
        pipe.connect("prompt_builder_2.prompt", "joiner.strings")

        results = pipe.run(data={"prompt_builder_1": {"query": string_1}, "prompt_builder_2": {"query": string_2}})

        assert "joiner" in results
        assert len(results["joiner"]["strings"]) == 2  # <-- This Fails. List only has one item in it (the string_2).
        assert results["joiner"]["strings"] == [
            "Builder 1: What's Natural Language Processing?",
            "Builder 2: What's is life?",
        ]

so I wanted to ask if you happen to know whether the Variadic input type in Haystack doesn't work with type str. So Variadic[str]?

@julian-risch julian-risch added the P1 High priority, add to the next sprint label Sep 16, 2024
@davidsbatista
Copy link
Contributor

@sjrl not sure about this but looking at the type definition of Variadic it seems it doesn't support str

here

HAYSTACK_VARIADIC_ANNOTATION = "__haystack__variadic_t"

# # Generic type variable used in the Variadic container
T = TypeVar("T")


# Variadic is a custom annotation type we use to mark input types.
# This type doesn't do anything else than "marking" the contained
# type so it can be used in the `InputSocket` creation where we
# check that its annotation equals to CANALS_VARIADIC_ANNOTATION
Variadic: TypeAlias = Annotated[Iterable[T], HAYSTACK_VARIADIC_ANNOTATION]

@sjrl
Copy link
Contributor Author

sjrl commented Sep 19, 2024

@davidsbatista thanks for the info. When I talked to @silvanocerza offline he thought this should work. @silvanocerza have you had a chance to look at this yet?

@davidsbatista
Copy link
Contributor

@sjrl have a look at this sample/testing code:

https://github.com/deepset-ai/haystack/blob/main/haystack/testing/sample_components/joiner.py

I think what you want might already be there

@davidsbatista
Copy link
Contributor

I've just tried the StringJoiner from the link I sent you get the same error as with your code

@davidsbatista
Copy link
Contributor

@sjrl @silvanocerza good news - I think "I found the bug", running the StringJoin on the subgrah branch works, you get the expected output:

{'joiner': {'output': "Builder 1: What's Natural Language Processing? Builder 2: What's is life?"}}

@sjrl
Copy link
Contributor Author

sjrl commented Sep 23, 2024

Oh interesting, I didn't realize the subgraph branch also changes how the non-cyclic piplines work as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High priority, add to the next sprint
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants