`MetadataBuilder` #5702

sjrl · 2023-09-01T10:56:10Z

See the proposal in #5540 and the feature request for Haystack v1.

LLM clients output strings, but many components expect other object types. LLMs can also produce output in a parsable format that can be converted directly into objects. Output parsers transform these strings into objects of the user's choosing.

`MetadataBuilder` takes the string replies and inserts them as metadata into the Documents that were originally passed to the LLM. I'm open to renaming this one, since the goal is to output Documents with inserted metadata.
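A minimal sketch of the idea, assuming exactly one reply per Document. The `Doc` class, the `build_metadata` function, and the `meta_key` parameter are hypothetical names used only for illustration; they are not an existing Haystack API.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text content plus a meta dict."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def build_metadata(documents: List[Doc], replies: List[str], meta_key: str) -> List[Doc]:
    """Return copies of `documents` with each reply stored under `meta_key`."""
    if len(documents) != len(replies):
        raise ValueError("Expected exactly one reply per document.")
    return [
        # Copy the document so the originals passed to the LLM stay untouched.
        replace(doc, meta={**doc.meta, meta_key: reply.strip()})
        for doc, reply in zip(documents, replies)
    ]


docs = [Doc("A long report about solar panels ..."), Doc("Release notes for v2 ...")]
replies = ["Solar panel report. ", "v2 release notes."]
enriched = build_metadata(docs, replies, meta_key="summary")
```

Returning copies rather than mutating the inputs is a design choice of this sketch; an in-place update would work just as well if the originals are not needed afterwards.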

For example, a PromptNode could be used to summarize a longer document, and the user would like the result inserted as metadata for that Document. This would let us easily add category tags, sentiment, summaries, etc. to documents, which can be used later at query time (e.g. to filter down the search space efficiently or to use the metadata in online retrieval/generation steps).
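To illustrate the query-time benefit, here is a rough sketch that filters already-enriched documents by their metadata. The documents are plain dicts, and the `category` and `sentiment` fields as well as the `filter_by_meta` helper are assumptions made for this example, not part of any existing API.

```python
from typing import Any, Dict, List

# Hypothetical documents, already enriched with LLM-generated metadata.
documents: List[Dict[str, Any]] = [
    {"content": "Quarterly earnings rose ...", "meta": {"category": "finance", "sentiment": "positive"}},
    {"content": "The new GPU lineup ...", "meta": {"category": "hardware", "sentiment": "neutral"}},
    {"content": "Bond yields fell ...", "meta": {"category": "finance", "sentiment": "negative"}},
]


def filter_by_meta(docs: List[Dict[str, Any]], **conditions: Any) -> List[Dict[str, Any]]:
    """Keep only documents whose metadata matches every key/value condition."""
    return [
        doc for doc in docs
        if all(doc["meta"].get(key) == value for key, value in conditions.items())
    ]


# Narrow the search space before running retrieval or generation.
finance_docs = filter_by_meta(documents, category="finance")
negative_finance = filter_by_meta(documents, category="finance", sentiment="negative")
```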


anakin87 · 2024-01-15T17:11:53Z

More information on the expected use cases and component I/O can be found here.

In general, it is probably best to focus on developing this component once Pipelines can handle looping and input lists.
(Otherwise, we would be building a component that is effectively unusable in Pipelines.)

julian-risch · 2024-06-28T10:58:55Z

@sjrl We are considering this issue for our next sprint. Is there any new info that will be relevant for the implementation of this component?

anakin87 · 2024-06-28T11:04:15Z

This is probably relevant: https://www.notion.so/deepsetai/Advanced-Use-Case-Automatic-Metadata-Enrichment-8fdfc56e82434459963beaa7a9dc5069

sjrl · 2024-06-28T11:10:48Z

Hey @julian-risch, thanks for reaching out! No new info on my end. I think the work by @davidsbatista that @anakin87 linked is exactly the type of use case we are thinking about: in general, metadata enrichment of files to help with retrieval through filters, embedded meta fields, etc., and possibly also for downstream applications (e.g. showing a summary alongside a retrieved file). I'd be particularly interested in a setup that would allow me to automatically extract things like title, authors, publication date, etc. from PDF files and then save that as metadata with the file.
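As a rough sketch of that extraction step, assuming the LLM is prompted to reply with a JSON object: the field names in `EXPECTED_KEYS` and the `parse_reply_to_meta` helper are hypothetical and only illustrate how a reply could be turned into metadata.

```python
import json
from typing import Any, Dict

# Hypothetical field names the prompt asks the LLM to return.
EXPECTED_KEYS = ("title", "authors", "publication_date")


def parse_reply_to_meta(reply: str) -> Dict[str, Any]:
    """Parse a JSON reply and keep only the expected metadata keys."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # The reply was not valid JSON, so there is nothing to extract.
        return {}
    if not isinstance(data, dict):
        return {}
    return {key: data[key] for key in EXPECTED_KEYS if key in data}


reply = (
    '{"title": "An Example Paper", "authors": ["A. Author"], '
    '"publication_date": "2020-01-01", "extra": 1}'
)
meta = parse_reply_to_meta(reply)
```

Dropping unexpected keys and returning an empty dict on malformed replies keeps a bad LLM output from polluting the stored metadata; a stricter setup might raise an error instead.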

davidsbatista · 2024-07-03T10:03:22Z

See #5700, which is related and possibly a duplicate.

julian-risch · 2024-09-09T06:13:14Z

@davidsbatista Could you please check again if we can merge the two issues #5700 and #5702 or whether they should remain separate?

davidsbatista · 2024-09-13T07:53:04Z

After discussing it with Sebastian, these two issues should be merged.

anakin87 · 2024-10-28T16:02:37Z

This should have been done in deepset-ai/haystack-experimental#92.
If not, feel free to reopen this issue.

sjrl added the 2.x Related to Haystack v2.0 label Sep 1, 2023

sjrl mentioned this issue Sep 1, 2023

LLM support (2.x) #5330

Closed

Timoeller added the P2 Medium priority, add to the next sprint if no P1 available label Oct 12, 2023

mathislucka added the type:feature New feature or request label Dec 22, 2023

vrunm mentioned this issue Dec 22, 2023

feat: Add MetadataBuilder #6636

Closed

masci added this to the 2.0.0 milestone Jan 8, 2024

masci assigned anakin87 Jan 8, 2024

anakin87 removed their assignment Jan 22, 2024

anakin87 mentioned this issue Feb 16, 2024

Output shaper to add the output of the LLM to the document's metadata #4926

Closed

masci modified the milestones: 2.0.0, 2.1.0 Feb 23, 2024

masci added P3 Low priority, leave it in the backlog and removed P2 Medium priority, add to the next sprint if no P1 available labels Feb 23, 2024

masci removed this from the 2.1.0 milestone Apr 1, 2024

mrm1001 added P2 Medium priority, add to the next sprint if no P1 available and removed P3 Low priority, leave it in the backlog labels Jun 28, 2024

julian-risch assigned shadeMe Jun 28, 2024

davidsbatista mentioned this issue Jul 1, 2024

DocumentsBuilder #5700

Closed

julian-risch assigned julian-risch and unassigned shadeMe Aug 5, 2024

julian-risch added P3 Low priority, leave it in the backlog and removed P2 Medium priority, add to the next sprint if no P1 available labels Aug 19, 2024

julian-risch removed their assignment Aug 30, 2024

davidsbatista self-assigned this Sep 13, 2024

davidsbatista mentioned this issue Sep 13, 2024

feat: metadata extractor based on a LLM deepset-ai/haystack-experimental#92

Merged

anakin87 closed this as completed Oct 28, 2024
