From 0aeb643c3e00d078d400b046da8f7d900c86543b Mon Sep 17 00:00:00 2001
From: Shreya Shankar
Date: Sat, 12 Oct 2024 18:08:46 -0400
Subject: [PATCH] feat: add reduce operation lineage

---
 docs/operators/reduce.md | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/docs/operators/reduce.md b/docs/operators/reduce.md
index fd8d0b80..3c01fd82 100644
--- a/docs/operators/reduce.md
+++ b/docs/operators/reduce.md
@@ -51,7 +51,7 @@ This Reduce operation processes customer feedback grouped by department:
 
 | Parameter | Description | Default |
 | ------------------------- | ------------------------------------------------------------------------------------------------------ | --------------------------- |
-| `sample` | Number of samples to use for the operation | None |
+| `sample` | Number of samples to use for the operation | None |
 | `synthesize_resolve` | If false, won't synthesize a resolve operation between map and reduce | true |
 | `model` | The language model to use | Falls back to default_model |
 | `input` | Specifies the schema or keys to subselect from each item | All keys from input items |
@@ -196,6 +196,28 @@ For semantic similarity sampling, you can use a query to select the most relevan
 
 In this example, the Reduce operation will use semantic similarity to select the 30 reviews most relevant to battery life and performance for each product_id. This allows you to focus the summarization on specific aspects of the product reviews.
 
+### Lineage
+
+The Reduce operation supports lineage, which allows you to track the original input data for each output. This can be useful for debugging and auditing. To enable lineage, add a `lineage` configuration to your reduce operation, specifying the keys to include in the lineage. For example:
+
+```yaml
+- name: summarize_reviews_by_category
+  type: reduce
+  reduce_key: category
+  prompt: |
+    Summarize the reviews for category {{ inputs[0].category }}:
+    {% for item in inputs %}
+    Review {{ loop.index }}: {{ item.review }}
+    {% endfor %}
+  output:
+    schema:
+      summary: string
+    lineage:
+      - product_id
+```
+
+This output will include a list of all product_ids for each category in the lineage, saved under the key `summarize_reviews_by_category_lineage`.
+
 ## Best Practices
 
 1. **Choose Appropriate Keys**: Select `reduce_key`(s) that logically group your data for the desired aggregation.