docs: add links to concept guides in how-tos (#28118)
ccurme authored Nov 15, 2024
1 parent ef2dc9e commit 74438f3
Showing 79 changed files with 101 additions and 100 deletions.
2 changes: 1 addition & 1 deletion docs/docs/how_to/HTML_header_metadata_splitter.ipynb
@@ -13,7 +13,7 @@
"# How to split by HTML header \n",
"## Description and motivation\n",
"\n",
"[HTMLHeaderTextSplitter](https://python.langchain.com/api_reference/text_splitters/html/langchain_text_splitters.html.HTMLHeaderTextSplitter.html) is a \"structure-aware\" chunker that splits text at the HTML element level and adds metadata for each header \"relevant\" to any given chunk. It can return chunks element by element or combine elements with the same metadata, with the objectives of (a) keeping related text grouped (more or less) semantically and (b) preserving context-rich information encoded in document structures. It can be used with other text splitters as part of a chunking pipeline.\n",
"[HTMLHeaderTextSplitter](https://python.langchain.com/api_reference/text_splitters/html/langchain_text_splitters.html.HTMLHeaderTextSplitter.html) is a \"structure-aware\" [text splitter](/docs/concepts/text_splitters/) that splits text at the HTML element level and adds metadata for each header \"relevant\" to any given chunk. It can return chunks element by element or combine elements with the same metadata, with the objectives of (a) keeping related text grouped (more or less) semantically and (b) preserving context-rich information encoded in document structures. It can be used with other text splitters as part of a chunking pipeline.\n",
"\n",
"It is analogous to the [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter) for markdown files.\n",
"\n",
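For context, the splitter this change links to can be exercised roughly as follows — a minimal sketch assuming `langchain-text-splitters` is installed; the HTML string is illustrative:

```python
from langchain_text_splitters import HTMLHeaderTextSplitter

html_string = """
<html><body>
  <h1>Introduction</h1><p>Some intro text.</p>
  <h2>Section A</h2><p>Details about section A.</p>
</body></html>
"""

# Each tuple maps an HTML header tag to the metadata key recorded on chunks.
headers_to_split_on = [("h1", "Header 1"), ("h2", "Header 2")]

splitter = HTMLHeaderTextSplitter(headers_to_split_on)
docs = splitter.split_text(html_string)
for doc in docs:
    print(doc.metadata, "->", doc.page_content)
```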
2 changes: 1 addition & 1 deletion docs/docs/how_to/HTML_section_aware_splitter.ipynb
@@ -12,7 +12,7 @@
"source": [
"# How to split by HTML sections\n",
"## Description and motivation\n",
"Similar in concept to the [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter), the `HTMLSectionSplitter` is a \"structure-aware\" chunker that splits text at the element level and adds metadata for each header \"relevant\" to any given chunk.\n",
"Similar in concept to the [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter), the `HTMLSectionSplitter` is a \"structure-aware\" [text splitter](/docs/concepts/text_splitters/) that splits text at the element level and adds metadata for each header \"relevant\" to any given chunk.\n",
"\n",
"It can return chunks element by element or combine elements with the same metadata, with the objectives of (a) keeping related text grouped (more or less) semantically and (b) preserving context-rich information encoded in document structures.\n",
"\n",
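A similar hedged sketch of `HTMLSectionSplitter` usage (same assumptions as above; the constructor argument mirrors `HTMLHeaderTextSplitter`):

```python
from langchain_text_splitters import HTMLSectionSplitter

html_string = "<html><body><h1>Title</h1><p>Intro.</p><h2>Part 1</h2><p>Body text.</p></body></html>"

sections_to_split_on = [("h1", "Header 1"), ("h2", "Header 2")]
splitter = HTMLSectionSplitter(sections_to_split_on)
docs = splitter.split_text(html_string)  # one Document per section, headers in metadata
```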
4 changes: 2 additions & 2 deletions docs/docs/how_to/MultiQueryRetriever.ipynb
@@ -7,7 +7,7 @@
"source": [
"# How to use the MultiQueryRetriever\n",
"\n",
"Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on a distance metric. But, retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.\n",
"Distance-based [vector database](/docs/concepts/vectorstores/) retrieval [embeds](/docs/concepts/embedding_models/) (represents) queries in high-dimensional space and finds similar embedded documents based on a distance metric. But, retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.\n",
"\n",
"The [MultiQueryRetriever](https://python.langchain.com/api_reference/langchain/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html) automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the `MultiQueryRetriever` can mitigate some of the limitations of the distance-based retrieval and get a richer set of results.\n",
"\n",
@@ -151,7 +151,7 @@
"id": "7e170263-facd-4065-bb68-d11fb9123a45",
"metadata": {},
"source": [
"Note that the underlying queries generated by the retriever are logged at the `INFO` level."
"Note that the underlying queries generated by the [retriever](/docs/concepts/retrievers/) are logged at the `INFO` level."
]
},
{
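For context, the retriever documented here is typically built from an existing vector store and an LLM — a minimal sketch assuming `langchain-openai` is installed and `vectorstore` is an already-populated vector store:

```python
import logging

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Surface the generated query variants, which are logged at INFO level.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

llm = ChatOpenAI(temperature=0)
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),  # `vectorstore` assumed to exist
    llm=llm,
)
docs = retriever.invoke("What are the approaches to task decomposition?")
```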
4 changes: 2 additions & 2 deletions docs/docs/how_to/add_scores_retriever.ipynb
@@ -7,11 +7,11 @@
"source": [
"# How to add scores to retriever results\n",
"\n",
"Retrievers will return sequences of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects, which by default include no information about the process that retrieved them (e.g., a similarity score against a query). Here we demonstrate how to add retrieval scores to the `.metadata` of documents:\n",
"[Retrievers](/docs/concepts/retrievers/) will return sequences of [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects, which by default include no information about the process that retrieved them (e.g., a similarity score against a query). Here we demonstrate how to add retrieval scores to the `.metadata` of documents:\n",
"1. From [vectorstore retrievers](/docs/how_to/vectorstore_retriever);\n",
"2. From higher-order LangChain retrievers, such as [SelfQueryRetriever](/docs/how_to/self_query) or [MultiVectorRetriever](/docs/how_to/multi_vector).\n",
"\n",
"For (1), we will implement a short wrapper function around the corresponding vector store. For (2), we will update a method of the corresponding class.\n",
"For (1), we will implement a short wrapper function around the corresponding [vector store](/docs/concepts/vectorstores/). For (2), we will update a method of the corresponding class.\n",
"\n",
"## Create vector store\n",
"\n",
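The "(1)" wrapper described above can be as small as this — a sketch assuming `vectorstore` is any store that supports `similarity_search_with_score`:

```python
from typing import List

from langchain_core.documents import Document


def retrieve_with_scores(query: str, k: int = 4) -> List[Document]:
    """Attach each similarity score to its document's metadata."""
    docs_and_scores = vectorstore.similarity_search_with_score(query, k=k)
    for doc, score in docs_and_scores:
        doc.metadata["score"] = score
    return [doc for doc, _ in docs_and_scores]
```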
2 changes: 1 addition & 1 deletion docs/docs/how_to/agent_executor.ipynb
@@ -22,7 +22,7 @@
":::\n",
"\n",
"By themselves, language models can't take actions - they just output text.\n",
"A big use case for LangChain is creating **agents**.\n",
"A big use case for LangChain is creating **[agents](/docs/concepts/agents/)**.\n",
"Agents are systems that use an LLM as a reasoning engine to determine which actions to take and what the inputs to those actions should be.\n",
"The results of those actions can then be fed back into the agent and it determines whether more actions are needed, or whether it is okay to finish.\n",
"\n",
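A minimal agent along the lines this guide builds — a sketch, not the guide's exact code, assuming `langchain`, `langchain-openai`, and an `OPENAI_API_KEY`; the tool is a made-up example:

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def get_word_length(word: str) -> int:
    """Return the number of characters in a word."""
    return len(word)


prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # where tool calls and results accumulate
])

llm = ChatOpenAI(model="gpt-4o-mini")
agent = create_tool_calling_agent(llm, [get_word_length], prompt)
executor = AgentExecutor(agent=agent, tools=[get_word_length])
executor.invoke({"input": "How many letters are in 'LangChain'?"})
```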
2 changes: 1 addition & 1 deletion docs/docs/how_to/caching_embeddings.ipynb
@@ -7,7 +7,7 @@
"source": [
"# Caching\n",
"\n",
"Embeddings can be stored or temporarily cached to avoid needing to recompute them.\n",
"[Embeddings](/docs/concepts/embedding_models/) can be stored or temporarily cached to avoid needing to recompute them.\n",
"\n",
"Caching embeddings can be done using a `CacheBackedEmbeddings`. The cache backed embedder is a wrapper around an embedder that caches\n",
"embeddings in a key-value store. The text is hashed and the hash is used as the key in the cache.\n",
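The `CacheBackedEmbeddings` wrapper mentioned here looks roughly like this in practice — a sketch assuming `langchain` and `langchain-openai` are installed:

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

underlying = OpenAIEmbeddings()
store = LocalFileStore("./embedding_cache/")  # keys are hashes of the input text

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying,
    store,
    namespace=underlying.model,  # avoids collisions if several models share the store
)

# The second call for identical text is served from the cache, not the API.
vectors = cached_embedder.embed_documents(["hello world", "hello world"])
```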
2 changes: 1 addition & 1 deletion docs/docs/how_to/character_text_splitter.ipynb
@@ -21,7 +21,7 @@
"source": [
"# How to split by character\n",
"\n",
"This is the simplest method. This splits based on a given character sequence, which defaults to `\"\\n\\n\"`. Chunk length is measured by number of characters.\n",
"This is the simplest method. This [splits](/docs/concepts/text_splitters/) based on a given character sequence, which defaults to `\"\\n\\n\"`. Chunk length is measured by number of characters.\n",
"\n",
"1. How the text is split: by single character separator.\n",
"2. How the chunk size is measured: by number of characters.\n",
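A minimal sketch of the splitter this page describes (`some_long_text` is a placeholder string you supply):

```python
from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator="\n\n",  # the default separator noted above
    chunk_size=1000,   # measured in characters
    chunk_overlap=200,
)
chunks = splitter.split_text(some_long_text)
```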
2 changes: 1 addition & 1 deletion docs/docs/how_to/chat_model_caching.ipynb
@@ -15,7 +15,7 @@
"\n",
":::\n",
"\n",
"LangChain provides an optional caching layer for chat models. This is useful for two main reasons:\n",
"LangChain provides an optional caching layer for [chat models](/docs/concepts/chat_models). This is useful for two main reasons:\n",
"\n",
"- It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times. This is especially useful during app development.\n",
"- It can speed up your application by reducing the number of API calls you make to the LLM provider.\n",
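Enabling the cache is a one-liner — a sketch assuming `langchain-openai` and an in-memory cache; persistent backends follow the same `set_llm_cache` pattern:

```python
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI

set_llm_cache(InMemoryCache())

llm = ChatOpenAI(model="gpt-4o-mini")
llm.invoke("Tell me a joke")  # first call hits the provider API
llm.invoke("Tell me a joke")  # identical call is answered from the cache
```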
4 changes: 2 additions & 2 deletions docs/docs/how_to/chat_models_universal_init.ipynb
@@ -7,13 +7,13 @@
"source": [
"# How to init any model in one line\n",
"\n",
"Many LLM applications let end users specify what model provider and model they want the application to be powered by. This requires writing some logic to initialize different ChatModels based on some user configuration. The `init_chat_model()` helper method makes it easy to initialize a number of different model integrations without having to worry about import paths and class names.\n",
"Many LLM applications let end users specify what model provider and model they want the application to be powered by. This requires writing some logic to initialize different [chat models](/docs/concepts/chat_models/) based on some user configuration. The `init_chat_model()` helper method makes it easy to initialize a number of different model integrations without having to worry about import paths and class names.\n",
"\n",
":::tip Supported models\n",
"\n",
"See the [init_chat_model()](https://python.langchain.com/api_reference/langchain/chat_models/langchain.chat_models.base.init_chat_model.html) API reference for a full list of supported integrations.\n",
"\n",
"Make sure you have the integration packages installed for any model providers you want to support. E.g. you should have `langchain-openai` installed to init an OpenAI model.\n",
"Make sure you have the [integration packages](/docs/integrations/chat/) installed for any model providers you want to support. E.g. you should have `langchain-openai` installed to init an OpenAI model.\n",
"\n",
":::"
]
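The helper in question — a sketch assuming a recent `langchain` plus the relevant provider packages (`langchain-openai`, `langchain-anthropic`):

```python
from langchain.chat_models import init_chat_model

# Provider can be inferred from well-known model names, or passed explicitly.
gpt = init_chat_model("gpt-4o-mini", model_provider="openai", temperature=0)
claude = init_chat_model("claude-3-5-sonnet-20240620", model_provider="anthropic")

gpt.invoke("Hello!")
```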
2 changes: 1 addition & 1 deletion docs/docs/how_to/chat_token_usage_tracking.ipynb
@@ -14,7 +14,7 @@
"\n",
":::\n",
"\n",
"Tracking token usage to calculate cost is an important part of putting your app in production. This guide goes over how to obtain this information from your LangChain model calls.\n",
"Tracking [token](/docs/concepts/tokens/) usage to calculate cost is an important part of putting your app in production. This guide goes over how to obtain this information from your LangChain model calls.\n",
"\n",
"This guide requires `langchain-anthropic` and `langchain-openai >= 0.1.9`."
]
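The simplest path the guide covers is reading `usage_metadata` off the returned message — a sketch; the printed numbers are illustrative:

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")
msg = llm.invoke("Tell me a joke")

# AIMessage.usage_metadata holds the per-call token counts.
print(msg.usage_metadata)
# e.g. {'input_tokens': 11, 'output_tokens': 42, 'total_tokens': 53}
```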
4 changes: 2 additions & 2 deletions docs/docs/how_to/chatbots_retrieval.ipynb
@@ -15,7 +15,7 @@
"source": [
"# How to add retrieval to chatbots\n",
"\n",
"Retrieval is a common technique chatbots use to augment their responses with data outside a chat model's training data. This section will cover how to implement retrieval in the context of chatbots, but it's worth noting that retrieval is a very subtle and deep topic - we encourage you to explore [other parts of the documentation](/docs/how_to#qa-with-rag) that go into greater depth!\n",
"[Retrieval](/docs/concepts/retrieval/) is a common technique chatbots use to augment their responses with data outside a chat model's training data. This section will cover how to implement retrieval in the context of chatbots, but it's worth noting that retrieval is a very subtle and deep topic - we encourage you to explore [other parts of the documentation](/docs/how_to#qa-with-rag) that go into greater depth!\n",
"\n",
"## Setup\n",
"\n",
@@ -80,7 +80,7 @@
"source": [
"## Creating a retriever\n",
"\n",
"We'll use [the LangSmith documentation](https://docs.smith.langchain.com/overview) as source material and store the content in a vectorstore for later retrieval. Note that this example will gloss over some of the specifics around parsing and storing a data source - you can see more [in-depth documentation on creating retrieval systems here](/docs/how_to#qa-with-rag).\n",
"We'll use [the LangSmith documentation](https://docs.smith.langchain.com/overview) as source material and store the content in a [vector store](/docs/concepts/vectorstores/) for later retrieval. Note that this example will gloss over some of the specifics around parsing and storing a data source - you can see more [in-depth documentation on creating retrieval systems here](/docs/how_to#qa-with-rag).\n",
"\n",
"Let's use a document loader to pull text from the docs:"
]
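The loader-to-retriever pipeline that section builds looks roughly like this — a sketch assuming `langchain-community`, `langchain-chroma`, `langchain-openai`, and `beautifulsoup4` are installed:

```python
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

raw_docs = WebBaseLoader("https://docs.smith.langchain.com/overview").load()

splits = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=0
).split_documents(raw_docs)

vectorstore = Chroma.from_documents(splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```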
4 changes: 2 additions & 2 deletions docs/docs/how_to/chatbots_tools.ipynb
@@ -42,7 +42,7 @@
"metadata": {},
"outputs": [
{
"name": "stdin",
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key: ········\n",
@@ -78,7 +78,7 @@
"\n",
"Our end goal is to create an agent that can respond conversationally to user questions while looking up information as needed.\n",
"\n",
"First, let's initialize Tavily and an OpenAI chat model capable of tool calling:"
"First, let's initialize Tavily and an OpenAI [chat model](/docs/concepts/chat_models/) capable of tool calling:"
]
},
{
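That initialization step amounts to something like the following — a sketch assuming `TAVILY_API_KEY` and `OPENAI_API_KEY` are set:

```python
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

search = TavilySearchResults(max_results=2)
llm = ChatOpenAI(model="gpt-4o-mini")
llm_with_tools = llm.bind_tools([search])

ai_msg = llm_with_tools.invoke("What is the current weather in San Francisco?")
print(ai_msg.tool_calls)  # the model may emit a call to the search tool
```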
2 changes: 1 addition & 1 deletion docs/docs/how_to/code_splitter.ipynb
@@ -7,7 +7,7 @@
"source": [
"# How to split code\n",
"\n",
"[RecursiveCharacterTextSplitter](https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html) includes pre-built lists of separators that are useful for splitting text in a specific programming language.\n",
"[RecursiveCharacterTextSplitter](https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html) includes pre-built lists of separators that are useful for [splitting text](/docs/concepts/text_splitters/) in a specific programming language.\n",
"\n",
"Supported languages are stored in the `langchain_text_splitters.Language` enum. They include:\n",
"\n",
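A minimal sketch of language-aware splitting as the page demonstrates:

```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

PYTHON_CODE = '''
def hello_world():
    print("Hello, World!")

hello_world()
'''

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
docs = python_splitter.create_documents([PYTHON_CODE])
```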
4 changes: 2 additions & 2 deletions docs/docs/how_to/contextual_compression.ipynb
@@ -7,13 +7,13 @@
"source": [
"# How to do retrieval with contextual compression\n",
"\n",
"One challenge with retrieval is that usually you don't know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.\n",
"One challenge with [retrieval](/docs/concepts/retrieval/) is that usually you don't know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.\n",
"\n",
"Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.\n",
"\n",
"To use the Contextual Compression Retriever, you'll need:\n",
"\n",
"- a base retriever\n",
"- a base [retriever](/docs/concepts/retrievers/)\n",
"- a Document Compressor\n",
"\n",
"The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether.\n",
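Wiring the two pieces together looks roughly like this — a sketch assuming `retriever` is an existing base retriever and `langchain-openai` is installed:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

# The compressor extracts only query-relevant content from each document.
compressor = LLMChainExtractor.from_llm(OpenAI(temperature=0))

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever,  # `retriever` assumed to exist
)
docs = compression_retriever.invoke("What did the speaker say about the economy?")
```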
6 changes: 3 additions & 3 deletions docs/docs/how_to/custom_chat_model.ipynb
@@ -14,15 +14,15 @@
"\n",
":::\n",
"\n",
"In this guide, we'll learn how to create a custom chat model using LangChain abstractions.\n",
"In this guide, we'll learn how to create a custom [chat model](/docs/concepts/chat_models/) using LangChain abstractions.\n",
"\n",
"Wrapping your LLM with the standard [`BaseChatModel`](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html) interface allow you to use your LLM in existing LangChain programs with minimal code modifications!\n",
"\n",
"As an bonus, your LLM will automatically become a LangChain `Runnable` and will benefit from some optimizations out of the box (e.g., batch via a threadpool), async support, the `astream_events` API, etc.\n",
"As an bonus, your LLM will automatically become a LangChain [Runnable](/docs/concepts/runnables/) and will benefit from some optimizations out of the box (e.g., batch via a threadpool), async support, the `astream_events` API, etc.\n",
"\n",
"## Inputs and outputs\n",
"\n",
"First, we need to talk about **messages**, which are the inputs and outputs of chat models.\n",
"First, we need to talk about **[messages](/docs/concepts/messages/)**, which are the inputs and outputs of chat models.\n",
"\n",
"### Messages\n",
"\n",
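The heart of this guide is a `BaseChatModel` subclass; a toy version that echoes the last message looks like this — a sketch of the required hooks, not the guide's full example:

```python
from typing import Any, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult


class ParrotChatModel(BaseChatModel):
    """Toy chat model that echoes the last input message."""

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        reply = AIMessage(content=messages[-1].content)
        return ChatResult(generations=[ChatGeneration(message=reply)])

    @property
    def _llm_type(self) -> str:
        return "parrot-chat-model"
```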
4 changes: 2 additions & 2 deletions docs/docs/how_to/custom_retriever.ipynb
@@ -19,9 +19,9 @@
"\n",
"## Overview\n",
"\n",
"Many LLM applications involve retrieving information from external data sources using a `Retriever`. \n",
"Many LLM applications involve retrieving information from external data sources using a [Retriever](/docs/concepts/retrievers/). \n",
"\n",
"A retriever is responsible for retrieving a list of relevant `Documents` to a given user `query`.\n",
"A retriever is responsible for retrieving a list of relevant [Documents](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) to a given user `query`.\n",
"\n",
"The retrieved documents are often formatted into prompts that are fed into an LLM, allowing the LLM to use the information in the to generate an appropriate response (e.g., answering a user question based on a knowledge base).\n",
"\n",
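A toy retriever of the kind this guide builds — subclass `BaseRetriever` and implement `_get_relevant_documents` (a sketch; the substring-matching logic is illustrative):

```python
from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever


class KeywordRetriever(BaseRetriever):
    """Return up to `k` documents whose text contains the query."""

    documents: List[Document]
    k: int = 3

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        matches = [
            d for d in self.documents if query.lower() in d.page_content.lower()
        ]
        return matches[: self.k]
```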
2 changes: 1 addition & 1 deletion docs/docs/how_to/custom_tools.ipynb
@@ -7,7 +7,7 @@
"source": [
"# How to create tools\n",
"\n",
"When constructing an agent, you will need to provide it with a list of `Tool`s that it can use. Besides the actual function that is called, the Tool consists of several components:\n",
"When constructing an [agent](/docs/concepts/agents/), you will need to provide it with a list of [Tools](/docs/concepts/tools/) that it can use. Besides the actual function that is called, the Tool consists of several components:\n",
"\n",
"| Attribute | Type | Description |\n",
"|---------------|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n",
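The lightest way to create such a tool is the `@tool` decorator — a minimal sketch; the name, description, and argument schema are inferred from the function:

```python
from langchain_core.tools import tool


@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""  # the docstring becomes the tool's description
    return a * b


print(multiply.name, multiply.description, multiply.args)
```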
2 changes: 1 addition & 1 deletion docs/docs/how_to/document_loader_custom.ipynb
@@ -26,7 +26,7 @@
"`Document` objects are often formatted into prompts that are fed into an LLM, allowing the LLM to use the information in the `Document` to generate a desired response (e.g., summarizing the document).\n",
"`Documents` can be either used immediately or indexed into a vectorstore for future retrieval and use.\n",
"\n",
"The main abstractions for Document Loading are:\n",
"The main abstractions for [Document Loading](/docs/concepts/document_loaders/) are:\n",
"\n",
"\n",
"| Component | Description |\n",
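A custom loader is a `BaseLoader` subclass whose `lazy_load` yields `Document`s — a toy sketch emitting one document per line of a text file:

```python
from typing import Iterator

from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class LineLoader(BaseLoader):
    """Toy loader that yields one Document per line of a file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def lazy_load(self) -> Iterator[Document]:
        with open(self.path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                yield Document(
                    page_content=line,
                    metadata={"line_number": i, "source": self.path},
                )
```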
(Diff truncated; the remaining changed files of the 79 are not shown.)