Skip to content

Commit

Permalink
Merge branch 'master' into cb-return-id-in-docs
Browse files Browse the repository at this point in the history
  • Loading branch information
efriis authored Oct 24, 2024
2 parents 3475603 + 455ab7d commit 97b9811
Show file tree
Hide file tree
Showing 21 changed files with 1,904 additions and 10 deletions.
444 changes: 444 additions & 0 deletions docs/docs/integrations/chat/naver.ipynb

Large diffs are not rendered by default.

38 changes: 38 additions & 0 deletions docs/docs/integrations/providers/naver.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# NAVER

All functionality related to `Naver`, including HyperCLOVA X models, particularly those accessible through `Naver Cloud` [CLOVA Studio](https://clovastudio.ncloud.com/).

> [Naver](https://navercorp.com/) is a global technology company with cutting-edge technologies and a diverse business portfolio including search, commerce, fintech, content, cloud, and AI.
> [Naver Cloud](https://www.navercloudcorp.com/lang/en/) is the cloud computing arm of Naver, a leading cloud service provider offering a comprehensive suite of cloud services to businesses through its [Naver Cloud Platform (NCP)](https://www.ncloud.com/).
Please refer to [NCP User Guide](https://guide.ncloud-docs.com/docs/clovastudio-overview) for more detailed instructions (also in Korean).

## Installation and Setup

- Get both CLOVA Studio API Key and API Gateway Key by [creating your app](https://guide.ncloud-docs.com/docs/en/clovastudio-playground01#create-test-app) and set them as environment variables respectively (`NCP_CLOVASTUDIO_API_KEY`, `NCP_APIGW_API_KEY`).
- Install the integration Python package with:

```bash
pip install -U langchain-community
```

## Chat models

### ChatClovaX

See a [usage example](/docs/integrations/chat/naver).

```python
from langchain_community.chat_models import ChatClovaX
```

## Embedding models

### ClovaXEmbeddings

See a [usage example](/docs/integrations/text_embedding/naver).

```python
from langchain_community.embeddings import ClovaXEmbeddings
```
318 changes: 318 additions & 0 deletions docs/docs/integrations/text_embedding/naver.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,318 @@
{
"cells": [
{
"cell_type": "raw",
"id": "afaf8039",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Naver\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"metadata": {},
"source": [
"# ClovaXEmbeddings\n",
"\n",
"This notebook covers how to get started with embedding models provided by CLOVA Studio. For detailed documentation on `ClovaXEmbeddings` features and configuration options, please refer to the [API reference](https://python.langchain.com/api_reference/community/embeddings/langchain_community.naver.ClovaXEmbeddings.html).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Provider | Package |\n",
"|:--------:|:-------:|\n",
"| [Naver](/docs/integrations/providers/naver.mdx) | [langchain-community](https://python.langchain.com/api_reference/community/embeddings/langchain_community.naver.ClovaXEmbeddings.html) |\n",
"\n",
"## Setup\n",
"\n",
"Before using embedding models provided by CLOVA Studio, you must go through the three steps below.\n",
"\n",
"1. Creating [NAVER Cloud Platform](https://www.ncloud.com/) account \n",
"2. Apply to use [CLOVA Studio](https://www.ncloud.com/product/aiService/clovaStudio)\n",
"3. Find API Keys after creating CLOVA Studio Test App or Service App (See [here](https://guide.ncloud-docs.com/docs/en/clovastudio-playground01#테스트앱생성).)\n",
"\n",
"### Credentials\n",
"\n",
"CLOVA Studio requires 3 keys (`NCP_CLOVASTUDIO_API_KEY`, `NCP_APIGW_API_KEY` and `NCP_CLOVASTUDIO_APP_ID`) for embeddings.\n",
"- `NCP_CLOVASTUDIO_API_KEY` and `NCP_CLOVASTUDIO_APP_ID` is issued per serviceApp or testApp\n",
"- `NCP_APIGW_API_KEY` is issued per account\n",
"\n",
"The two API Keys could be found by clicking `App Request Status` > `Service App, Test App List` > `‘Details’ button for each app` in [CLOVA Studio](https://clovastudio.ncloud.com/studio-application/service-app)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c52e8a50-3e67-4272-bc80-3954d98f8dea",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"NCP_CLOVASTUDIO_API_KEY\"):\n",
" os.environ[\"NCP_CLOVASTUDIO_API_KEY\"] = getpass.getpass(\n",
" \"Enter NCP CLOVA Studio API Key: \"\n",
" )\n",
"if not os.getenv(\"NCP_APIGW_API_KEY\"):\n",
" os.environ[\"NCP_APIGW_API_KEY\"] = getpass.getpass(\"Enter NCP API Gateway API Key: \")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83520d8e-ecf8-4e47-b3bc-1ac205b3a2ab",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"NCP_CLOVASTUDIO_APP_ID\"] = input(\"Enter NCP CLOVA Studio App ID: \")"
]
},
{
"cell_type": "markdown",
"id": "ff00653e",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"ClovaXEmbeddings integration lives in the `langchain_community` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99400c9b",
"metadata": {},
"outputs": [],
"source": [
"# install package\n",
"!pip install -U langchain-community"
]
},
{
"cell_type": "markdown",
"id": "2651e611-9d5b-4315-9bbd-f99f56be4e19",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our embeddings object and embed query or document:\n",
"\n",
"- There are several embedding models available in CLOVA Studio. Please refer [here](https://guide.ncloud-docs.com/docs/en/clovastudio-explorer03#임베딩API) for further details.\n",
"- Note that you might need to normalize the embeddings depending on your specific use case."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "62e0dbc3",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"from langchain_community.embeddings import ClovaXEmbeddings\n",
"\n",
"embeddings = ClovaXEmbeddings(\n",
" model=\"clir-emb-dolphin\", # set with the model name of corresponding app id. Default is `clir-emb-dolphin`\n",
" # app_id=\"...\" # set if you prefer to pass app id directly instead of using environment variables\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0493b4a8",
"metadata": {},
"source": [
"## Indexing and Retrieval\n",
"\n",
"Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
"\n",
"Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d4d59653",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'CLOVA Studio is an AI development tool that allows you to customize your own HyperCLOVA X models.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a vector store with a sample text\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"\n",
"text = \"CLOVA Studio is an AI development tool that allows you to customize your own HyperCLOVA X models.\"\n",
"\n",
"vectorstore = InMemoryVectorStore.from_texts(\n",
" [text],\n",
" embedding=embeddings,\n",
")\n",
"\n",
"# Use the vectorstore as a retriever\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"# Retrieve the most similar text\n",
"retrieved_documents = retriever.invoke(\"What is CLOVA Studio?\")\n",
"\n",
"# show the retrieved document's content\n",
"retrieved_documents[0].page_content"
]
},
{
"cell_type": "markdown",
"id": "b1a249e1",
"metadata": {},
"source": [
"## Direct Usage\n",
"\n",
"Under the hood, the vectorstore and retriever implementations are calling `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the text(s) used in `from_texts` and retrieval `invoke` operations, respectively.\n",
"\n",
"You can directly call these methods to get embeddings for your own use cases.\n",
"\n",
"### Embed single texts\n",
"\n",
"You can embed single texts or documents with `embed_query`:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "12fcfb4b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.094717406, -0.4077411, -0.5513184, 1.6024436, -1.3235079, -1.0720996, -0.44471845, 1.3665184, 0.\n"
]
}
],
"source": [
"single_vector = embeddings.embed_query(text)\n",
"print(str(single_vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "8b383b53",
"metadata": {},
"source": [
"### Embed multiple texts\n",
"\n",
"You can embed multiple texts with `embed_documents`:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1f2e6104",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.094717406, -0.4077411, -0.5513184, 1.6024436, -1.3235079, -1.0720996, -0.44471845, 1.3665184, 0.\n",
"[-0.25525448, -0.84877056, -0.6928286, 1.5867524, -1.2930486, -0.8166254, -0.17934391, 1.4236152, 0.\n"
]
}
],
"source": [
"text2 = \"LangChain is the framework for building context-aware reasoning applications\"\n",
"two_vectors = embeddings.embed_documents([text, text2])\n",
"for vector in two_vectors:\n",
" print(str(vector)[:100]) # Show the first 100 characters of the vector"
]
},
{
"cell_type": "markdown",
"id": "eee40d32367cc5c4",
"metadata": {},
"source": [
"## Additional functionalities\n",
"\n",
"### Service App\n",
"\n",
"When going live with production-level application using CLOVA Studio, you should apply for and use Service App. (See [here](https://guide.ncloud-docs.com/docs/en/clovastudio-playground01#서비스앱신청).)\n",
"\n",
"For a Service App, corresponding `NCP_CLOVASTUDIO_API_KEY` and `NCP_CLOVASTUDIO_APP_ID` are issued and can only be called with them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08f9f44e-c6a4-4163-8caf-27a0cda345b7",
"metadata": {},
"outputs": [],
"source": [
"# Update environment variables\n",
"\n",
"os.environ[\"NCP_CLOVASTUDIO_API_KEY\"] = getpass.getpass(\n",
" \"Enter NCP CLOVA Studio API Key for Service App: \"\n",
")\n",
"os.environ[\"NCP_CLOVASTUDIO_APP_ID\"] = input(\"Enter NCP CLOVA Studio Service App ID: \")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86f59698-b3f4-4b19-a9d4-4facfcea304b",
"metadata": {},
"outputs": [],
"source": [
"embeddings = ClovaXEmbeddings(\n",
" service_app=True,\n",
" model=\"clir-emb-dolphin\", # set with the model name of corresponding app id of your Service App\n",
" # app_id=\"...\" # set if you prefer to pass app id directly instead of using environment variables\n",
")"
]
},
{
"cell_type": "markdown",
"id": "1ddeaee9",
"metadata": {},
"source": [
"## API Reference\n",
"\n",
"For detailed documentation on `ClovaXEmbeddings` features and configuration options, please refer to the [API reference](https://python.langchain.com/latest/api_reference/community/embeddings/langchain_community.embeddings.naver.ClovaXEmbeddings.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading

0 comments on commit 97b9811

Please sign in to comment.