docs: Add documentation for Milvus/Zilliz database integration (#1203)
This integration is similar to Pinecone/Qdrant. I'm writing a tutorial on how to use Apify with Milvus/Zilliz, and it would be good to reference our documentation in the tutorial. Just a note: Milvus is an open-source vector database, and Zilliz offers a managed solution based on Milvus. @TC-MO, could you please review the English? I haven't included any screenshots, as I believe the description should suffice to get started. Also, maintaining and updating screenshots is a bit painful. Additionally, I haven't added this integration to the [cards](https://docs.apify.com/platform/integrations#data-pipelines-etls-and-aillm-tools). Do you think it should be included there?
1 parent d404b12, commit 7c6fc28
Showing 3 changed files with 141 additions and 0 deletions.
@@ -0,0 +1,120 @@
---
title: Milvus integration
description: Learn how to integrate Apify with Milvus (Zilliz) to save data scraped from websites into the Milvus vector database.
sidebar_label: Milvus
sidebar_position: 4
slug: /integrations/milvus
toc_min_heading_level: 2
toc_max_heading_level: 4
---

**Learn how to integrate Apify with Milvus (Zilliz) to save data scraped from websites into the Milvus vector database.**

---

[Milvus](https://milvus.io/) is an open-source vector database optimized for similarity search over large datasets of high-dimensional vectors.
Its focus on efficient vector similarity search makes it a good foundation for powerful and scalable retrieval systems.

The Apify integration for Milvus exports results from Apify Actors and dataset items into a Milvus collection.
It can also connect to a managed Milvus instance on [Zilliz Cloud](https://cloud.zilliz.com).

## Prerequisites

Before you begin, ensure that you have the following:

- A Milvus database URL and API key. Optionally, you can use a username and password. You can run Milvus on Docker or Kubernetes, but in this example, we'll use the hosted Milvus service at [Zilliz Cloud](https://cloud.zilliz.com).
- An [OpenAI API key](https://openai.com/index/openai-api/) to compute text embeddings.
- An [Apify API token](https://docs.apify.com/platform/integrations/api#api-token) to access [Apify Actors](https://apify.com/store).

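The credentials listed above are easy to leak when pasted directly into a script. As a minimal sketch, assuming you keep them in environment variables (the variable names below are illustrative and not required by Apify, OpenAI, or Milvus), a small helper can fail early when one of them is missing:

```python
import os

# Hypothetical environment variable names; rename them to match your own setup.
REQUIRED_VARS = ("APIFY_API_TOKEN", "OPENAI_API_KEY", "MILVUS_URL", "MILVUS_API_KEY")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` at the top of a script then gives you a single dictionary to read the values from, and passing a plain dict as `env` makes the helper easy to test.
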
### How to set up a Milvus database

1. Sign up or log in to your Zilliz account and create a new cluster.

1. Download the created credentials: username and password.

Once the cluster is ready and you have the URL, API key, and credentials, you can set up the integration with Apify.

### Integration Methods

You can integrate Apify with Milvus using either the Apify Console or the Apify API client for Python.

:::note Website Content Crawler usage

These examples use the Website Content Crawler Actor, which performs deep website crawling, cleans HTML by removing modals and navigation elements, and converts the content into Markdown.

:::

#### Apify Console

1. Set up the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor in the [Apify Console](https://console.apify.com). Refer to this guide on how to set up [website content crawl for your project](https://blog.apify.com/talk-to-your-website-with-large-language-models/).

1. After setting up the crawler, go to the **Integrations** section, select **Connect Actor or Task**, and search for the Milvus integration.

1. Select when to trigger this integration (typically when a run succeeds) and fill in all the required fields. If the collection doesn't exist yet, it is created automatically. You can learn more about the input parameters at the [Milvus integration input schema](https://apify.com/apify/milvus-integration/input-schema).

- For a detailed explanation of the input parameters, including dataset settings, incremental updates, and examples, see the [Milvus integration description](https://apify.com/apify/milvus-integration).

- For an explanation of how to combine Actors to accomplish more complex tasks, refer to the guide on [Actor-to-Actor](https://blog.apify.com/connecting-scrapers-apify-integration/) integrations.

#### Python

Another way to interact with Milvus is through the [Apify API client for Python](https://docs.apify.com/api/client/python/).

1. Install the Apify API client for Python by running the following command:

    ```bash
    pip install apify-client
    ```

1. Create a Python script, import the client, and define your credentials:

    ```python
    from apify_client import ApifyClient

    # Replace the placeholders with your own credentials.
    APIFY_API_TOKEN = "YOUR-APIFY-TOKEN"
    OPENAI_API_KEY = "YOUR-OPENAI-API-KEY"

    MILVUS_COLLECTION_NAME = "YOUR-MILVUS-COLLECTION-NAME"
    MILVUS_URL = "YOUR-MILVUS-URL"
    MILVUS_API_KEY = "YOUR-MILVUS-API-KEY"
    MILVUS_USER = "YOUR-MILVUS-USER"
    MILVUS_PASSWORD = "YOUR-MILVUS-PASSWORD"

    client = ApifyClient(APIFY_API_TOKEN)
    ```

1. Call the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor to crawl the Milvus and Zilliz websites and extract text content from the web pages:

    ```python
    # Crawl at most 10 pages, starting from the two URLs below.
    actor_call = client.actor("apify/website-content-crawler").call(
        run_input={
            "maxCrawlPages": 10,
            "startUrls": [{"url": "https://milvus.io/"}, {"url": "https://zilliz.com/"}],
        }
    )
    ```

1. Call Apify's Milvus integration and store all data in the Milvus vector database:

    ```python
    milvus_integration_inputs = {
        "milvusUrl": MILVUS_URL,
        "milvusApiKey": MILVUS_API_KEY,
        "milvusCollectionName": MILVUS_COLLECTION_NAME,
        "milvusUser": MILVUS_USER,
        "milvusPassword": MILVUS_PASSWORD,
        # Read the "text" field from the crawler's default dataset.
        "datasetFields": ["text"],
        "datasetId": actor_call["defaultDatasetId"],
        "deltaUpdatesPrimaryDatasetFields": ["url"],
        "expiredObjectDeletionPeriodDays": 30,
        "embeddingsApiKey": OPENAI_API_KEY,
        "embeddingsProvider": "OpenAI",
    }
    actor_call = client.actor("apify/milvus-integration").call(run_input=milvus_integration_inputs)
    ```

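Both steps above rely on an Actor run finishing successfully. The following is a minimal sketch, not part of the integration itself, of how a script might check a run and peek at the crawled items. It assumes the run dictionary returned by `call()` has `status` and `defaultDatasetId` keys and that dataset items carry `url` and `text` fields; the helper names `ensure_run_succeeded` and `preview_items` are hypothetical.

```python
def ensure_run_succeeded(run):
    """Return the run's default dataset ID, or raise if the run did not succeed."""
    if run is None:
        raise RuntimeError("The Actor run did not start or was not found.")
    status = run.get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"Actor run finished with status {status!r}.")
    return run["defaultDatasetId"]


def preview_items(client, dataset_id, limit=3):
    """Return (url, first 80 characters of text) for the first few dataset items."""
    previews = []
    for item in client.dataset(dataset_id).iterate_items():
        if len(previews) >= limit:
            break
        # Items without text are previewed as an empty string.
        previews.append((item.get("url"), (item.get("text") or "")[:80]))
    return previews
```

For example, calling `preview_items(client, ensure_run_succeeded(actor_call))` right after the crawler call would list a few crawled pages before they are sent to Milvus.
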
Congratulations! You've successfully integrated Apify with Milvus, and the scraped data is now stored in your Milvus database.

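To confirm that the data is searchable, you can query the collection directly. The sketch below is an illustration under several assumptions rather than a documented part of the integration: it uses the `pymilvus` and `openai` packages (installed separately), assumes the stored text lives in a field named `text`, and assumes the query is embedded with the same model the integration used (`text-embedding-3-small` here is a guess; check your integration settings). The function names are hypothetical.

```python
def search_collection(milvus_client, embed, collection_name, query, limit=3):
    """Embed `query` and return (distance, text) pairs for the closest entries."""
    results = milvus_client.search(
        collection_name=collection_name,
        data=[embed(query)],
        limit=limit,
        output_fields=["text"],  # assumed field name; adjust to your collection schema
    )
    # The search call returns one list of hits per query vector.
    return [(hit["distance"], hit["entity"].get("text")) for hit in results[0]]


def build_clients(milvus_url, milvus_api_key, openai_api_key, model="text-embedding-3-small"):
    """Create a Milvus client and an embedding function (needs pymilvus and openai)."""
    from openai import OpenAI
    from pymilvus import MilvusClient

    openai_client = OpenAI(api_key=openai_api_key)

    def embed(text):
        response = openai_client.embeddings.create(model=model, input=[text])
        return response.data[0].embedding

    return MilvusClient(uri=milvus_url, token=milvus_api_key), embed
```

A script could then call `build_clients(...)` with the credentials from the earlier steps and pass the results to `search_collection(...)` together with the collection name and a question such as "What is Milvus?".
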
## Additional Resources

- [Apify Milvus Integration](https://apify.com/apify/milvus-integration)
- [Milvus documentation](https://milvus.io/docs)