This repository contains a template for creating custom components for your deepset Cloud pipelines. Components are Python code snippets that perform specific tasks within your pipeline. This template will guide you through all the necessary elements your custom component must include. This template contains two sample components which are ready to be used:
- CharacterSplitter, implemented in /src/dc_custom_component/example_components/preprocessors/character_splitter.py: A component that splits documents into smaller chunks by the number of characters you set. You can use it in indexing pipelines.
- KeywordBooster, implemented in /src/dc_custom_component/example_components/rankers/keyword_booster.py: A component that boosts the score of documents that contain specific keywords. You can use it in query pipelines.
We've created these examples to help you understand how to structure your components. When importing your custom components to deepset Cloud, you can remove or rename the example_components folder with the sample components if you're not planning to use them.
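To make the splitting behavior concrete, here is a minimal sketch of chunking by character count. It is not the code from character_splitter.py: the function name `split_by_characters` and its `split_length` parameter are hypothetical, and it works on plain strings, whereas the sample component works on documents.

```python
def split_by_characters(text: str, split_length: int) -> list[str]:
    """Split text into consecutive chunks of at most split_length characters.

    Hypothetical helper for illustration only; the sample component in this
    template may be structured differently.
    """
    if split_length <= 0:
        raise ValueError("split_length must be a positive integer")
    # Step through the text in strides of split_length; the last chunk may be shorter.
    return [text[i : i + split_length] for i in range(0, len(text), split_length)]


chunks = split_by_characters("deepset Cloud", 5)
print(chunks)  # ['deeps', 'et Cl', 'oud']
```

A real component would wrap logic like this in a class and return new documents instead of strings.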
This template serves as a custom components library for your organization. Only the components present in the most recently uploaded template are available for use in your pipelines.
For more information about custom components, see Custom Components. For a step-by-step guide on creating custom components, see Create a Custom Component. See also our tutorial for creating a custom RegexBooster component.
- Python v3.10 or v3.11
- hatch package manager
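If you want to confirm that your interpreter meets the Python requirement before setting anything up, a quick check could look like the sketch below. The helper name `is_supported` is hypothetical; the supported versions are the two listed above.

```python
import sys

# Versions listed in the prerequisites above.
SUPPORTED_VERSIONS = {(3, 10), (3, 11)}


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if the (major, minor) pair is one of the supported versions."""
    return version in SUPPORTED_VERSIONS


current = sys.version_info[:2]
print(f"Python {current[0]}.{current[1]} supported: {is_supported(current)}")
```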
We use hatch to manage our Python packages. Install it with pip:
Linux and macOS:
pip install hatch
Windows: Follow the instructions under https://hatch.pypa.io/1.12/install/#windows
Once installed, create a virtual environment by running:
hatch shell
This installs all the packages needed to create a custom component. You can reference this virtual environment in your IDE.
For more information on hatch, please refer to the official Hatch documentation.
File | Description |
---|---|
/src/dc_custom_component/components | Directory for implementing custom components. You can logically group custom components in sub-directories. See how sample components are grouped by type. |
/src/dc_custom_component/__about__.py | Your custom components' version. deepset Cloud always uses the latest version. Bump the version every time you update your component before uploading it to deepset Cloud. |
/pyproject.toml | Information about the project. If needed, add your components' dependencies in this file in the dependencies section. |
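Bumping the version by hand is easy to forget, so you may want to script it. The sketch below increments the patch number in the text of a version file. It assumes the file stores the version as a line of the form `__version__ = "X.Y.Z"`, which is a common hatch layout but is not confirmed by this template; the helper name `bump_patch` is hypothetical.

```python
import re

# Assumption: the version file contains a line such as __version__ = "1.0.0".
_VERSION_LINE = re.compile(r'(__version__\s*=\s*["\'])(\d+)\.(\d+)\.(\d+)(["\'])')


def bump_patch(about_text: str) -> str:
    """Return about_text with the patch number of __version__ incremented by one."""

    def _bump(match: re.Match) -> str:
        prefix, major, minor, patch, quote = match.groups()
        return f"{prefix}{major}.{minor}.{int(patch) + 1}{quote}"

    new_text, count = _VERSION_LINE.subn(_bump, about_text, count=1)
    if count == 0:
        raise ValueError("no __version__ line in X.Y.Z form found")
    return new_text


print(bump_patch('__version__ = "1.0.0"'))  # __version__ = "1.0.1"
```

To apply it, read the file, pass its contents through the function, and write the result back before uploading.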
Note that the location of your custom component implementation defines your component's type to be used in pipeline YAML. For example, the sample components have the following types because of their location:
dc_custom_component.example_components.preprocessors.character_splitter.CharacterSplitter
dc_custom_component.example_components.rankers.keyword_booster.KeywordBooster
Here is how you would add them to a pipeline:
components:
  splitter:
    type: dc_custom_component.example_components.preprocessors.character_splitter.CharacterSplitter
    init_parameters: {}
  ...
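The mapping from file location to type follows the usual Python module path rule: drop the src prefix and the .py suffix, replace slashes with dots, and append the class name. A small sketch of that rule, using a hypothetical helper named `component_type`:

```python
from pathlib import PurePosixPath


def component_type(module_path: str, class_name: str, src_root: str = "src") -> str:
    """Build the dotted type string for a component class from its file path.

    Hypothetical helper for illustration; module_path is given relative to the
    project root, with or without a leading slash.
    """
    relative = PurePosixPath(module_path.lstrip("/")).relative_to(src_root)
    # Drop the .py suffix and join the remaining path parts with dots.
    module = ".".join(relative.with_suffix("").parts)
    return f"{module}.{class_name}"


print(
    component_type(
        "/src/dc_custom_component/example_components/preprocessors/character_splitter.py",
        "CharacterSplitter",
    )
)
# dc_custom_component.example_components.preprocessors.character_splitter.CharacterSplitter
```

If you move or rename a component file, the type in your pipeline YAML has to change accordingly.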
We defined a suite of formatting tools. To format your code, run:
hatch run code-quality:all
It's crucial to thoroughly test your custom component before uploading it to deepset Cloud. Consider adding unit and integration tests to ensure your component functions correctly within a pipeline.
pytest is ready to be used with hatch:
- Implement your tests under /test.
- To run them, use: hatch run tests
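As a rough idea of what a unit test could look like, the sketch below tests a stand-in scoring function in the style pytest discovers (functions named `test_...` with plain `assert` statements). The function `boost_score` and its keyword-to-factor mapping are hypothetical and are not the sample KeywordBooster; in a real test you would import your own component from dc_custom_component instead.

```python
def boost_score(content: str, score: float, keywords: dict[str, float]) -> float:
    """Multiply score by the factor of every keyword found in content.

    Hypothetical stand-in for a ranking component's core logic.
    """
    lowered = content.lower()
    for keyword, factor in keywords.items():
        if keyword.lower() in lowered:
            score *= factor
    return score


def test_boosts_matching_document():
    # The keyword matches case-insensitively, so the score doubles.
    assert boost_score("deepset Cloud pipelines", 0.5, {"Deepset": 2.0}) == 1.0


def test_leaves_other_documents_unchanged():
    assert boost_score("Unrelated text", 0.5, {"deepset": 2.0}) == 0.5
```

Tests like these cover the component in isolation; an integration test would additionally run it inside a pipeline.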
- Fork this repository.
- Navigate to the /src/dc_custom_component/components/ folder.
- Add your custom components following the examples.
- Update the components' version in /src/dc_custom_component/__about__.py.
- Format your code using the hatch run code-quality:all command. (Note that hatch commands work from the project root directory only.)
- Set your deepset Cloud API key.
  - On Linux and macOS: export API_KEY=<TOKEN>
  - On Windows: set API_KEY=<TOKEN>
- Upload your project by running the following command from inside this project:
  - On Linux and macOS: hatch run dc:build-and-push
  - On Windows: hatch run dc:build-windows and hatch run dc:push-windows

  This creates a zip file called custom_component.zip in the dist directory and uploads it to deepset Cloud.
For detailed instructions, refer to our documentation on Creating a Custom Component.
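If you upload often, the platform-specific part of the steps above can be wrapped in a small script. The sketch below only selects the hatch commands named in this README and checks that API_KEY is set; the helper names `upload_commands` and `check_api_key` are hypothetical, and the script prints the commands rather than running them.

```python
import platform
from collections.abc import Mapping


def upload_commands(platform_name: str) -> list[list[str]]:
    """Return the hatch commands to run, in order, for the given platform."""
    if platform_name == "Windows":
        # On Windows, building and pushing are two separate commands.
        return [
            ["hatch", "run", "dc:build-windows"],
            ["hatch", "run", "dc:push-windows"],
        ]
    return [["hatch", "run", "dc:build-and-push"]]


def check_api_key(env: Mapping[str, str]) -> None:
    """Raise an error if the API_KEY environment variable is missing or empty."""
    if not env.get("API_KEY"):
        raise RuntimeError("API_KEY is not set; export it before uploading")


# platform.system() returns "Windows", "Linux", or "Darwin" (macOS).
for command in upload_commands(platform.system()):
    print(" ".join(command))
```

To actually run the upload, you could call `check_api_key(os.environ)` and then pass each command to `subprocess.run(command, check=True)` from the project root.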
We use GitHub Actions to build and push custom components to deepset Cloud. Create a tag to trigger the build and push job. After forking or cloning this repository:
- Add the DEEPSET_CLOUD_API_KEY secret to your repository. This is your deepset Cloud API key. (To add a secret, go to your repository and choose Settings > Secrets and variables > Actions > New repository secret.)
- (Optional) Adjust the workflow file in .github/workflows/publish_on_tag.yaml as needed.
- Create a tag to trigger the GitHub Actions workflow. The workflow builds and pushes the custom component to deepset Cloud with the tag as version.
Warning: When using this GitHub Actions workflow, the version specified in the __about__ file will be overwritten by the tag value. Make sure your tag matches the desired version number.
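Because the tag becomes the version, it can help to sanity-check a tag name before pushing it. The sketch below assumes that versions follow a plain X.Y.Z form; this README does not state which tag formats the workflow accepts, so treat the pattern as an assumption and adjust it to your own scheme. The helper name `is_plain_version` is hypothetical.

```python
import re

# Assumption: versions look like 1.2.3 (three dot-separated integers).
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_plain_version(tag: str) -> bool:
    """Return True if the whole tag is a plain X.Y.Z version string."""
    return VERSION_PATTERN.fullmatch(tag) is not None


for tag in ("1.2.0", "release-candidate"):
    print(tag, is_plain_version(tag))
```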