Merge branch 'main' into willtai/sphinx-documentation-pipeline

neo4j · Apr 29, 2024 · 4bcd425
2 parents ca14349 + 6a87979
commit 4bcd425
Showing 20 changed files with 1,192 additions and 67 deletions.
diff --git a/.github/workflows/cla-check.yaml b/.github/workflows/cla-check.yaml
@@ -0,0 +1,32 @@
+name: "CLA Check"
+
+on:
+  pull_request_target:
+    branches:
+      - main
+
+jobs:
+  cla-check:
+    if: github.event.pull_request.user.login != 'renovate[bot]'
+
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4
+        with:
+          repository: neo-technology/whitelist-check
+          token: ${{ secrets.CLA_CHECK_TOKEN }}
+      - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
+        with:
+          python-version: 3
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+      - run: |
+          owner=$(echo "$GITHUB_REPOSITORY" | cut -d/ -f1)
+          repository=$(echo "$GITHUB_REPOSITORY" | cut -d/ -f2)
+
+          ./bin/examine-pull-request "$owner" "$repository" "${{ secrets.CLA_CHECK_TOKEN }}" "$PULL_REQUEST_NUMBER" cla-database.csv
+        env:
+          PULL_REQUEST_NUMBER: ${{ github.event.number }}
diff --git a/.snyk b/.snyk
@@ -0,0 +1,5 @@
+# Snyk (https://snyk.io) policy file
+
+exclude:
+ code:
+  - tests/**
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,69 @@
+# Contributing to the Neo4j Ecosystem
+
+At [Neo4j](https://neo4j.com/), we develop our software in the open at GitHub.
+This provides transparency for you, our users, and allows you to fork the software to make your own additions and enhancements.
+We also provide areas specifically for community contributions, in particular the [neo4j-contrib](https://github.com/neo4j-contrib) space.
+
+There's an active [Neo4j Online Community](https://community.neo4j.com/) where we work directly with the community.
+If you're not already a member, sign up!
+
+We love our community and wouldn't be where we are without you.
+
+
+## Need to raise an issue?
+
+Where you raise an issue depends largely on the nature of the problem.
+
+Firstly, if you are an Enterprise customer, you might want to head over to our [Customer Support Portal](https://support.neo4j.com/).
+
+There are plenty of public channels available too, though.
+If you simply want to get started or have a question on how to use a particular feature, ask a question in [Neo4j Online Community](https://community.neo4j.com/).
+If you think you might have hit a bug in our software (it happens occasionally!) or you have a specific feature request, then use the issue feature on the relevant GitHub repository.
+Check first, though, as someone else may have already raised something similar.
+
+[StackOverflow](https://stackoverflow.com/questions/tagged/neo4j) also hosts a ton of questions and might already have a discussion around your problem.
+Make sure you have a look there too.
+
+Include as much information as you can in any request you make:
+
+- Which versions of our products are you using?
+- Which language (and which version of that language) are you developing with?
+- What operating system are you on?
+- Are you working with a cluster or on a single machine?
+- What code are you running?
+- What errors are you seeing?
+- What solutions have you tried already?
+
+
+## Want to contribute?
+
+If you want to contribute a pull request, we have a little bit of process you'll need to follow:
+
+- Do all your work in a personal fork of the original repository
+- [Rebase](https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request), don't merge (we prefer to keep our history clean)
+- Create a branch (with a useful name) for your contribution
+- Make sure you're familiar with the appropriate coding style (this varies by language so ask if you're in doubt)
+- Include unit tests if appropriate (obviously not necessary for documentation changes)
+- Take a moment to read and sign our [CLA](https://neo4j.com/developer/cla)
+
+We can't guarantee that we'll accept pull requests and may ask you to make some changes before they go in.
+Occasionally, we might also have logistical, commercial, or legal reasons why we can't accept your work but we'll try to find an alternative way for you to contribute in that case.
+Remember that many community members have become regular contributors and some are now even Neo employees!
+
+
+## Specifically for this project
+Setting up the development environment:
+
+1. Install Python 3.9.1+
+2. Install poetry (see https://python-poetry.org/docs/#installation)
+3. Install dependencies:
+
+```shell
+poetry install
+```
+
+4. Install the pre-commit hook, which checks code formatting every time you commit.
+
+```shell
+pre-commit install
+```
diff --git a/LICENSE.PYTHON.txt b/LICENSE.PYTHON.txt
@@ -20,7 +20,7 @@ analyze, test, perform and/or display publicly, prepare derivative works,
 distribute, and otherwise use Python alone or in any derivative version,
 provided, however, that PSF's License Agreement and PSF's notice of copyright,
 i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
-2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022 Python Software Foundation;
+2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024 Python Software Foundation;
 All Rights Reserved" are retained in Python alone or in any derivative version
 prepared by Licensee.
 

diff --git a/README.md b/README.md
@@ -1,3 +1,170 @@
 # Neo4j GenAI package for Python
 
 This repository contains the official Neo4j GenAI features for Python.
+
+This package gives developers a first-party library for which Neo4j guarantees
+long-term commitment and maintenance, and through which it can quickly ship
+new features and high-performing patterns and methods.
+
+Docs are coming soon!
+
+# Usage
+
+## Installation
+
+This package requires Python (>=3.8.1).
+
+To install the latest stable version, use:
+
+```shell
+pip install neo4j-genai
+```
+
+## Examples
+
+While the library has more retrievers than shown here, the following examples should get you started.
+
+### Performing a similarity search
+
+Assumption: Neo4j is running with a populated vector index in place.
+
+```python
+from neo4j import GraphDatabase
+from neo4j_genai import VectorRetriever
+
+URI = "neo4j://localhost:7687"
+AUTH = ("neo4j", "password")
+
+INDEX_NAME = "embedding-name"
+
+# Connect to Neo4j database
+driver = GraphDatabase.driver(URI, auth=AUTH)
+
+# Initialize the retriever
+retriever = VectorRetriever(driver, INDEX_NAME)
+
+# Run the similarity search
+query_text = "How do I do similarity search in Neo4j?"
+response = retriever.search(query_text=query_text, top_k=5)
+```
+
+### Creating a vector index
+
+When creating a vector index, make sure you match the number of dimensions in the index with the number of dimensions the embeddings have.
+
+Assumption: Neo4j is running.
+
+```python
+from neo4j import GraphDatabase
+from neo4j_genai.indexes import create_vector_index
+
+URI = "neo4j://localhost:7687"
+AUTH = ("neo4j", "password")
+
+INDEX_NAME = "chunk-index"
+
+# Connect to Neo4j database
+driver = GraphDatabase.driver(URI, auth=AUTH)
+
+# Creating the index
+create_vector_index(
+    driver,
+    INDEX_NAME,
+    label="Document",
+    property="textProperty",
+    dimensions=1536,
+    similarity_fn="euclidean",
+)
+
+```
+
+### Populating the Neo4j Vector Index
+
+This library does not write to the database; that is up to you.  
+See below for how to write using Cypher via the Neo4j driver.
+
+Assumption: Neo4j is running with a defined vector index.
+
+```python
+from neo4j import GraphDatabase
+from random import random
+
+URI = "neo4j://localhost:7687"
+AUTH = ("neo4j", "password")
+
+# Connect to Neo4j database
+driver = GraphDatabase.driver(URI, auth=AUTH)
+
+# Upsert the vector
+DIMENSION = 1536  # must match the `dimensions` of the vector index
+vector = [random() for _ in range(DIMENSION)]
+insert_query = (
+    "MERGE (n:Document {id: $id}) "
+    "WITH n "
+    "CALL db.create.setNodeVectorProperty(n, 'textProperty', $vector) "
+    "RETURN n"
+)
+parameters = {
+    "id": 0,
+    "vector": vector,
+}
+driver.execute_query(insert_query, parameters)
+```
+
+# Development
+
+## Install dependencies
+
+```bash
+poetry install
+```
+
+## Getting started
+
+### Issues
+
+If you have a bug to report or feature to request, first
+[search to see if an issue already exists](https://docs.github.com/en/github/searching-for-information-on-github/searching-on-github/searching-issues-and-pull-requests#search-by-the-title-body-or-comments).
+If a related issue doesn't exist, please raise a new issue using the relevant
+[issue form](https://github.com/neo4j/neo4j-genai-python/issues/new/choose).
+
+If you're a Neo4j Enterprise customer, you can also reach out to [Customer Support](http://support.neo4j.com/).
+
+If you don't have a bug to report or a feature to request but need a hand with
+the library, community support is available via [Neo4j Online Community](https://community.neo4j.com/)
+and/or [Discord](https://discord.gg/neo4j).
+
+### Make changes
+
+1. Fork the repository.
+2. Install Python and Poetry. For more information, see [the development guide](./docs/contributing/DEVELOPING.md).
+3. Create a working branch from `main` and start with your changes!
+
+### Pull request
+
+When you're finished with your changes, create a pull request, also known as a PR.
+
+-   Ensure that you have [signed the CLA](https://neo4j.com/developer/contributing-code/#sign-cla).
+-   Ensure that the base of your PR is set to `main`.
+-   Don't forget to [link your PR to an issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)
+    if you are solving one.
+-   Enable the checkbox to [allow maintainer edits](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
+    so that maintainers can make any necessary tweaks and update your branch for merge.
+-   Reviewers may ask for changes to be made before a PR can be merged, either using
+    [suggested changes](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request)
+    or normal pull request comments. You can apply suggested changes directly through
+    the UI, and any other changes can be made in your fork and committed to the PR branch.
+-   As you update your PR and apply changes, mark each conversation as [resolved](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations).
+
+## Run tests
+
+Open a new virtual environment and then run the tests.
+
+```bash
+poetry shell
+pytest
+```
+
+## Further information
+
+-   [The official Neo4j Python driver](https://github.com/neo4j/neo4j-python-driver)
+-   [Neo4j GenAI integrations](https://neo4j.com/docs/cypher-manual/current/genai-integrations/)
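
Note on the README's "Creating a vector index" section: a mismatch between the index `dimensions` and the length of an embedding is easy to catch before anything is written to the database. The following is a minimal sketch only. `check_dimensions` is a hypothetical helper and not part of `neo4j_genai`; the value 1536 is taken from the README example.

```python
INDEX_DIMENSIONS = 1536  # value passed as `dimensions` in the README example


def check_dimensions(vector, expected=INDEX_DIMENSIONS):
    """Hypothetical helper: raise if the vector length differs from the index dimensions."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected}"
        )
    return vector


# A vector of the right length passes through unchanged.
ok_vector = check_dimensions([0.0] * 1536)
```

Calling the helper immediately before `driver.execute_query(...)` would turn a dimension mismatch into an explicit `ValueError` on the client side.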
diff --git a/examples/hybrid_search.py b/examples/hybrid_search.py
@@ -0,0 +1,59 @@
+from neo4j import GraphDatabase
+
+from random import random
+from neo4j_genai.embedder import Embedder
+from neo4j_genai.indexes import create_vector_index, create_fulltext_index
+from neo4j_genai.retrievers import HybridRetriever
+
+URI = "neo4j://localhost:7687"
+AUTH = ("neo4j", "password")
+
+INDEX_NAME = "embedding-name"
+FULLTEXT_INDEX_NAME = "fulltext-index-name"
+DIMENSION = 1536
+
+# Connect to Neo4j database
+driver = GraphDatabase.driver(URI, auth=AUTH)
+
+
+# Create Embedder object
+class CustomEmbedder(Embedder):
+    def embed_query(self, text: str) -> list[float]:
+        return [random() for _ in range(DIMENSION)]
+
+
+embedder = CustomEmbedder()
+
+# Creating the index
+create_vector_index(
+    driver,
+    INDEX_NAME,
+    label="Document",
+    property="propertyKey",
+    dimensions=DIMENSION,
+    similarity_fn="euclidean",
+)
+create_fulltext_index(
+    driver, FULLTEXT_INDEX_NAME, label="Document", node_properties=["propertyKey"]
+)
+
+# Initialize the retriever
+retriever = HybridRetriever(driver, INDEX_NAME, FULLTEXT_INDEX_NAME, embedder)
+
+# Upsert the vector
+vector = [random() for _ in range(DIMENSION)]
+insert_query = (
+    "MERGE (n:Document {id: $id})"
+    "WITH n "
+    "CALL db.create.setNodeVectorProperty(n, 'propertyKey', $vector)"
+    "RETURN n"
+)
+parameters = {
+    "id": 0,
+    "vector": vector,
+}
+driver.execute_query(insert_query, parameters)
+
+# Perform the similarity search for a text query
+query_text = "Who are the fremen?"
+print(retriever.search(query_text=query_text, top_k=5))
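
Note on `examples/hybrid_search.py`: the `CustomEmbedder` above returns a fresh random vector on every call, so results are not reproducible between runs. For local experiments, a deterministic placeholder may be more convenient. This is a minimal sketch under stated assumptions: `deterministic_embedding` is a hypothetical function, not part of `neo4j_genai`, and its vectors carry no semantic meaning. Its body could serve as the implementation of `embed_query`.

```python
import hashlib
import random


def deterministic_embedding(text, dimension=1536):
    """Hypothetical stand-in: derive a reproducible pseudo-random vector from the text."""
    # Seed a private generator from a stable hash of the text so the same
    # input always yields the same vector, across runs and processes.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return [rng.random() for _ in range(dimension)]


first = deterministic_embedding("Who are the fremen?")
second = deterministic_embedding("Who are the fremen?")
```

Here `first` and `second` are identical because both calls seed the generator from the same text.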
diff --git a/examples/openai_search.py b/examples/openai_search.py
@@ -34,13 +34,15 @@
 
 # Upsert the query
 vector = [random() for _ in range(DIMENSION)]
+
 insert_query = (
-    "MERGE (n:Document)"
+    "MERGE (n:Document {id: $id})"
     "WITH n "
     "CALL db.create.setNodeVectorProperty(n, 'propertyKey', $vector)"
     "RETURN n"
 )
 parameters = {
+    "id": 0,
     "vector": vector,
 }
 driver.execute_query(insert_query, parameters)

diff --git a/examples/similarity_search_for_text.py b/examples/similarity_search_for_text.py
@@ -1,4 +1,3 @@
-from typing import List
 from neo4j import GraphDatabase
 from neo4j_genai import VectorRetriever
 
@@ -16,9 +15,9 @@
 driver = GraphDatabase.driver(URI, auth=AUTH)
 
 
-# Create Embedder object
+# Create CustomEmbedder object with the required Embedder type
 class CustomEmbedder(Embedder):
-    def embed_query(self, text: str) -> List[float]:
+    def embed_query(self, text: str) -> list[float]:
         return [random() for _ in range(DIMENSION)]
 
 
@@ -40,12 +39,13 @@ def embed_query(self, text: str) -> List[float]:
 # Upsert the query
 vector = [random() for _ in range(DIMENSION)]
 insert_query = (
-    "MERGE (n:Document)"
+    "MERGE (n:Document {id: $id})"
     "WITH n "
     "CALL db.create.setNodeVectorProperty(n, 'propertyKey', $vector)"
     "RETURN n"
 )
 parameters = {
+    "id": 0,
     "vector": vector,
 }
 driver.execute_query(insert_query, parameters)
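
Note on the `MERGE (n:Document {id: $id})` change that recurs across the examples: the query is built from adjacent string literals, which Python joins with no separator. The sketch below shows the joined text. The trailing space on each fragment is an assumption added for readability (the original fragments omit some of them), and the three-element vector is a placeholder rather than a real embedding.

```python
# Adjacent string literals are concatenated by Python without any separator,
# so each fragment carries its own trailing space here.
insert_query = (
    "MERGE (n:Document {id: $id}) "
    "WITH n "
    "CALL db.create.setNodeVectorProperty(n, 'propertyKey', $vector) "
    "RETURN n"
)

# Placeholder values for illustration; a real vector must have the index's dimensions.
parameters = {"id": 0, "vector": [0.1, 0.2, 0.3]}

print(insert_query)
```

Keying the `MERGE` on `id` makes the statement an upsert for one specific node, and both `$id` and `$vector` are supplied through the `parameters` dictionary rather than interpolated into the query text.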